The fastest way to get started is with the CPU image:
docker run -p 8080:8080 --name local-ai -ti localai/localai:latest
# Or with Podman:
podman run -p 8080:8080 --name local-ai -ti localai/localai:latest
This will:
Start LocalAI (you’ll need to install models separately)
Make the API available at http://localhost:8080
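Once the container is up, you can smoke-test it from the host. A minimal sketch; /readyz is the readiness route the health checks later on this page also use:

```shell
# Probe the readiness endpoint; /readyz returns HTTP 200 once LocalAI
# has finished starting up
READY=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/readyz || true)
if [ "$READY" = "200" ]; then
  echo "LocalAI is ready"
else
  echo "Not ready yet (HTTP status $READY); check: docker logs local-ai"
fi
```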
Image Types
LocalAI provides several image types to suit different needs. These images work with both Docker and Podman.
Standard Images
Standard images don’t include pre-configured models. Use these if you want to configure models manually.
CPU Image
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest
GPU Images
NVIDIA CUDA 13:
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-gpu-nvidia-cuda-13
NVIDIA CUDA 12:
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-gpu-nvidia-cuda-12
AMD GPU (ROCm):
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device rocm.com/gpu=all localai/localai:latest-gpu-hipblas
Intel GPU:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device gpu.intel.com/all localai/localai:latest-gpu-intel
Vulkan:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
NVIDIA Jetson (L4T ARM64):
CUDA 12 (for Nvidia AGX Orin and similar platforms):
docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64
CUDA 13 (for Nvidia DGX Spark):
docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
All-in-One (AIO) Images
Recommended for beginners - These images come pre-configured with models and backends, ready to use immediately.
CPU Image
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
GPU Images
NVIDIA CUDA 13:
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-13
NVIDIA CUDA 12:
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-12
AMD GPU (ROCm):
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device rocm.com/gpu=all localai/localai:latest-aio-gpu-hipblas
Intel GPU:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device gpu.intel.com/all localai/localai:latest-aio-gpu-intel
Using Compose
For a more manageable setup, especially with persistent volumes, use Docker Compose or Podman Compose:
Using CDI (Container Device Interface) - Recommended for NVIDIA Container Toolkit 1.14+
The CDI approach is recommended for newer versions of the NVIDIA Container Toolkit (1.14 and later). It provides better compatibility and is the future-proof method:
version: "3.9"
services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    # For CUDA 13, use: localai/localai:latest-aio-gpu-nvidia-cuda-13
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=false
    volumes:
      - ./models:/models:cached
    # CDI driver configuration (recommended for NVIDIA Container Toolkit 1.14+)
    # This uses the nvidia.com/gpu resource API
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia.com/gpu
              count: all
              capabilities: [gpu]
Save this as compose.yaml and run:
docker compose up -d
# Or with Podman:
podman-compose up -d
Using Legacy NVIDIA Driver - For Older NVIDIA Container Toolkit
If you are using an older version of the NVIDIA Container Toolkit (before 1.14), or need backward compatibility, use the legacy approach:
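The legacy configuration differs only in the GPU reservation under deploy: it uses Docker's generic nvidia driver name rather than the CDI nvidia.com/gpu resource. A sketch, with the deploy block following Docker's standard legacy GPU syntax rather than anything specific to this page:

```yaml
services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    ports:
      - 8080:8080
    volumes:
      - ./models:/models:cached
    # Legacy GPU reservation: generic "nvidia" driver instead of the
    # CDI nvidia.com/gpu resource shown above
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```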
Persisting Models and Configurations
To persist models and configurations, mount a volume:
docker run -ti --name local-ai -p 8080:8080 \
-v $PWD/models:/models \
localai/localai:latest-aio-cpu
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 \
-v $PWD/models:/models \
localai/localai:latest-aio-cpu
Or use a named volume:
docker volume create localai-models
docker run -ti --name local-ai -p 8080:8080 \
-v localai-models:/models \
localai/localai:latest-aio-cpu
# Or with Podman:
podman volume create localai-models
podman run -ti --name local-ai -p 8080:8080 \
-v localai-models:/models \
localai/localai:latest-aio-cpu
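With a volume mounted at /models, model definitions dropped into that directory survive container restarts. A minimal sketch of preparing one on the host; the file name and fields here are illustrative only, not the complete LocalAI model schema (real configs also need a backend and an actual model file or download source):

```shell
# Create the host directory that will be mounted at /models
mkdir -p models

# Write an illustrative model definition (fields are assumptions; consult
# the LocalAI model configuration docs for the authoritative schema)
cat > models/my-model.yaml <<'EOF'
name: my-model
parameters:
  model: my-model.gguf
EOF

ls models
```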
What’s Included in AIO Images
All-in-One images come pre-configured with:
Text Generation: LLM models for chat and completion
Image Generation: Stable Diffusion models
Text to Speech: TTS models
Speech to Text: Whisper models
Embeddings: Vector embedding models
Function Calling: Support for OpenAI-compatible function calling
The AIO images use OpenAI-compatible model names (like gpt-4, gpt-4-vision-preview) but are backed by open-source models. See the container images documentation for the complete mapping.
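For example, a chat request against a running AIO container uses the familiar OpenAI request shape; the model name gpt-4 is served by whichever open-source model the image maps it to, so the reply content will vary:

```shell
# Build an OpenAI-style chat payload
cat > /tmp/chat.json <<'EOF'
{
  "model": "gpt-4",
  "messages": [{"role": "user", "content": "Say hello in one sentence."}]
}
EOF

# Send it to a running container (only succeeds while LocalAI is up)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @/tmp/chat.json \
  || echo "no LocalAI instance reachable on localhost:8080"
```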
Next Steps
After installation:
Access the WebUI at http://localhost:8080
Check available models: curl http://localhost:8080/v1/models
For AMD: Ensure devices are accessible: ls -la /dev/kfd /dev/dri
Troubleshooting
NVIDIA Container fails to start with “Auto-detected mode as ’legacy’” error
If you encounter this error:
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: invalid expression
This indicates a Docker/NVIDIA Container Toolkit configuration issue. The container runtime’s prestart hook fails before LocalAI starts. This is not a LocalAI code bug.
Solutions:
Use CDI mode (recommended): Update your compose.yaml to use the CDI driver configuration shown in the Compose section above.
Upgrade NVIDIA Container Toolkit: Ensure you have version 1.14 or later, which has better CDI support.
Check NVIDIA Container Toolkit configuration: Run nvidia-container-cli info to verify your installation is working correctly outside of containers.
Verify Docker GPU access: Test with docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
Models not downloading
Check internet connection
Verify disk space: df -h
Check container logs for errors: docker logs local-ai or podman logs local-ai