Programming Tutorials

Fixing 'Ollama Not Using GPU' in Docker: A Guide for WSL2 and Linux

There are few things more frustrating in AI engineering than watching a powerful Llama 3 or Mistral model crawl at 0.5 tokens per second. You have an RTX 3090 or a hefty server GPU, yet your Dockerized Ollama instance insists on burning up your CPU cores instead.

If you are running Ollama inside a Docker container and it fails to detect your NVIDIA GPU, the issue is rarely with Ollama itself. The problem lies in the isolation layer between the Docker daemon and the host kernel’s graphics drivers. This guide provides the architectural root cause and the specific, copy-paste configurations required to force GPU passthrough on both native Linux and WSL2 environments.

The Root Cause: Why Docker Isolates Your GPU

To fix the issue, you must understand the "gap" in the architecture. Docker containers share the host's OS kernel but maintain their own user space (filesystem, libraries, and binaries). By default, a container acts as a clean slate. It does not have access to the h...
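The isolation described above can be observed from inside a running container. What follows is a minimal Python sketch, not part of Ollama or Docker: the function name `gpu_visibility_report` and its `dev_root` and `which` parameters are hypothetical, introduced only so the check can be pointed at a test directory. It assumes the usual conventions that the NVIDIA driver exposes `/dev/nvidia*` device nodes on native Linux, that WSL2 exposes the GPU through `/dev/dxg`, and that `nvidia-smi` is on the `PATH` once passthrough is set up.

```python
import glob
import os
import shutil


def gpu_visibility_report(dev_root="/dev", which=shutil.which):
    """Summarize which GPU-related pieces are visible in this environment.

    `dev_root` and `which` are hypothetical parameters added for testing;
    they are not part of any Ollama or Docker API.
    """
    # Native Linux (assumption): the NVIDIA kernel driver exposes device
    # nodes such as /dev/nvidiactl and /dev/nvidia0.
    nvidia_nodes = sorted(glob.glob(os.path.join(dev_root, "nvidia*")))
    # WSL2 (assumption): the GPU is reached through the /dev/dxg device.
    wsl_nodes = glob.glob(os.path.join(dev_root, "dxg"))
    # User space (assumption): nvidia-smi is on PATH when passthrough works.
    smi_path = which("nvidia-smi")

    device_nodes = nvidia_nodes + wsl_nodes
    return {
        "device_nodes": device_nodes,
        "nvidia_smi": smi_path,
        # A heuristic only: both a device node and the tool must be present.
        "gpu_likely_visible": bool(device_nodes) and smi_path is not None,
    }


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

Running this script on the host and then inside the container gives a quick comparison. If the host lists device nodes while the container lists none, the gap is in the container configuration rather than in Ollama, which matches the diagnosis above. The result is a heuristic, not proof that a model will actually be offloaded to the GPU.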