Data scientists and Windows developers attempting to leverage AMD hardware for machine learning frequently hit a wall when transitioning from Windows to WSL2. You install a high-end Radeon GPU, initialize an Ubuntu subsystem, and install your ML frameworks, only to be met with "No devices found" errors, rocminfo failures, or persistent segmentation faults when allocating tensors.
Unlike NVIDIA’s tightly integrated CUDA-on-WSL pipeline, achieving stable WSL2 AMD GPU passthrough requires navigating a fragmented driver architecture. This guide details the exact engineering steps to stabilize ROCm on Windows 11, configure your data science WSL2 setup, and correctly bridge your Radeon GPU into a Linux environment.
The Root Cause: Paravirtualization Conflicts and WDDM
To fix the driver crashes, you must first understand how WSL2 handles hardware acceleration. WSL2 does not use traditional PCIe passthrough (like VFIO in KVM). Instead, Microsoft implements GPU Paravirtualization (GPU-PV).
The Windows Display Driver Model (WDDM) host manages the physical hardware. It exposes a virtualized /dev/dxg device to the Linux guest. When developers follow standard Linux ROCm installation guides inside WSL2, they inadvertently install the amdgpu-dkms package. This package contains the bare-metal Linux kernel driver.
When the bare-metal kernel driver attempts to interface directly with the virtualized WSL PCIe bus, it collides with Microsoft's dxgkrnl (DirectX kernel bridge). The driver initialization fails silently, the Radeon Linux container loses its hardware mapping, and your ML scripts default back to the CPU.
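A quick way to confirm you are actually in a GPU-PV environment is to check for that virtual device node. This is a minimal diagnostic sketch (the function name is ours, not part of any toolkit):

```python
import os

# GPU-PV surfaces a single virtual node, /dev/dxg, instead of the
# bare-metal PCIe device tree a native ROCm install expects.
def dxg_bridge_status():
    """Return 'present' if the WSL2 DirectX bridge device exists."""
    return "present" if os.path.exists("/dev/dxg") else "missing"

print(f"/dev/dxg: {dxg_bridge_status()}")
```

If this reports "missing" inside your Ubuntu terminal, the problem is upstream of ROCm entirely: the Windows host driver or WSL kernel is not exposing the bridge, and no amount of Linux-side package work will fix it.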
The Fix: User-Space ROCm Integration
The solution is to decouple the ROCm installation. We must rely on the Windows host for kernel-space hardware communication while installing only the user-space ROCm runtime inside the WSL2 Ubuntu subsystem.
Step 1: Prepare the Windows 11 Host
Your Windows host must be running a WDDM 3.0+ compliant driver. WSL2 GPU passthrough relies entirely on the host's driver architecture.
- Update to the latest AMD Adrenalin or AMD PRO Edition drivers on your Windows host.
- Force a WSL kernel update to ensure you have the latest dxgkrnl patches. Open an elevated PowerShell prompt:
wsl --update
wsl --shutdown
Step 2: Purge Broken Installations in WSL2
Boot into your WSL2 Ubuntu terminal. Before applying the correct setup, completely purge any existing, broken amdgpu or ROCm installations that are causing module conflicts.
sudo amdgpu-install --uninstall
sudo apt-get purge "rocm*" "amdgpu*"
sudo apt-get autoremove
sudo rm -rf /opt/rocm /var/lib/dkms/amdgpu
Step 3: Install WSL-Compatible ROCm (User-Space Only)
We will add the AMD ROCm repository, but we will explicitly prevent the installation of the DKMS kernel modules. This allows the WDDM bridge to function uninterrupted.
# Add the ROCm repository (targeting ROCm 6.0 as the current stable standard)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/rocm.gpg
# Use your Ubuntu codename here (focal for 20.04, jammy for 22.04)
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.0.2 jammy main" | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt-get update
# IMPORTANT: Install only the user-space libraries and development headers.
# Do NOT install the 'amdgpu-dkms' package.
sudo apt-get install rocm-dev rocm-libs rocminfo
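Because a stray amdgpu-dkms package is the single most common cause of the crashes described above, it is worth verifying programmatically that it is absent. A small sketch (helper name is ours; it returns None on non-Debian systems where dpkg does not exist):

```python
import shutil
import subprocess

def amdgpu_dkms_installed():
    """True if the conflicting bare-metal kernel package is installed,
    False if it is not, None when dpkg is unavailable."""
    if shutil.which("dpkg") is None:
        return None
    result = subprocess.run(
        ["dpkg", "-s", "amdgpu-dkms"], capture_output=True, text=True
    )
    # dpkg -s exits 0 only when the package is in the installed state.
    return result.returncode == 0

print(amdgpu_dkms_installed())
```

Anything other than False here means the purge in Step 2 did not fully take, and the WDDM bridge is still at risk of colliding with the bare-metal module.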
Step 4: Configure Permissions and Environment Variables
For the ROCm runtime to communicate with the virtualized GPU nodes (/dev/kfd and /dev/dri), your Linux user must belong to the video and render groups.
sudo usermod -aG video $USER
sudo usermod -aG render $USER
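Note that usermod only takes effect in new login sessions, so a silent permissions failure after running these commands is easy to miss. This sketch (function name is ours) reports which of the two required groups your current user is still missing:

```python
import getpass
import grp
import pwd

def missing_gpu_groups(user=None):
    """Return the GPU access groups the user is not yet a member of."""
    user = user or getpass.getuser()
    member_of = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
    # gr_mem omits the user's primary group, so add it explicitly.
    member_of.add(grp.getgrgid(pwd.getpwnam(user).pw_gid).gr_name)
    return sorted({"video", "render"} - member_of)

print(missing_gpu_groups())
```

An empty list means /dev/kfd and /dev/dri should be accessible; if groups still appear here after running usermod, log out and back in (or restart WSL) before retesting.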
Next, map the ROCm binaries to your system path. Append the following to your ~/.bashrc or ~/.zshrc:
export PATH=/opt/rocm/bin:/opt/rocm/opencl/bin:$PATH
export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
# Disable SDMA copy engines; SDMA transfers are unreliable over the
# WSL2 GPU-PV bridge and are a common source of hangs
export HSA_ENABLE_SDMA=0
Apply the changes (source ~/.bashrc) and verify the hardware bridge:
rocminfo
If successful, this command will output your GPU architecture without throwing a segmentation fault or "Failed to get device" error.
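The piece of rocminfo output you will need next is the gfx architecture string. This sketch (helper name is ours) extracts it, returning an empty list when rocminfo is missing or reports no agents:

```python
import re
import shutil
import subprocess

def detected_gfx_targets():
    """Parse gfx architecture strings out of rocminfo output."""
    if shutil.which("rocminfo") is None:
        return []
    out = subprocess.run(
        ["rocminfo"], capture_output=True, text=True
    ).stdout
    # Agent names look like gfx1030, gfx1100, gfx90a, etc.
    return sorted(set(re.findall(r"gfx[0-9a-f]+", out)))

print(detected_gfx_targets())
```

Keep the reported string handy: it determines whether you need the architecture override described in the next section.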
Deep Dive: Enabling PyTorch and Architecture Overrides
Modern ML frameworks like PyTorch compile their backend binaries against specific AMD architectures (typically CDNA enterprise cards such as the MI200 series, or RDNA3 flagship cards such as the RX 7900 XTX, which reports as gfx1100).
If you are running a consumer-grade RDNA2 card (e.g., RX 6700 XT) in your data science WSL2 setup, PyTorch will fail to recognize the GPU even if rocminfo sees it. This is due to an architecture string mismatch in the HIP (Heterogeneous-compute Interface for Portability) compiler.
To bypass this, you must spoof the target architecture using the HSA_OVERRIDE_GFX_VERSION environment variable.
Setting the Architecture Override
If you have an RDNA2 card (RX 6000 series), append this to your shell profile:
# Spoofs the architecture to the fully supported gfx1030 (RX 6800/6900)
export HSA_OVERRIDE_GFX_VERSION=10.3.0
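The pattern generalizes beyond the RX 6000 series: each unsupported consumer chip is spoofed as the nearest officially compiled target. The mapping below collects commonly reported community values and is illustrative only; exact compatibility depends on your ROCm release, so verify against AMD's support matrix before relying on it:

```python
from typing import Optional

# Illustrative, community-reported overrides; not an official matrix.
GFX_OVERRIDES = {
    "gfx1031": "10.3.0",  # RX 6700 XT family -> spoof gfx1030
    "gfx1032": "10.3.0",  # RX 6600 family    -> spoof gfx1030
    "gfx1101": "11.0.0",  # RX 7800/7700 XT   -> spoof gfx1100
    "gfx1102": "11.0.0",  # RX 7600           -> spoof gfx1100
}

def override_for(gfx_arch: str) -> Optional[str]:
    """Return the HSA_OVERRIDE_GFX_VERSION value to export, if any."""
    return GFX_OVERRIDES.get(gfx_arch)

print(override_for("gfx1031"))  # -> 10.3.0
```

A None result means the chip is either natively supported (no override needed) or too far from any compiled target for spoofing to work.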
Installing and Verifying PyTorch for ROCm
With the architecture overridden and the user-space runtime isolated, you can safely install the ROCm-compiled version of PyTorch.
# Create a modern Python virtual environment
python3 -m venv ~/ml-env
source ~/ml-env/bin/activate
# Install PyTorch specifically compiled for ROCm
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
Verify the tensor allocation mechanism with a Python script:
import torch

def verify_amd_passthrough():
    print(f"PyTorch Version: {torch.__version__}")
    if torch.cuda.is_available():
        print("Success: ROCm backend is successfully bridging to WSL2.")
        print(f"Device Name: {torch.cuda.get_device_name(0)}")
        # Test tensor allocation on the GPU
        x = torch.rand(5, 5).cuda()
        print("Tensor allocation successful on Radeon GPU.")
    else:
        print("Failure: GPU not detected. Check WDDM driver or HSA_OVERRIDE variables.")

if __name__ == "__main__":
    verify_amd_passthrough()
Common Pitfalls and Edge Cases
1. WSL2 OOM (Out of Memory) Kills
By default, Windows limits WSL2 to 50% of total system RAM. Large language models (LLMs) or heavy batch sizes will cause Linux to run out of memory and silently terminate the Python process, often masquerading as a GPU driver crash.
Fix: Override the memory limit globally. Create or edit C:\Users\<YourUsername>\.wslconfig on the Windows host:
[wsl2]
memory=24GB
swap=8GB
Restart WSL (wsl --shutdown) for this to take effect.
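The 24GB figure above assumes a 32GB host; a reasonable rule of thumb is to cap WSL2 at roughly 75% of physical RAM so Windows and the WDDM driver keep enough headroom. A hypothetical helper (the 75% ratio is our assumption, not a Microsoft recommendation):

```python
# Suggest a .wslconfig memory= value from total host RAM.
# The file itself must still be edited on the Windows side.
def suggested_wsl_memory_gb(total_host_gb: int) -> int:
    """Roughly 75% of host RAM, with a 4GB floor for small machines."""
    return max(4, int(total_host_gb * 0.75))

print(suggested_wsl_memory_gb(32))  # -> 24
```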
2. The DirectML Fallback
If you are running older hardware (e.g., Vega or RDNA1) for which ROCm inside WSL2 simply cannot compile the necessary HIP kernels, you must pivot to Microsoft's DirectML stack. DirectML bypasses ROCm entirely and translates ML graphs into DirectX 12 calls.
While slower than native ROCm, it is highly stable for legacy hardware. You can implement it by installing torch-directml instead of the ROCm PyTorch wheels.
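The swap mirrors the verification pattern from earlier: instead of torch.cuda, you request a device from torch_directml. A sketch, guarded so it degrades to CPU when the package is absent (pick_device is our helper name):

```python
# DirectML fallback path: torch_directml.device() hands back a device
# object whose tensors are executed through DirectX 12.
def pick_device():
    try:
        import torch_directml
        return torch_directml.device()
    except ImportError:
        # torch-directml not installed; fall back to CPU execution.
        return "cpu"

print(f"Selected device: {pick_device()}")
```

Tensors are then moved with the usual .to(device) idiom, so existing training scripts need only this device-selection change.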
Conclusion
Solving AMD GPU passthrough issues on WSL2 requires a fundamental shift in how you view the driver stack. By treating WSL2 as a paravirtualized environment, avoiding bare-metal kernel modules, and utilizing environment variable overrides, you can establish a performant, stable Radeon data science environment directly on Windows 11. Strict adherence to user-space ROCm libraries ensures that the Windows DirectX kernel bridge (dxgkrnl) successfully manages the hardware, leaving your Linux subsystem free to execute complex machine learning workloads.