
How to Fix PyTorch and MLX GPU Utilization Issues on Apple Silicon (M1/M2/M3)

  Running a local LLM on Apple Silicon should be blazingly fast given the memory bandwidth of modern Mac hardware. Yet developers frequently encounter inference speeds of just 1-2 tokens per second, with maxed-out CPU cores and an entirely idle GPU. This bottleneck occurs because standard Python environments and machine learning libraries do not default to Apple's Metal API. Resolving it requires explicitly configuring your code to use the Metal Performance Shaders (MPS) Python bindings or adopting MLX, Apple's specialized array framework.

The Root Cause: Why macOS Defaults to CPU

In the established AI/ML ecosystem, Nvidia's CUDA is the default backend for hardware acceleration. When a framework like PyTorch cannot locate a CUDA-enabled GPU, its fallback mechanism defaults directly to the CPU. Apple Silicon operates on a completely different architecture built around Metal Performance Shaders (MPS). PyTorch does support MPS, but it requires specific build parameters, an...
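The fallback chain described above (CUDA, then MPS, then CPU) can be sketched as a small helper. This is a minimal illustration, not PyTorch's internal code: the `select_device` function is a hypothetical name, and it takes availability flags as plain booleans so the logic is visible without any GPU present. In a real script you would feed it the results of `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, which are the actual PyTorch runtime checks.

```python
def select_device(cuda_available: bool, mps_available: bool) -> str:
    """Mirror the typical framework fallback order: CUDA first,
    then Apple's Metal Performance Shaders, then plain CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    # No accelerator found: this is the silent path that leaves
    # the Mac GPU idle and pins the CPU cores.
    return "cpu"


# In an actual PyTorch program the flags come from the runtime, e.g.:
#
#   import torch
#   device = torch.device(select_device(
#       torch.cuda.is_available(),
#       torch.backends.mps.is_available(),
#   ))
#   model = model.to(device)

if __name__ == "__main__":
    # On a typical Apple Silicon Mac: no CUDA, MPS available.
    print(select_device(cuda_available=False, mps_available=True))
```

The point of making this explicit is that on a Mac the CUDA check always fails, so unless your code (or the library you use) asks for `"mps"`, every tensor operation silently lands on the CPU.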