
Showing posts with the label Transformers

Solving 'OSError: We couldn't connect to huggingface.co' in Offline Mode

  Nothing stops a production deployment faster than an unexpected network call in an environment designed to be isolated. You have carefully containerized your machine learning inference service, verified the model files are inside the Docker image, and deployed it to an air-gapped Kubernetes cluster. Yet, upon startup, the application crashes with the dreaded error: OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like <model_id> is not the path to a directory containing a file named config.json. This error is misleading. Often, the files are there, but the library is prioritizing a network handshake over the local filesystem. This guide covers the root cause of this behavior in the Hugging Face transformers library and provides the production-grade configuration to enforce offline execution.

Root Cause Analysis: Why from_pretrained Pings the Internet

To solve th...
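The fix the post points toward is to force offline mode before the library is imported, and to make the intent explicit at the call site. A minimal sketch, assuming the transformers package is installed and the model directory (/models/my-model here is a hypothetical path baked into the image) contains config.json and the weights:

```python
import os

# Set offline flags BEFORE any transformers import; both the hub client
# and transformers read these environment variables at import time.
os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: never hit the network
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: resolve from cache/local only

def load_model(path: str = "/models/my-model"):
    # Deferred import so the environment variables above take effect first.
    from transformers import AutoModelForCausalLM

    # local_files_only=True duplicates the env-var intent in code, so the
    # loader raises immediately on a missing file instead of retrying the hub.
    return AutoModelForCausalLM.from_pretrained(path, local_files_only=True)
```

Setting both the environment variables and `local_files_only=True` is belt-and-braces: the env vars protect any indirect loads (tokenizers, configs) the explicit argument does not reach.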

How to Fix "CUDA Out of Memory" When Fine-Tuning Llama 3 with LoRA

  You have successfully downloaded the Llama 3 8B weights. You set up your PyTorch environment, configured a basic LoRA adapter, and launched the training script on your RTX 3090 or Google Colab T4. Then, before the first epoch even starts, you hit the wall: RuntimeError: CUDA out of memory. Tried to allocate... This is the single most common barrier for engineers moving from using LLMs to fine-tuning them. It is frustrating because the math seems like it should work. If Llama 3 8B is roughly 15GB in half-precision (FP16), why does it crash a 24GB or even 40GB card? The answer lies in the hidden memory overhead of the training process itself. This guide provides a root cause analysis of VRAM consumption and a production-grade code solution using QLoRA (Quantized LoRA) to fit Llama 3 training pipelines onto consumer hardware.

The Root Cause: Where Did the VRAM Go?

To fix the OOM (Out Of Memory) error, you must understand what consumes GPU memory. It is not just the...
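The QLoRA approach the post describes can be sketched as follows: load the base weights in 4-bit NF4 so the frozen model shrinks to a few gigabytes, then train only the small LoRA adapter. This is a sketch assuming the transformers, peft, and bitsandbytes packages and a gated model you have already downloaded; the rank and target-module choices are illustrative, not prescriptive:

```python
def build_qlora_model(model_id: str = "meta-llama/Meta-Llama-3-8B"):
    """Load a causal LM with 4-bit quantized weights and attach a LoRA adapter."""
    # Deferred imports: these packages are heavyweight and GPU-oriented.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # store frozen weights in 4-bit NF4
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=16,                 # adapter rank: trainable params scale with r
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora_config)
```

Because only the adapter weights (typically well under 1% of the model) require gradients and optimizer state, the Adam moments that dominate full fine-tuning largely disappear, which is what lets an 8B model train on a 24GB card.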