Posts

Showing posts with the label Hugging Face

Fix 'pull model manifest: 429' Rate Limit Error in Ollama

You provision a new instance for AI model deployment, initiate a 40GB model pull, and watch the progress bar climb. Suddenly, the transfer halts mid-stream. The terminal throws a fatal error: `pull model manifest: 429 Too Many Requests`. This HTTP 429 error is a hard block preventing DevOps teams and data scientists from provisioning local large language models (LLMs). Resolving the Ollama `pull model manifest` 429 error requires understanding network egress architecture and implementing authenticated retrieval pipelines.

Understanding the Root Cause of the 429 Error

The `429 Too Many Requests` status code indicates that the client has exceeded the rate limit imposed by the upstream server. When pulling models natively via Ollama from external registries like Hugging Face (e.g., `ollama pull hf.co/user/model`), you are subject to the Hugging Face Hub's API limits. By default, unauthenticated requests to the Hugging Face Hub are heavily rate-limited based on the...
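One generic way to survive intermittent 429 responses is to wrap the download call in a retry loop with exponential backoff. The sketch below is illustrative, not part of the post: `pull` is a placeholder for whatever retrieval call you use (for example, an authenticated `huggingface_hub.hf_hub_download`), and `RateLimitError` stands in for the exception your client raises on HTTP 429.

```python
import time


class RateLimitError(Exception):
    """Placeholder for the exception your client raises on HTTP 429."""


def pull_with_backoff(pull, *, retries=5, base_delay=1.0):
    """Call `pull()`, retrying on rate-limit errors with growing waits.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts,
    and re-raises if the final attempt still hits the rate limit.
    """
    for attempt in range(retries):
        try:
            return pull()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of attempts; surface the 429 to the caller
            time.sleep(base_delay * (2 ** attempt))
```

Backoff alone only papers over the problem; pairing it with an authenticated client (so your requests count against a real account's quota rather than the anonymous pool) is what actually lifts the limit.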

Fixing Llama 3.1 Fine-Tuning Errors: The Padding Token & `eot_id` Trap

You have curated a high-quality instruction dataset. You have set up your QLoRA config. You launch `SFTTrainer`, and within seconds, your training loop crashes with an `IndexError: index out of range`, or worse, your loss flatlines at `0.0` or `NaN`. This is the most common bottleneck engineers face when migrating from Llama 2 to Llama 3 or 3.1. The issue isn't your dataset quality; it is a fundamental misalignment between the Llama 3.1 tokenizer's special tokens, the default padding behavior in Hugging Face's `transformers` library, and how the model interprets "End of Turn" versus "End of Text." This guide details the root cause of these convergence failures and provides the production-grade code required to fix them.

The Root Cause: Why Llama 3.1 Breaks Standard Pipelines

The Llama 3 family introduced a massive vocabulary expansion (128k tokens) and a shift in special token usage. In older models (and Llama 2), the End of Sentence (EOS) ...
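The usual trap is assigning the EOS token as the padding token: the collator then masks every EOS position out of the loss, so the model never learns to stop. A minimal sketch of a safer setup is below, assuming a Hugging Face-style tokenizer interface (`pad_token`, `get_vocab`, `add_special_tokens`); the reserved token name is an assumption you should verify against your own checkpoint's vocabulary before training.

```python
# Assumed reserved padding slot in the Llama 3.1 vocabulary -- confirm
# it exists in your checkpoint before relying on it.
RESERVED_PAD = "<|finetune_right_pad_id|>"


def configure_pad_token(tokenizer):
    """Give the tokenizer a pad token that is NOT the eos/eot token.

    Reusing eos as pad causes the data collator to mask all eos
    positions out of the loss, so the model never learns to emit them.
    """
    if tokenizer.pad_token is not None:
        return tokenizer  # already configured, leave it alone
    if RESERVED_PAD in tokenizer.get_vocab():
        tokenizer.pad_token = RESERVED_PAD  # reuse an existing reserved slot
    else:
        # Fallback: register a brand-new special token. If you do this,
        # the model's embedding matrix must be resized to match.
        tokenizer.add_special_tokens({"pad_token": "<|pad|>"})
    return tokenizer
```

With a real checkpoint you would then set `model.config.pad_token_id = tokenizer.pad_token_id`, and call `model.resize_token_embeddings(len(tokenizer))` if the fallback branch added a new token.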