If you are building autonomous agents with local LLMs like Llama 3 (via Ollama) and LangChain, you have likely encountered the infamous `OutputParserException` or `JSONDecodeError`.
The scenario is almost always the same: You prompt your agent to return structured data for a tool call. The model generates 99% correct output, but fails on a trailing comma, a missing quote, or by wrapping the JSON in Markdown backticks. Your agent crashes, and your workflow breaks.
While GPT-4 is generally compliant with strict JSON syntax, quantized local models (like Llama 3 8B) trade precision for speed and memory efficiency. This article details the root cause of these parsing failures and provides a production-grade, code-first solution to sanitize and parse "dirty" JSON from local models using LangChain.
## The Root Cause: Why Llama 3 Struggles with Strict JSON
To fix the problem, we must understand why it happens. The issue usually stems from three distinct behaviors in local models:
- **Quantization Noise:** Most developers run `llama3:8b-instruct-q4_k_m`. The quantization process (compressing weights from 16-bit to 4-bit) slightly degrades the model's ability to adhere to rigid syntactic rules over long context windows.
- **The "Helpfulness" Alignment:** Instruct models are trained to be chatty. Even when told to output only JSON, Llama 3 often adds a preamble ("Here is the JSON you requested:") or wraps the output in Markdown code fences (`` ```json ... ``` ``).
- **JavaScript vs. JSON Confusion:** The training data includes vast amounts of JavaScript. In JavaScript objects, trailing commas are valid (`{ "key": "value", }`). In strict JSON, trailing commas are illegal. Llama 3 frequently conflates the two.
Standard Python `json.loads()` and LangChain's default parsers are strict. They do not tolerate these deviations, leading to immediate pipeline failures.
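To see how unforgiving the default stack is, here is a minimal sketch: a single trailing comma (legal in JavaScript, illegal in JSON) is enough to crash `json.loads()`:

```python
import json

# Llama 3 often emits JavaScript-style trailing commas, which strict JSON forbids
dirty = '{ "tool": "search", "tool_input": "weather Tokyo", }'

try:
    json.loads(dirty)
    parsed = True
except json.JSONDecodeError as e:
    parsed = False
    print(f"Strict parsing fails: {e}")
```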
## The Fix: A Robust, Self-Healing Output Parser
We cannot rely on prompt engineering alone to guarantee valid JSON from an 8B model. Instead, we must implement a Robust Output Parser that sanitizes the raw text before attempting to parse it.
We will build a custom LangChain parser that handles:
- Stripping Markdown backticks.
- Removing preambles and postscripts.
- Fixing common syntax errors (like trailing commas) using regex or a `json_repair` strategy.
### Prerequisites
Ensure you have your environment set up with the latest LangChain core and Ollama integration.
```shell
pip install langchain langchain-ollama pydantic
```
### The Implementation
Below is a complete, copy-pasteable implementation using Python 3.10+, Pydantic v2, and LangChain Expression Language (LCEL).
We will create a RobustJsonParser class that extends BaseOutputParser.
```python
import json
import re
from typing import Type, TypeVar

from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import BaseOutputParser
from pydantic import BaseModel, Field

T = TypeVar("T", bound=BaseModel)


class RobustJsonParser(BaseOutputParser[T]):
    """
    A robust parser that cleans LLM output before attempting Pydantic validation.
    Handles Markdown backticks, trailing commas, and preambles.
    """

    pydantic_object: Type[T]

    def parse(self, text: str) -> T:
        try:
            # 1. Attempt strict parsing first (fastest path)
            clean_text = text.strip()
            return self.pydantic_object.model_validate_json(clean_text)
        except Exception:
            # 2. If strict parsing fails, enter repair mode
            return self._repair_and_parse(text)

    def _repair_and_parse(self, text: str) -> T:
        # Step A: Extract JSON from Markdown code blocks
        match = re.search(r"```json\s*([\s\S]*?)\s*```", text)
        if match:
            text = match.group(1)
        else:
            # Attempt to find the first '{' and last '}' to strip preambles
            start = text.find("{")
            end = text.rfind("}")
            if start != -1 and end != -1:
                text = text[start : end + 1]

        # Step B: Fix trailing commas (a common Llama 3 error).
        # The regex finds a comma followed by a closing brace/bracket and removes it.
        text = re.sub(r",\s*([\]}])", r"\1", text)

        # Step C: Fix missing quotes around keys.
        # This is a basic heuristic for keys that are purely alphanumeric.
        text = re.sub(r'([{,]\s*)([a-zA-Z0-9_]+)(\s*:)', r'\1"\2"\3', text)

        try:
            # Step D: Load into a Python dict, then validate against the schema
            json_dict = json.loads(text)
            return self.pydantic_object.model_validate(json_dict)
        except json.JSONDecodeError as e:
            raise OutputParserException(
                f"Failed to parse JSON even after repair attempts. Original output: {text}"
            ) from e
        except Exception as e:
            raise OutputParserException(f"Validation failed: {e}") from e

    def get_format_instructions(self) -> str:
        # BaseOutputParser's default implementation raises NotImplementedError,
        # so we expose the Pydantic JSON schema for injection into the prompt.
        schema = json.dumps(self.pydantic_object.model_json_schema(), indent=2)
        return f"Return a JSON object matching this schema:\n{schema}"
```
```python
# --- Usage Example ---
from langchain_core.prompts import PromptTemplate
from langchain_ollama import ChatOllama


# 1. Define your desired output schema using Pydantic
class AgentAction(BaseModel):
    tool: str = Field(description="The name of the tool to use")
    tool_input: str = Field(description="The input query for the tool")
    confidence: float = Field(description="Confidence score between 0 and 1")


# 2. Initialize the model
llm = ChatOllama(
    model="llama3",
    temperature=0,
    # IMPORTANT: format='json' helps, but doesn't guarantee schema compliance
    format="json",
)

# 3. Create the parser
parser = RobustJsonParser(pydantic_object=AgentAction)

# 4. Define the prompt
prompt = PromptTemplate(
    template="""
You are an AI agent.
Analyze the user request and determine the correct tool to use.

User Request: {query}

Output MUST be a raw JSON object matching this schema:
{format_instructions}
""",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# 5. Build the chain (LCEL)
chain = prompt | llm | parser

# 6. Run the chain
try:
    # Intentionally vague query to test reasoning
    response = chain.invoke({"query": "Can you check the weather in Tokyo?"})
    print("✅ Parsed Successfully:")
    print(f"Tool: {response.tool}")
    print(f"Input: {response.tool_input}")
    print(f"Confidence: {response.confidence}")
except OutputParserException as e:
    print(f"❌ Fatal Error: {e}")
```
## Deep Dive: How the Sanitization Logic Works
Let's break down the _repair_and_parse method, which is the engine ensuring your agent doesn't crash.
### 1. Markdown Extraction
Llama 3 often ignores instructions to output raw JSON and instead formats it for readability:
Here is your JSON:

```json
{ "tool": "search" }
```
The regex ``r"```json\s*([\s\S]*?)\s*```"`` captures everything inside the code block. If the code block isn't present, the fallback logic uses string slicing (`find("{")` and `rfind("}")`) to isolate the JSON object from the conversational filler.
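Both paths can be exercised in isolation. The helper name `extract_json` below is illustrative (not part of LangChain), but it uses the same regex and slicing logic as the parser:

```python
import re

def extract_json(text: str) -> str:
    # Prefer a fenced ```json block; fall back to first-'{' / last-'}' slicing
    match = re.search(r"```json\s*([\s\S]*?)\s*```", text)
    if match:
        return match.group(1)
    start, end = text.find("{"), text.rfind("}")
    return text[start : end + 1] if start != -1 and end != -1 else text

fenced = 'Here is your JSON:\n```json\n{ "tool": "search" }\n```'
bare = 'Sure! { "tool": "search" } Let me know if you need more.'

print(extract_json(fenced))  # { "tool": "search" }
print(extract_json(bare))    # { "tool": "search" }
```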
### 2. The Trailing Comma Killer
This is the most frequent error with Llama 3. It generates:
```json
{
  "tool": "calculator",
  "tool_input": "2+2",
}
```
The regex `re.sub(r",\s*([\]}])", r"\1", text)` identifies a comma followed immediately by a closing brace (`}` or `]`) and removes the comma while keeping the brace. This transforms invalid JSON into valid JSON.
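In isolation, the substitution behaves as described:

```python
import json
import re

# JavaScript-style trailing comma before the closing brace
dirty = '{ "tool": "calculator", "tool_input": "2+2", }'

# Remove the comma (and any whitespace after it) preceding a closing brace/bracket
clean = re.sub(r",\s*([\]}])", r"\1", dirty)

print(clean)  # { "tool": "calculator", "tool_input": "2+2"}
print(json.loads(clean)["tool"])  # calculator
```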
### 3. Pydantic Integration
Note that we pass the class type (AgentAction) into the parser. By inheriting from BaseOutputParser[T], we maintain full type safety. The final step uses model_validate, ensuring that even if the JSON is valid syntax, it adheres to your strict schema (e.g., confidence must be a float).
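Schema enforcement matters because syntactically valid JSON can still carry the wrong types. A small sketch using the `AgentAction` schema from above shows `model_validate` catching a type violation:

```python
from pydantic import BaseModel, Field, ValidationError

class AgentAction(BaseModel):
    tool: str = Field(description="The name of the tool to use")
    tool_input: str = Field(description="The input query for the tool")
    confidence: float = Field(description="Confidence score between 0 and 1")

# Valid JSON syntax, but "confidence" is not coercible to float
payload = {"tool": "search", "tool_input": "weather", "confidence": "very high"}

try:
    AgentAction.model_validate(payload)
    ok = True
except ValidationError as e:
    ok = False
    print(f"Schema violation caught: {e.error_count()} error(s)")
```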
## Common Pitfalls and Edge Cases
While the robust parser handles 90% of cases, be aware of these specific scenarios when working with local LLMs.
### The "JSON Mode" Trap
Ollama allows you to set `format="json"` in the model initialization:

```python
llm = ChatOllama(model="llama3", format="json")
```
- **Pros:** Forces the model to output valid JSON syntax.
- **Cons:** It often degrades reasoning capabilities. In "JSON Mode," the model focuses so heavily on syntax that it may hallucinate values to fit the structure.
- **Recommendation:** Use `format="json"` in conjunction with the `RobustJsonParser`. The flag handles the syntax, and your parser handles the schema validation and edge cases.
### Hallucinated Fields
Sometimes, Llama 3 will add fields that aren't in your Pydantic model. By default, Pydantic v2 will ignore extra fields. If you want to enforce strictness (forbidding extra fields), update your Pydantic config:
```python
class AgentAction(BaseModel):
    model_config = {"extra": "forbid"}
    # ... fields ...
```
However, for local agents, the default `extra="ignore"` is usually safer, preventing crashes caused by model "creativity."
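To make the trade-off concrete, here is a minimal comparison of the two configs (the `mood` field is a made-up stand-in for a hallucinated key):

```python
from pydantic import BaseModel, ValidationError

class Lenient(BaseModel):
    tool: str
    # Pydantic v2 default is extra="ignore": unknown keys are silently dropped

class Strict(BaseModel):
    model_config = {"extra": "forbid"}
    tool: str

# The model hallucinated an extra "mood" field
payload = {"tool": "search", "mood": "optimistic"}

print(Lenient.model_validate(payload))  # tool='search'

try:
    Strict.model_validate(payload)
    rejected = False
except ValidationError:
    rejected = True
    print("Strict schema rejects the extra field")
```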
## Conclusion
Building reliable agents with local LLMs requires accepting that the models are imperfect. We cannot rely on them to follow syntax rules as strictly as GPT-4.
By shifting the burden of syntax compliance from the prompt (which costs context tokens) to the parser (which uses cheap CPU cycles), we create resilient systems. The `RobustJsonParser` implementation above ensures that your Llama 3 agents spend less time crashing on `JSONDecodeError` and more time executing tasks.