
Force Reliable JSON Output from Claude 3.5 Using Tool Use

Every AI engineer has faced this nightmare. You spend hours refining a prompt to extract structured data, adding constraints like "OUTPUT ONLY JSON," "NO PREAMBLE," and "DO NOT CHAT."

You run the test. It works. You deploy to production.

Then, inevitably, an edge case hits. Claude 3.5 decides to be helpful. Instead of returning { "status": "success" }, it returns:

"Certainly! Here is the JSON you requested regarding the user status:

{ "status": "success" }

I hope this helps!"

Your JSON.parse() throws an exception. Your pipeline crashes. You write a brittle regex to extract content between backticks, adding technical debt to your codebase.
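To make the failure concrete, here is a minimal Python sketch of both the crash and the regex workaround. The reply string mirrors the example above; the regex is the kind of brittle "fix" this article argues against:

```python
import json
import re

conversational_reply = (
    'Certainly! Here is the JSON you requested regarding the user status:\n\n'
    '{ "status": "success" }\n\n'
    'I hope this helps!'
)

# Direct parsing fails: the conversational preamble is not valid JSON.
try:
    json.loads(conversational_reply)
    parsed_directly = True
except json.JSONDecodeError:
    parsed_directly = False

# The brittle workaround: grab everything between the first '{' and last '}'.
# This breaks on nested chatter, multiple objects, or braces in prose.
match = re.search(r'\{.*\}', conversational_reply, re.DOTALL)
recovered = json.loads(match.group(0)) if match else None

print(parsed_directly)  # False
print(recovered)        # {'status': 'success'}
```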

There is a better way. Prompt engineering is not the solution for structural integrity; architecture is. By leveraging Tool Use (Function Calling), we can force Claude to bypass its conversational training and output deterministic, parseable JSON every single time.

The Root Cause: Why LLMs Struggle with "JSON Only"

To solve the problem, you must understand the architecture. Large Language Models (LLMs) like Claude 3.5 Sonnet are autoregressive probabilistic models trained primarily on human conversation.

They are optimized via RLHF (Reinforcement Learning from Human Feedback). This training specifically rewards the model for being polite, conversational, and explanatory. When you ask a question, the model's weights are biased toward responding like a helpful assistant, which usually means introducing data with natural language.

When you use system prompting to suppress this (e.g., "Do not speak"), you are fighting the model's fundamental training weights. You are asking a chatbot to stop chatting.

Tool Use changes the paradigm. When an LLM detects a tool definition, it switches modes. It is no longer generating a chat response; it is generating arguments for a function signature. This process utilizes a specific subset of training data optimized for code generation and syntax adherence, drastically reducing the probability of conversational drift.

The Fix: Forcing Tool Use

The strategy is simple: instead of asking Claude to reply with JSON, we force Claude to call a function where the arguments match our desired JSON schema. We don't actually need to execute the function on our backend; we simply capture the arguments it generates.
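The capture-without-executing idea can be illustrated offline. The mock_content list below imitates the shape of a tool_use content block from the Messages API (an illustration, not a real API response):

```python
# Mock of the content blocks returned by a forced tool call.
# In production, these come back from the Anthropic Messages API.
mock_content = [
    {
        "type": "tool_use",
        "id": "toolu_mock",
        "name": "record_customer_interaction",
        "input": {
            "sentiment_score": 2,
            "category": "refund",
            "action_items": ["Process refund"],
            "urgent": True,
        },
    }
]

def capture_tool_arguments(content: list[dict]) -> dict:
    """Return the arguments of the first tool_use block.

    We never execute the 'function' -- the arguments ARE the payload.
    """
    for block in content:
        if block.get("type") == "tool_use":
            return block["input"]
    raise ValueError("No tool_use block found")

print(capture_tool_arguments(mock_content)["category"])  # refund
```

The "function" exists only as a schema; the backend never needs a matching implementation.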

Implementation in TypeScript

We will use the official @anthropic-ai/sdk. In this example, we want to extract customer sentiment and action items from a support ticket.

Prerequisites:

npm install @anthropic-ai/sdk

Here is the complete, production-ready implementation:

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// 1. Define the schema strictly. 
// While we aren't executing code, this schema acts as the guardrail.
const EXTRACTION_TOOL_NAME = 'record_customer_interaction';

const extractionSchema: Anthropic.Tool = {
  name: EXTRACTION_TOOL_NAME,
  description: 'Records structured data from a customer support interaction.',
  input_schema: {
    type: 'object',
    properties: {
      sentiment_score: {
        type: 'number',
        description: 'Sentiment from 1 (angry) to 10 (delighted)',
      },
      category: {
        type: 'string',
        enum: ['refund', 'technical_issue', 'feature_request', 'other'],
      },
      action_items: {
        type: 'array',
        items: { type: 'string' },
        description: 'List of specific tasks required to resolve the ticket',
      },
      urgent: {
        type: 'boolean',
      }
    },
    required: ['sentiment_score', 'category', 'action_items', 'urgent'],
  },
};

async function extractStructuredData(rawText: string) {
  const msg = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20240620',
    max_tokens: 1024,
    messages: [{ role: 'user', content: rawText }],
    tools: [extractionSchema],
    // CRITICAL: This forces the model to use the tool.
    // It cannot choose to chat. It MUST call this function.
    tool_choice: { type: 'tool', name: EXTRACTION_TOOL_NAME },
  });

  // Parse the output
  const toolUseBlock = msg.content.find(
    (block): block is Anthropic.ToolUseBlock => block.type === 'tool_use'
  );

  if (!toolUseBlock) {
    throw new Error('Model failed to use the required tool');
  }

  // The input is a JSON object the API guarantees conforms to our schema
  return toolUseBlock.input;
}

// Example Usage
(async () => {
  const input = "I'm extremely frustrated. My login doesn't work and I need a refund immediately.";
  
  try {
    const result = await extractStructuredData(input);
    console.log(JSON.stringify(result, null, 2));
  } catch (err) {
    console.error(err);
  }
})();

Implementation in Python

In Python, we can leverage Pydantic to generate our schemas automatically. This ensures our internal data models stay in sync with the LLM's expected output.

Prerequisites:

pip install anthropic pydantic

import os
import json
from typing import List, Literal
from anthropic import Anthropic
from pydantic import BaseModel, Field

client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

# 1. Define the data structure using Pydantic
class CustomerInteraction(BaseModel):
    sentiment_score: int = Field(..., description="1-10 scale")
    category: Literal['refund', 'technical_issue', 'feature_request', 'other']
    action_items: List[str]
    urgent: bool

# 2. Convert Pydantic model to JSON Schema for the API
def get_schema(pydantic_model: type[BaseModel]):
    schema = pydantic_model.model_json_schema()
    return {
        "name": "record_customer_interaction",
        "description": "Extracts structured data from support text",
        "input_schema": schema
    }

def extract_data(text_input: str) -> CustomerInteraction:
    tool_def = get_schema(CustomerInteraction)
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=[{"role": "user", "content": text_input}],
        tools=[tool_def],
        # CRITICAL: Force the tool usage
        tool_choice={"type": "tool", "name": "record_customer_interaction"}
    )

    # Extract the tool use block
    for block in response.content:
        if block.type == "tool_use":
            # Validate directly back into Pydantic
            return CustomerInteraction(**block.input)
            
    raise ValueError("Claude did not attempt to use the tool.")

# Example Usage
if __name__ == "__main__":
    raw_text = "The system is slow, but I don't need a refund. Just fix the latency."
    result = extract_data(raw_text)
    
    print(f"Category: {result.category}")
    print(f"Action Items: {result.action_items}")
    # Example output (model responses vary):
    # Category: technical_issue
    # Action Items: ['Fix system latency']
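Because get_schema is pure Python, the Pydantic-to-JSON-Schema conversion can be unit-tested without an API key. A minimal sketch, repeating the model definition so it runs standalone:

```python
from typing import List, Literal
from pydantic import BaseModel, Field

class CustomerInteraction(BaseModel):
    sentiment_score: int = Field(..., description="1-10 scale")
    category: Literal['refund', 'technical_issue', 'feature_request', 'other']
    action_items: List[str]
    urgent: bool

schema = CustomerInteraction.model_json_schema()

# All four fields land in the schema's "required" list,
# so a conforming tool call cannot omit any of them.
print(sorted(schema["required"]))

# Round-trip: a dict shaped like Claude's tool input validates cleanly.
sample = {"sentiment_score": 7, "category": "other",
          "action_items": [], "urgent": False}
print(CustomerInteraction(**sample).category)  # other
```

Keeping this check in your test suite catches schema drift before it reaches the API.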

Deep Dive: Why tool_choice is the Secret Weapon

In the code examples above, the key configuration is tool_choice.

By default, if you provide tools, the API behaves as if tool_choice were set to auto. This means the model can use a tool, but it can also choose to ask clarifying questions or just chat normally.

By setting tool_choice: { type: 'tool', name: '...' }, we disable the model's ability to select the "text" output mode. We constrain the generation path: when the model samples from the probability distribution for the next token, the only valid continuations are tokens that conform to the JSON structure of the tool arguments.

This effectively turns the LLM from a Chatbot into a "Semantic Transformation Engine." It takes unstructured text in and pushes structured JSON out.
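For reference, the tool_choice modes the Messages API accepts can be written out as plain request payloads:

```python
# tool_choice shapes accepted by the Anthropic Messages API.
auto_choice = {"type": "auto"}  # default with tools: may chat or call a tool
any_choice = {"type": "any"}    # must call some tool; the model picks which
forced_choice = {               # must call exactly this tool
    "type": "tool",
    "name": "record_customer_interaction",
}

# Only the forced shape removes the text-output escape hatch entirely.
print(forced_choice["type"], forced_choice["name"])
```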

Handling Edge Cases: Chain of Thought (CoT)

There is one downside to forcing JSON output immediately: degraded performance on complex reasoning.

LLMs "think" by generating tokens. If you force immediate JSON output, the model cannot "reason" through a problem before answering. It has to guess the answer instantly.

If your JSON structure requires complex deduction (e.g., "Analyze the legal risk of this contract"), you need to allow the model to think first.

The Two-Step Pattern

To solve this, add a reasoning field to your tool definition.

const extractionSchema = {
  // ...
  input_schema: {
    type: 'object',
    properties: {
      reasoning: {
        type: 'string',
        description: 'Think step-by-step about the analysis here before setting the final score.'
      },
      final_risk_score: {
        type: 'number'
      }
    }
    // ...
  }
}

By placing the reasoning field first in the JSON object properties, the model generates the explanation (doing the "thinking") before it generates the final_risk_score. Since LLMs generate linearly, the "thinking" tokens are already in the context window when it calculates the score, leading to significantly higher accuracy.

Conclusion

Stop fighting your LLMs with "Do not chat" prompts. It is a losing battle against the model's fundamental training.

By treating output schemas as Tools and strictly enforcing them via tool_choice, you convert probabilistic conversational noise into reliable, type-safe data pipelines. This approach is essential for building production-grade AI agents that interact with downstream APIs and databases.