You have set up your Next.js 15 application, configured your LangChain chains, and everything works perfectly in the console. But when you connect it to your React frontend, the application hangs. The user stares at a loading spinner for five seconds, and then the entire response snaps into existence at once.
This destroys the user experience. The "magic" of LLMs lies in the token-by-token streaming effect—the typewriter illusion that makes the AI feel alive and responsive.
Achieving this in the App Router is surprisingly difficult. You are battling three adversaries: the serialization boundary between React Server Components (RSC) and the client, the mismatch between LangChain’s async iterables and standard Web Streams, and the strict typing of TypeScript.
This guide provides a production-grade, rigorous solution to implement real-time streaming using Next.js 15 Route Handlers and LangChain.
The Root Cause: Why Streaming Breaks
To fix the problem, we must understand the architecture.
1. The Serialization Barrier
In Next.js 15, Server Actions and Server Components communicate with the Client via serialized JSON. Standard LangChain responses are complex objects. While Next.js can serialize basic data, it cannot serialize an active, open TCP stream over a Server Action return value easily without third-party wrappers (like Vercel’s AI SDK).
2. Node Streams vs. Web Streams
LangChain (and Node.js) historically relied on Node-specific streams. However, the modern web (and the Next.js Edge Runtime) relies on the Web Streams API (ReadableStream). Mismatching these results in buffering, where the server waits for the stream to close before sending the first byte.
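The mismatch is easiest to see in isolation. Here is a minimal sketch (assuming Node 18+, where ReadableStream and TextEncoder are globals) that bridges a generic AsyncIterable of string tokens into a Web ReadableStream of bytes — the same shape the Route Handler in Step 1 uses:

```typescript
// A minimal bridge from any AsyncIterable<string> (a LangChain-style token
// iterator) to a Web ReadableStream<Uint8Array>.
function iterableToByteStream(
  tokens: AsyncIterable<string>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const token of tokens) {
          // Each token is flushed to the consumer as soon as it arrives
          controller.enqueue(encoder.encode(token));
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });
}
```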
3. The Protocol Problem
When you stream, you aren't just sending a string. You are sending chunks of bytes. The frontend needs to listen to these chunks, decode them from Uint8Array to UTF-8 strings, and append them to the React state immediately.
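Stripped of UI concerns, that decode-and-append loop looks roughly like this (the onChunk callback is a placeholder for whatever state update your framework performs):

```typescript
// Minimal client-side read loop: pull byte chunks from a streaming body,
// decode them as UTF-8, and hand each text fragment to a callback.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } keeps partial multi-byte characters buffered
    onChunk(decoder.decode(value, { stream: true }));
  }
}
```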
The Solution
We will bypass complex Server Action wrappers and use a standard HTTP Route Handler. This allows us to return a raw ReadableStream, giving us low-level control over the byte stream.
Prerequisites
Ensure you have the necessary packages installed:
npm install langchain @langchain/core @langchain/openai
Step 1: Create the Streaming Route Handler
We need a backend endpoint that accepts a prompt, initiates a LangChain model, and—crucially—converts the LangChain output into a streamable response.
Create app/api/chat/route.ts:
import { NextRequest, NextResponse } from "next/server";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

export const runtime = "edge"; // Optional: Use Edge for lower latency

export async function POST(req: NextRequest) {
  try {
    const { prompt } = await req.json();

    if (!prompt) {
      return NextResponse.json({ error: "Prompt is required" }, { status: 400 });
    }

    // 1. Initialize the Model
    // Ensure streaming is set to true
    const model = new ChatOpenAI({
      modelName: "gpt-4o",
      temperature: 0.7,
      streaming: true,
      openAIApiKey: process.env.OPENAI_API_KEY,
    });

    // 2. Create the Stream
    // We use a manual ReadableStream to convert LangChain's string chunks
    // into the Uint8Array format required by the browser.
    const encoder = new TextEncoder();

    const stream = new ReadableStream({
      async start(controller) {
        try {
          // LangChain's .stream() returns an AsyncIterable
          const streamResponse = await model.stream([
            new HumanMessage(prompt),
          ]);

          for await (const chunk of streamResponse) {
            // chunk.content is the string token
            const content = chunk.content;

            // Allow string or generic content, check for existence
            if (typeof content === "string" && content.length > 0) {
              controller.enqueue(encoder.encode(content));
            }
          }

          controller.close();
        } catch (error) {
          // error() poisons the stream; never call close() after it
          controller.error(error);
        }
      },
    });

    // 3. Return the Response
    // We stream raw text, not Server-Sent Events, so text/plain is correct
    return new NextResponse(stream, {
      headers: {
        "Content-Type": "text/plain; charset=utf-8",
        "Cache-Control": "no-cache",
      },
    });
  } catch (error: any) {
    return NextResponse.json(
      { error: error.message || "Internal Server Error" },
      { status: 500 }
    );
  }
}
Step 2: Create the Frontend Client Component
Now we need a UI that can consume this stream. We cannot use a simple await fetch(). We must lock the reader and read chunks until the stream closes.
Create components/ChatInterface.tsx:
"use client";

import { useState, useRef, FormEvent } from "react";

interface Message {
  role: "user" | "assistant";
  content: string;
}

export default function ChatInterface() {
  const [input, setInput] = useState("");
  const [messages, setMessages] = useState<Message[]>([]);
  const [isLoading, setIsLoading] = useState(false);

  // Ref to handle stream cancellation if needed
  const abortControllerRef = useRef<AbortController | null>(null);

  const handleSubmit = async (e: FormEvent) => {
    e.preventDefault();
    if (!input.trim() || isLoading) return;

    const userMessage: Message = { role: "user", content: input };
    setMessages((prev) => [...prev, userMessage]);
    setInput("");
    setIsLoading(true);

    // Create a placeholder for the AI response
    setMessages((prev) => [...prev, { role: "assistant", content: "" }]);

    abortControllerRef.current = new AbortController();

    try {
      const response = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt: userMessage.content }),
        signal: abortControllerRef.current.signal,
      });

      if (!response.ok) throw new Error("Network response was not ok");
      if (!response.body) throw new Error("No response body");

      // 1. Get the Reader
      const reader = response.body.getReader();
      const decoder = new TextDecoder();

      // 2. Loop through the stream
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // 3. Decode the chunk
        const chunk = decoder.decode(value, { stream: true });

        // 4. Update the UI state
        // Replace the *last* message immutably — never mutate state in place
        setMessages((prev) => {
          const lastMsg = prev[prev.length - 1];

          // Ensure we are appending to the assistant's placeholder
          if (!lastMsg || lastMsg.role !== "assistant") return prev;

          return [
            ...prev.slice(0, -1),
            { ...lastMsg, content: lastMsg.content + chunk },
          ];
        });
      }
    } catch (error: any) {
      if (error.name === "AbortError") {
        console.log("Stream stopped by user");
      } else {
        console.error("Stream error:", error);
      }
    } finally {
      setIsLoading(false);
      abortControllerRef.current = null;
    }
  };

  const stopStream = () => {
    abortControllerRef.current?.abort();
  };

  return (
    <div className="max-w-2xl mx-auto p-6 space-y-6">
      <div className="space-y-4 min-h-[300px] border p-4 rounded-lg bg-gray-50">
        {messages.map((msg, idx) => (
          <div
            key={idx}
            className={`p-3 rounded-lg ${
              msg.role === "user"
                ? "bg-blue-100 ml-auto max-w-[80%]"
                : "bg-white border mr-auto max-w-[80%]"
            }`}
          >
            <p className="whitespace-pre-wrap text-sm">{msg.content}</p>
          </div>
        ))}
      </div>

      <form onSubmit={handleSubmit} className="flex gap-3">
        <input
          className="flex-1 p-2 border rounded-md"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask something..."
          disabled={isLoading}
        />
        {isLoading ? (
          <button
            type="button"
            onClick={stopStream}
            className="px-4 py-2 bg-red-500 text-white rounded-md hover:bg-red-600"
          >
            Stop
          </button>
        ) : (
          <button
            type="submit"
            className="px-4 py-2 bg-blue-600 text-white rounded-md hover:bg-blue-700"
          >
            Send
          </button>
        )}
      </form>
    </div>
  );
}
Deep Dive: Why This Implementation Works
There are three specific architectural decisions in the code above that ensure reliability and performance.
1. Manual ReadableStream Construction
We wrap the LangChain async iterator (model.stream) inside a native ReadableStream. The start(controller) method acts as a bridge. As LangChain yields a token, we immediately enqueue it to the controller. This forces the Next.js network layer to flush the buffer to the client immediately, rather than waiting for the entire loop to finish.
2. TextEncoder and TextDecoder
Network streams transmit binary data (Uint8Array), not JavaScript strings.
- Server side: We use TextEncoder to turn the string "Hello" into the byte sequence [72, 101, 108, 108, 111].
- Client side: We use TextDecoder with { stream: true }. This flag is vital. If a multi-byte character (like an emoji or a kanji glyph) gets split between two network packets, stream: true tells the decoder to hold the partial bytes in an internal buffer until the rest arrive in the next chunk.
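You can observe the difference directly. In this small sketch, the two UTF-8 bytes of "é" (0xC3 0xA9) arrive in separate chunks, exactly as can happen with real network packets:

```typescript
// "Hi é" split so that the multi-byte "é" straddles two chunks.
const part1 = new Uint8Array([0x48, 0x69, 0x20, 0xc3]); // "Hi " + first byte of "é"
const part2 = new Uint8Array([0xa9]);                   // second byte of "é"

// Fresh decoder per chunk: the dangling byte becomes U+FFFD (�)
const broken = new TextDecoder().decode(part1) + new TextDecoder().decode(part2);

// One decoder with { stream: true }: the partial byte is buffered
const decoder = new TextDecoder();
const ok =
  decoder.decode(part1, { stream: true }) +
  decoder.decode(part2, { stream: true });

console.log(broken.includes("\uFFFD")); // true — the character was corrupted
console.log(ok); // "Hi é"
```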
3. Functional State Updates
In React, state updates are asynchronous and batched.
setMessages((prev) => { ... })
Using the functional update form is mandatory here. Because the stream chunks arrive extremely fast (every 20-50ms), accessing messages directly without the prev pointer would result in a "stale closure." You would constantly overwrite the previous chunk with the new chunk, rather than appending to it.
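The same rule can be isolated into a pure helper — one way (not the only one) to keep the append both immutable and closure-safe:

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Pure update: returns a new array with the chunk appended to the trailing
// assistant message. Never mutates `prev`, so React can detect the change.
function appendChunk(prev: Message[], chunk: string): Message[] {
  const last = prev[prev.length - 1];
  if (!last || last.role !== "assistant") return prev;
  return [...prev.slice(0, -1), { ...last, content: last.content + chunk }];
}

// Inside the read loop: setMessages((prev) => appendChunk(prev, chunk));
```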
Common Pitfalls and Edge Cases
Handling Timeout Limitations
Vercel's default timeout for Serverless Functions is often 10-60 seconds (depending on your plan). LLM responses can take longer. Fix: By adding export const runtime = "edge"; in the Route Handler, you move the execution to the Edge Runtime, which supports streaming responses efficiently without the strict timeout limitations of standard Serverless functions (though CPU time is limited, streaming is I/O bound, which is fine).
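If you prefer to stay on the Node.js runtime (for example, because a dependency is not Edge-compatible), Next.js also exposes a maxDuration route segment config; the exact ceiling depends on your hosting plan:

```typescript
// app/api/chat/route.ts — Node.js runtime alternative to `runtime = "edge"`.
// Extends the function's allowed execution time (capped by your Vercel plan).
export const maxDuration = 60; // seconds
```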
Handling JSON in Streams
The example above streams raw text. What if you need to return citations or source documents alongside the text? You cannot simply JSON.stringify the chunk, because you'll receive partial JSON strings on the frontend. Fix: Use a delimiter protocol. Send data as type:content\n.
0:This is text
1:{"source": "wiki"}

Parse the line prefix on the frontend to determine if the chunk is display text or metadata.
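A frontend parser for that hypothetical line protocol might look like this (the "0"/"1" prefixes are the ones chosen above, not a standard):

```typescript
type StreamPart =
  | { kind: "text"; value: string }
  | { kind: "data"; value: unknown };

// Parses one line of the "type:content" protocol sketched above.
// Prefix "0" marks display text, "1" marks JSON metadata; anything else is skipped.
function parseLine(line: string): StreamPart | null {
  const sep = line.indexOf(":");
  if (sep === -1) return null;
  const type = line.slice(0, sep);
  const payload = line.slice(sep + 1);
  if (type === "0") return { kind: "text", value: payload };
  if (type === "1") return { kind: "data", value: JSON.parse(payload) };
  return null;
}
```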
Conclusion
Streaming LangChain responses in Next.js 15 requires moving away from high-level abstractions and understanding the underlying web standards. By utilizing Route Handlers, the ReadableStream API, and proper React state management, you can build AI interfaces that feel instant and professional.
Implementing this pattern ensures your application remains scalable, responsive, and ready for the demands of modern AI-driven user experiences.