
How to Stream LangChain Responses in Next.js 15 (App Router Guide)

 You have set up your Next.js 15 application, configured your LangChain chains, and everything works perfectly in the console. But when you connect it to your React frontend, the application hangs. The user stares at a loading spinner for five seconds, and then the entire response snaps into existence at once.

This destroys the user experience. The "magic" of LLMs lies in the token-by-token streaming effect—the typewriter illusion that makes the AI feel alive and responsive.

Achieving this in the App Router is surprisingly difficult. You are battling three adversaries: the serialization boundary between React Server Components (RSC) and the client, the mismatch between LangChain’s async iterables and standard Web Streams, and the strict typing of TypeScript.

This guide provides a production-grade, rigorous solution to implement real-time streaming using Next.js 15 Route Handlers and LangChain.

The Root Cause: Why Streaming Breaks

To fix the problem, we must understand the architecture.

1. The Serialization Barrier

In Next.js 15, Server Actions and Server Components communicate with the Client via serialized JSON. Standard LangChain responses are complex objects. While Next.js can serialize basic data, it cannot serialize an active, open TCP stream over a Server Action return value easily without third-party wrappers (like Vercel’s AI SDK).

2. Node Streams vs. Web Streams

LangChain (and Node.js) historically relied on Node-specific streams. However, the modern web (and the Next.js Edge Runtime) relies on the Web Streams API (ReadableStream). Mismatching these results in buffering, where the server waits for the stream to close before sending the first byte.

3. The Protocol Problem

When you stream, you aren't just sending a string. You are sending chunks of bytes. The frontend needs to listen to these chunks, decode them from Uint8Array to UTF-8 strings, and append them to the React state immediately.
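The byte round trip can be sketched in a few lines: TextEncoder turns strings into raw UTF-8 bytes on the server, and TextDecoder reverses that on the client.

```typescript
// Minimal sketch of the byte protocol used throughout this guide.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Server side: a token becomes raw UTF-8 bytes (a Uint8Array).
const bytes = encoder.encode("Hello");

// Client side: each chunk is decoded back to a string as it arrives.
// { stream: true } matters for multi-byte characters, covered below.
const token = decoder.decode(bytes, { stream: true });
console.log(token); // "Hello"
```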

The Solution

We will bypass complex Server Action wrappers and use a standard HTTP Route Handler. This allows us to return a raw ReadableStream, giving us low-level control over the byte stream.

Prerequisites

Ensure you have the necessary packages installed:

npm install langchain @langchain/core @langchain/openai

Step 1: Create the Streaming Route Handler

We need a backend endpoint that accepts a prompt, initiates a LangChain model, and—crucially—converts the LangChain output into a streamable response.

Create app/api/chat/route.ts:

import { NextRequest, NextResponse } from "next/server";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

export const runtime = "edge"; // Optional: Use Edge for lower latency

export async function POST(req: NextRequest) {
  try {
    const { prompt } = await req.json();

    if (!prompt) {
      return NextResponse.json({ error: "Prompt is required" }, { status: 400 });
    }

    // 1. Initialize the Model
    // Ensure streaming is set to true
    const model = new ChatOpenAI({
      modelName: "gpt-4o",
      temperature: 0.7,
      streaming: true, 
      openAIApiKey: process.env.OPENAI_API_KEY,
    });

    // 2. Create the Stream
    // We wrap LangChain's async iterable in a native ReadableStream,
    // encoding each string chunk into the Uint8Array format the browser expects.
    const encoder = new TextEncoder();
    
    const stream = new ReadableStream({
      async start(controller) {
        try {
          // LangChain's .stream() returns an AsyncIterable
          const streamResponse = await model.stream([
            new HumanMessage(prompt),
          ]);

          for await (const chunk of streamResponse) {
            // chunk.content is the string token
            const content = chunk.content;
            
            // Allow string or generic content, check for existence
            if (typeof content === "string" && content.length > 0) {
              controller.enqueue(encoder.encode(content));
            }
          }
          // Close only after the iterable is exhausted. This must not live
          // in a `finally` block: calling close() on a controller that has
          // already errored throws a TypeError.
          controller.close();
        } catch (error) {
          controller.error(error);
        }
      },
    });

    // 3. Return the Response
    return new NextResponse(stream, {
      headers: {
        // We stream raw UTF-8 text, not Server-Sent Events, so
        // text/plain is the accurate content type here.
        "Content-Type": "text/plain; charset=utf-8",
        "Cache-Control": "no-cache",
      },
    });

  } catch (error: any) {
    return NextResponse.json(
      { error: error.message || "Internal Server Error" }, 
      { status: 500 }
    );
  }
}

Step 2: Create the Frontend Client Component

Now we need a UI that can consume this stream. A simple await fetch() that reads the whole body won't do: we must acquire the response body's reader and process chunks as they arrive until the stream closes.

Create components/ChatInterface.tsx:

"use client";

import { useState, useRef, FormEvent } from "react";

interface Message {
  role: "user" | "assistant";
  content: string;
}

export default function ChatInterface() {
  const [input, setInput] = useState("");
  const [messages, setMessages] = useState<Message[]>([]);
  const [isLoading, setIsLoading] = useState(false);
  
  // Ref to handle stream cancellation if needed
  const abortControllerRef = useRef<AbortController | null>(null);

  const handleSubmit = async (e: FormEvent) => {
    e.preventDefault();
    if (!input.trim() || isLoading) return;

    const userMessage: Message = { role: "user", content: input };
    setMessages((prev) => [...prev, userMessage]);
    setInput("");
    setIsLoading(true);

    // Create a placeholder for the AI response
    setMessages((prev) => [...prev, { role: "assistant", content: "" }]);

    abortControllerRef.current = new AbortController();

    try {
      const response = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt: userMessage.content }),
        signal: abortControllerRef.current.signal,
      });

      if (!response.ok) throw new Error("Network response was not ok");
      if (!response.body) throw new Error("No response body");

      // 1. Get the Reader
      const reader = response.body.getReader();
      const decoder = new TextDecoder();

      // 2. Loop through the stream
      while (true) {
        const { done, value } = await reader.read();
        
        if (done) break;

        // 3. Decode the chunk
        const chunk = decoder.decode(value, { stream: true });

        // 4. Update the UI state
        // We update the *last* message in the array
        setMessages((prev) => {
          const newMessages = [...prev];
          const lastMsg = newMessages[newMessages.length - 1];
          // Replace the assistant's message with a new object instead of
          // mutating it in place, so React reliably detects the change
          if (lastMsg?.role === "assistant") {
            newMessages[newMessages.length - 1] = {
              ...lastMsg,
              content: lastMsg.content + chunk,
            };
          }
          return newMessages;
        });
      }
    } catch (error: any) {
      if (error.name === 'AbortError') {
        console.log("Stream stopped by user");
      } else {
        console.error("Stream error:", error);
      }
    } finally {
      setIsLoading(false);
      abortControllerRef.current = null;
    }
  };

  const stopStream = () => {
    if (abortControllerRef.current) {
      abortControllerRef.current.abort();
    }
  };

  return (
    <div className="max-w-2xl mx-auto p-6 space-y-6">
      <div className="space-y-4 min-h-[300px] border p-4 rounded-lg bg-gray-50">
        {messages.map((msg, idx) => (
          <div
            key={idx}
            className={`p-3 rounded-lg ${
              msg.role === "user"
                ? "bg-blue-100 ml-auto max-w-[80%]"
                : "bg-white border mr-auto max-w-[80%]"
            }`}
          >
            <p className="whitespace-pre-wrap text-sm">{msg.content}</p>
          </div>
        ))}
      </div>

      <form onSubmit={handleSubmit} className="flex gap-3">
        <input
          className="flex-1 p-2 border rounded-md"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask something..."
          disabled={isLoading}
        />
        {isLoading ? (
          <button
            type="button"
            onClick={stopStream}
            className="px-4 py-2 bg-red-500 text-white rounded-md hover:bg-red-600"
          >
            Stop
          </button>
        ) : (
          <button
            type="submit"
            className="px-4 py-2 bg-blue-600 text-white rounded-md hover:bg-blue-700"
          >
            Send
          </button>
        )}
      </form>
    </div>
  );
}

Deep Dive: Why This Implementation Works

There are three specific architectural decisions in the code above that ensure reliability and performance.

1. Manual ReadableStream Construction

We wrap the LangChain async iterator (model.stream) inside a native ReadableStream. The start(controller) method acts as a bridge. As LangChain yields a token, we immediately enqueue it to the controller. This forces the Next.js network layer to flush the buffer to the client immediately, rather than waiting for the entire loop to finish.
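The bridge described above generalizes to any async iterable of strings. Here is a minimal sketch; iterableToStream is an illustrative helper name, not a library API.

```typescript
// Wrap any AsyncIterable<string> in a ReadableStream that flushes
// each value to the consumer as soon as it is yielded.
function iterableToStream(iterable: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const token of iterable) {
          // Enqueue immediately: the network layer can flush this chunk
          // without waiting for the loop to finish.
          controller.enqueue(encoder.encode(token));
        }
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    },
  });
}
```

The same pattern works whether the iterable comes from model.stream(), a chain's .stream(), or a hand-written async generator.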

2. TextEncoder and TextDecoder

Network streams transmit binary data (Uint8Array), not JavaScript strings.

  • Server Side: We use TextEncoder to turn the string "Hello" into [72, 101, 108, 108, 111].
  • Client Side: We use TextDecoder with { stream: true }. This flag is vital. If a multi-byte character (like an emoji or Kanji) gets split between two network packets, stream: true tells the decoder to keep the partial byte in an internal buffer until the rest of the bytes arrive in the next chunk.
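The split-character case is easy to reproduce directly. In this sketch, the 4-byte emoji is deliberately cut between two chunks, mimicking a packet boundary:

```typescript
const encoder = new TextEncoder();
// "Hi 😀" is 3 ASCII bytes plus a 4-byte emoji: 7 bytes total.
const bytes = encoder.encode("Hi 😀");

// Split mid-emoji, as a network packet boundary might.
const chunk1 = bytes.slice(0, 5);
const chunk2 = bytes.slice(5);

// With { stream: true }, the decoder buffers the incomplete byte
// sequence from chunk1 until the rest arrives in chunk2. Without it,
// the partial emoji would decode to a replacement character.
const decoder = new TextDecoder();
const result =
  decoder.decode(chunk1, { stream: true }) +
  decoder.decode(chunk2, { stream: true });

console.log(result); // "Hi 😀"
```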

3. Functional State Updates

In React, state updates are asynchronous and batched.

setMessages((prev) => { ... })

Using the functional update form is mandatory here. Because the stream chunks arrive extremely fast (every 20-50ms), accessing messages directly without the prev pointer would result in a "stale closure." You would constantly overwrite the previous chunk with the new chunk, rather than appending to it.
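The updater's behavior can be tested without React at all. This framework-free sketch mirrors the function passed to setMessages: it always derives the next state from the previous one and never mutates its input (applyChunk is an illustrative name, not part of any API).

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Mirrors the functional updater: derive new state from `prev`,
// replacing the last message with a fresh object rather than mutating it.
function applyChunk(prev: Message[], chunk: string): Message[] {
  const next = [...prev];
  const last = next[next.length - 1];
  if (last?.role === "assistant") {
    next[next.length - 1] = { ...last, content: last.content + chunk };
  }
  return next;
}

// Chunks applied in sequence accumulate, just as React applies
// queued functional updates in order.
let state: Message[] = [{ role: "assistant", content: "" }];
for (const chunk of ["Hel", "lo", ", world"]) {
  state = applyChunk(state, chunk);
}
console.log(state[0].content); // "Hello, world"
```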

Common Pitfalls and Edge Cases

Handling Timeout Limitations

Vercel's default timeout for Serverless Functions is often 10-60 seconds (depending on your plan). LLM responses can take longer. Fix: By adding export const runtime = "edge"; in the Route Handler, you move the execution to the Edge Runtime, which supports streaming responses efficiently without the strict timeout limitations of standard Serverless functions (though CPU time is limited, streaming is I/O bound, which is fine).
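If you prefer to stay on the Node.js runtime instead of Edge, Next.js also lets you raise a route's time budget via the maxDuration segment config (the ceiling depends on your hosting plan; 60 here is just an example value):

```typescript
// app/api/chat/route.ts — route segment config on the Node.js runtime
export const maxDuration = 60; // seconds

// Or opt into the Edge runtime instead, as in the handler above:
// export const runtime = "edge";
```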

Handling JSON in Streams

The example above streams raw text. What if you need to return citations or source documents alongside the text? You cannot simply JSON.stringify the chunk, because you'll receive partial JSON strings on the frontend. Fix: Use a delimiter protocol. Send data as type:content\n.

  • 0:This is text
  • 1:{"source": "wiki"}

Parse the line prefix on the frontend to determine whether the chunk is display text or metadata.
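A frontend parser for this kind of protocol is a few lines. The prefixes and shape below are illustrative, not a standard; metadata is assumed to arrive as one complete JSON object per line.

```typescript
interface ParsedLine {
  kind: "text" | "metadata";
  value: string | Record<string, unknown>;
}

// Split on the first ":" only, so display text may itself contain colons.
function parseLine(line: string): ParsedLine {
  const sep = line.indexOf(":");
  const prefix = line.slice(0, sep);
  const content = line.slice(sep + 1);
  if (prefix === "1") {
    // Metadata lines carry a complete JSON object.
    return { kind: "metadata", value: JSON.parse(content) };
  }
  return { kind: "text", value: content };
}

console.log(parseLine("0:This is text"));      // kind: "text"
console.log(parseLine('1:{"source":"wiki"}')); // kind: "metadata"
```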

Conclusion

Streaming LangChain responses in Next.js 15 requires moving away from high-level abstractions and understanding the underlying web standards. By utilizing Route Handlers, the ReadableStream API, and proper React state management, you can build AI interfaces that feel instant and professional.

Implementing this pattern ensures your application remains scalable, responsive, and ready for the demands of modern AI-driven user experiences.