Building modern interfaces for Large Language Models (LLMs) requires more than just streaming text. As developers integrate engines like Perplexity, they encounter a specific friction point: converting raw text citations into interactive UI elements.
The Perplexity API returns a response where the generated text contains static markers (e.g., [1], [2]) and a separate citations array containing the source URLs. To create a professional user experience, you must parse these markers and replace them with interactive components without breaking the React render cycle or introducing security vulnerabilities.
This guide details the root cause of the rendering challenge and provides a production-ready React implementation to parse, map, and render interactive citations.
Understanding the Data Structure
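Before writing the parser, we must understand the raw shape of the data. When querying the Perplexity API (the examples here use the online models sonar-medium-online and pplx-7b-online; Perplexity has since retired these in favor of newer sonar models, so check its current model list), the JSON response typically looks like this: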
```json
{
  "id": "5f3a2b...",
  "model": "sonar-medium-online",
  "created": 17098234,
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "React Server Components allow you to render components on the server [1]. This reduces the bundle size sent to the client [2]."
      }
    }
  ],
  "citations": [
    "https://react.dev/reference/rsc/server-components",
    "https://nextjs.org/docs/app/building-your-application/rendering/server-components"
  ]
}
```
The challenge is evident: the content string refers to sources by 1-based markers, while the citations array is 0-based, so [1] corresponds to citations[0] and [2] to citations[1].
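If you are working in TypeScript, it can help to type just the part of the response the renderer reads. The sketch below is an assumption, not Perplexity's official schema: PerplexityChatResponse and extractAnswer are hypothetical names, the interface covers only the fields shown above, and citations is marked optional so that a response without it degrades to an empty list.

```typescript
// Hypothetical, minimal typing of the fields used in this guide.
// This is NOT the full Perplexity response schema.
interface PerplexityChatResponse {
  id: string;
  model: string;
  created: number;
  choices: Array<{
    index: number;
    finish_reason: string | null;
    message: { role: string; content: string };
  }>;
  // Optional by assumption, so a response without citations still parses.
  citations?: string[];
}

// Pulls out the two inputs the renderer needs: the answer text and the
// 0-based citations array that its 1-based [n] markers point into.
function extractAnswer(response: PerplexityChatResponse): {
  text: string;
  citations: string[];
} {
  const text = response.choices[0]?.message.content ?? "";
  const citations = response.citations ?? [];
  return { text, citations };
}

const sample: PerplexityChatResponse = {
  id: "example",
  model: "sonar-medium-online",
  created: 0,
  choices: [
    {
      index: 0,
      finish_reason: "stop",
      message: {
        role: "assistant",
        content: "RSC render on the server [1]. Smaller bundles [2].",
      },
    },
  ],
  citations: [
    "https://react.dev/reference/rsc/server-components",
    "https://nextjs.org/docs/app/building-your-application/rendering/server-components",
  ],
};

// Marker [1] maps to answer.citations[0], [2] to answer.citations[1].
const answer = extractAnswer(sample);
```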
The Core Challenge: String Injection vs. Component Rendering
A common mistake is to build raw HTML strings with String.prototype.replace() and inject them via dangerouslySetInnerHTML.
Why dangerouslySetInnerHTML Fails Here
- XSS Vulnerabilities: Injecting unsanitized HTML built from external API output is a security risk.
- Loss of React Context: If you inject a standard <a> tag string, you lose the ability to use React components (like <Tooltip> or <Popover>) or Next.js <Link> components for internal routing.
- Event Handling: You cannot attach React event handlers (onClick, onMouseEnter) to string-injected HTML.
The correct approach is to transform the string into an array of React nodes.
The Solution: Regex-Driven Splitting
To solve this, we utilize a powerful but often overlooked feature of JavaScript's String.prototype.split(). If the regular expression used in split contains capturing parentheses, the matched results are included in the output array.
We will use this behavior to deconstruct the text stream into linear segments, identifying which segments are citations and which are plain text.
The Parsing Logic
We need a regex that identifies the pattern [n], where n is a number.
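```typescript
/(\[\d+\])/g
```

- \[ and \]: Escaped brackets that match the literal [ and ] characters.
- \d+: Matches one or more digits.
- (...): The capturing group makes split include the matched marker itself in the output array.
- g: Harmless but not required here, since split always splits on every match.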
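The same split can be expressed as a small, framework-agnostic tokenizer that labels each piece as text or citation, which is easy to unit-test without rendering anything. This is a sketch, not part of the component below; Segment and parseSegments are hypothetical names.

```typescript
// Hypothetical segment type: either plain text or a 1-based citation marker.
type Segment =
  | { kind: "text"; value: string }
  | { kind: "citation"; number: number; raw: string };

// The capturing group keeps each "[n]" marker in the split output.
const CITATION_SPLIT = /(\[\d+\])/;
// Anchored pattern that recognises a part consisting of exactly one marker.
const CITATION_EXACT = /^\[(\d+)\]$/;

function parseSegments(text: string): Segment[] {
  const segments: Segment[] = [];
  for (const part of text.split(CITATION_SPLIT)) {
    // Skip the empty strings split produces between adjacent markers
    // or at the edges of the text.
    if (part === "") continue;
    const match = CITATION_EXACT.exec(part);
    if (match) {
      segments.push({ kind: "citation", number: Number(match[1]), raw: part });
    } else {
      segments.push({ kind: "text", value: part });
    }
  }
  return segments;
}

const segments = parseSegments("Server components [1] shrink bundles [2].");
```

Keeping the tokenizer pure means the React layer only has to map segments to elements, and the parsing rules can be tested on their own.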
Implementation: The CitationRenderer Component
Below is a complete, TypeScript-typed React component. It accepts the raw text and the citations list, then handles the parsing and rendering safely.
This component handles the offset math (converting [1] to index 0) and checks bounds, so if the LLM hallucinates a citation index that doesn't exist, the marker is rendered as plain text instead of as a broken link.
```tsx
import React, { useMemo } from 'react';

interface CitationRendererProps {
  text: string;
  citations: string[];
}

export const CitationRenderer: React.FC<CitationRendererProps> = ({
  text,
  citations
}) => {
  // Memoize the parsing so the regex work only reruns when text or citations change
  const elements = useMemo(() => {
    // Split on citation markers like [1], [2], etc.
    // The capturing group (\[\d+\]) keeps each marker in the parts array.
    const parts = text.split(/(\[\d+\])/g);

    return parts.map((part, index) => {
      // Check whether this part is exactly one citation marker
      const citationMatch = part.match(/^\[(\d+)\]$/);

      if (citationMatch) {
        // Convert the captured digits (e.g. "1") to a 0-based array index
        const citationIndex = parseInt(citationMatch[1], 10) - 1;
        const url = citations[citationIndex];

        // Safety check: only render a chip if the citation exists in the array
        if (url) {
          return (
            <CitationChip
              key={`${index}-${citationIndex}`}
              index={citationIndex + 1}
              url={url}
            />
          );
        }
      }

      // Render non-citation parts (and unmatched markers) as plain text
      return <span key={index}>{part}</span>;
    });
  }, [text, citations]);

  return (
    <div className="leading-7 text-gray-800 dark:text-gray-200">
      {elements}
    </div>
  );
};

// Sub-component for the interactive citation
const CitationChip = ({ index, url }: { index: number; url: string }) => {
  return (
    <a
      href={url}
      target="_blank"
      rel="noopener noreferrer"
      className="
        inline-flex items-center justify-center
        align-baseline mx-0.5 px-1.5 py-0.5
        text-[10px] font-bold text-blue-600 bg-blue-50
        rounded-full cursor-pointer
        hover:bg-blue-100 hover:text-blue-700
        transition-colors duration-200
        border border-blue-200
        no-underline translate-y-[-2px]
      "
      aria-label={`Citation ${index}`}
    >
      {index}
    </a>
  );
};
```
Deep Dive: How It Works
1. The Split Technique
When text.split(/(\[\d+\])/g) runs on "Text [1] End", the resulting array is: ["Text ", "[1]", " End"]
Without the capturing parentheses, the output would simply be ["Text ", " End"]: the separator would be lost. By capturing it, we preserve the order of the content while isolating the markers we need to replace.
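One subtlety: split splices in every capturing group, not only the outer one. If you add an inner group to capture the digits directly, each marker contributes two entries and the parts no longer alternate between text and marker. The comparison below uses plain split calls; the arrays in the comments follow from the standard ECMAScript split semantics.

```typescript
const splitDemo = "Text [1] End";

// One capturing group: the whole marker is kept.
const oneGroup = splitDemo.split(/(\[\d+\])/); // ["Text ", "[1]", " End"]

// No capturing group: the marker is discarded.
const noGroup = splitDemo.split(/\[\d+\]/); // ["Text ", " End"]

// Two capturing groups: both captures are spliced into the output.
const twoGroups = splitDemo.split(/(\[(\d+)\])/); // ["Text ", "[1]", "1", " End"]

// A non-capturing inner group (?:...) keeps the output identical to oneGroup.
const nonCapturing = splitDemo.split(/(\[(?:\d+)\])/); // ["Text ", "[1]", " End"]
```

If you need grouping inside the separator, use (?:...) so only the outer marker is captured.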
2. Zero-Based Index Mapping
Perplexity (like most academic citation styles) uses 1-based numbering for display ([1]), while JavaScript arrays are 0-based. The line const citationIndex = parseInt(citationMatch[1], 10) - 1; performs this translation: it parses the digits captured by the inner group (\d+) and subtracts one to index into the citations array.
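Pulled out into a standalone function, the translation might look like the sketch below. markerToIndex is a hypothetical helper; unlike the inline version, it returns null for [0] instead of producing -1, which makes the "no such citation" case explicit.

```typescript
// Converts a display marker such as "[3]" into a 0-based array index.
// Returns null for anything that is not a well-formed marker, and for "[0]",
// which has no valid 0-based counterpart.
function markerToIndex(marker: string): number | null {
  const match = /^\[(\d+)\]$/.exec(marker);
  if (!match) return null;
  const oneBased = parseInt(match[1], 10);
  return oneBased >= 1 ? oneBased - 1 : null;
}
```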
3. Rendering Safety
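The condition if (url) is critical. LLMs are non-deterministic, so the model can emit a marker such as [5] even when only 3 citation URLs were returned. In that case citations[4] is undefined; without the check, the component would render a chip whose link has no destination. With the check, the marker falls through and is rendered as plain text.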
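The if (url) check confirms that an entry exists, but not that it is safe to place in an href. Because the URLs come from an external API, one possible hardening step, assumed here rather than required by the API, is to accept only absolute http: or https: URLs. resolveCitationUrl is a hypothetical helper built on the standard URL constructor.

```typescript
// Returns a linkable URL for a 0-based citation index, or null if the index
// is out of range or the entry is not an absolute http(s) URL.
function resolveCitationUrl(index: number, citations: readonly string[]): string | null {
  if (!Number.isInteger(index) || index < 0 || index >= citations.length) {
    return null; // hallucinated or malformed marker: caller renders plain text
  }
  const candidate = citations[index];
  try {
    const parsed = new URL(candidate);
    // Reject javascript:, data:, and any other non-web scheme.
    if (parsed.protocol === "http:" || parsed.protocol === "https:") {
      return parsed.href;
    }
    return null;
  } catch {
    return null; // the URL constructor throws on strings it cannot parse
  }
}
```

In CitationRenderer, you could call it in place of citations[citationIndex] and keep the existing plain-text fallback whenever it returns null.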
Styling and UX Considerations
The component above uses Tailwind CSS for styling, and several of the class choices serve usability:
- translate-y-[-2px]: This nudges the chip upward to give a "superscript" feel without disturbing the paragraph's line-height rhythm, which the native <sup> tag often does.
- Hit Area: The padding (px-1.5 py-0.5) enlarges the clickable area beyond the small text (text-[10px]), which helps on touch devices.
- Accessibility: The aria-label makes screen readers announce "Citation 1" rather than just reading the number "one" mid-sentence.
Handling Edge Cases
When implementing this in a production environment, consider these edge cases:
Consecutive Citations
The text might contain [1][2] without spaces.
- Result: The regex split handles this without any special casing.
- Output: ["...", "[1]", "", "[2]", "..."]. The empty string between the markers renders as an empty <span>, and the two chips appear side by side.
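The same behavior produces empty strings at the edges when a marker opens or closes the text. A quick check of the raw split output, with empty parts filtered out so they do not become empty <span> elements (the variable names are illustrative):

```typescript
const CITATION_SPLIT = /(\[\d+\])/;

// Adjacent markers leave an empty string between them.
const adjacent = "See [1][2] for details".split(CITATION_SPLIT);
// ["See ", "[1]", "", "[2]", " for details"]

// A marker at the very start or end leaves an empty string at that edge.
const edges = "[1] starts and ends [2]".split(CITATION_SPLIT);
// ["", "[1]", " starts and ends ", "[2]", ""]

// Dropping empty parts before mapping avoids rendering empty <span> elements.
const cleaned = edges.filter((part) => part !== "");
// ["[1]", " starts and ends ", "[2]"]
```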
Streaming Responses
If you are streaming the response token-by-token:
- The text prop will update with nearly every token, so useMemo re-parses on each update; it mainly saves work on unrelated re-renders.
- A chunk may end halfway through a marker (e.g., ...content [1). The regex does not match the incomplete marker, so it renders as plain text until the closing bracket arrives.
- Fix: As a result, you may see the brackets "flicker" before they convert into a chip. This is generally acceptable. For a smoother experience, you can buffer the output and only update the rendered text when a token completes a sentence or a citation marker.
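One way to implement the buffering idea is to hold back only a trailing, unfinished marker and parse everything before it. This is a sketch under two assumptions: your streaming code knows when the response is complete (the done flag), and splitStableText is a hypothetical helper name.

```typescript
// Matches an unfinished citation marker at the very end of the text:
// a "[" optionally followed by digits, with no closing "]" yet.
const TRAILING_PARTIAL_MARKER = /\[\d*$/;

// Splits streamed text into a part that is safe to parse now and a short
// tail that may still become a citation marker once more tokens arrive.
function splitStableText(text: string, done: boolean): { stable: string; pending: string } {
  if (done) return { stable: text, pending: "" }; // stream finished: render everything
  const match = TRAILING_PARTIAL_MARKER.exec(text);
  if (!match) return { stable: text, pending: "" };
  return { stable: text.slice(0, match.index), pending: match[0] };
}
```

You would pass stable to CitationRenderer and either render pending as plain text or omit it until the next token arrives. A literal [ at the end of a chunk is held back for one extra token, which is usually imperceptible.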
Conclusion
Parsing unstructured text from APIs like Perplexity into structured React components is a necessary step for building polished AI interfaces. By moving away from string replacement and embracing regex-based array splitting, you ensure your application remains secure, performant, and capable of rendering rich, interactive citations.