Implementing Text Highlights & Anchors: Workarounds for Google Docs API Limitations

 Building integrations for Google Docs often starts with a simple premise: "I want to highlight a specific phrase," or "I need to anchor a comment to this generated paragraph."

In the Google Docs UI, this is trivial. You highlight with your mouse, and the DOM handles the rest. However, when working with the Google Docs REST API, you hit a distinct barrier. There is no anchorTo("text") method. Instead, the API relies exclusively on absolute startIndex and endIndex integers.

If you attempt to perform simple string matching on the document body, your integration will break. Structural elements (tables, images) and styling changes (bold/italic splits) create a divergence between the visible text length and the internal API index. Characters outside the Basic Multilingual Plane add a further divergence in any language whose strings are not indexed in UTF-16 code units.

This post details the architectural implementation required to reliably calculate text anchors for the Google Docs API, bypassing these indexing limitations.

The Root Cause: Linear Indexing vs. Structural Elements

To solve the anchoring problem, we must understand how Google Docs stores data. It does not use a tree structure like the HTML DOM for indexing. Instead, it uses a linear property map.

Everything in the document—characters, table cells, images, headers—exists on a single integer timeline.

Why String.indexOf Fails

A naive approach is fetching the document body.content, joining all text strings, and running a Regex search. Here is why that fails:

  1. Structural Padding: A table isn't just text. The start of a table, a row, and a cell all consume integer indices in the API, but they represent zero characters in a plain string.
  2. Split TextRuns: If a user bolds the middle of a word (e.g., "Hardworking"), the API returns three separate textRun objects. A search for "Hardworking" inside any single run will fail because the word is fragmented across runs in the JSON response.
  3. Inline Objects: An image inside a sentence consumes 1 index unit (as an inlineObjectElement) but has no text representation.

If your code calculates that a phrase starts at index 150 based on a text-only string, the real index in the API might be 158 due to hidden structural elements preceding it.
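To make the drift concrete, here is a minimal sketch that compares the naive offset with the real index on a tiny hand-written fixture. The DemoRun and DemoBlock types, the fixture, and its index values are assumptions made for illustration; a real response from documents.get uses the docs_v1 types and its own indices.

```typescript
// Minimal structural types for this sketch; real code would use docs_v1 from googleapis.
interface DemoRun { startIndex: number; textRun?: { content: string } }
interface DemoBlock {
  paragraph?: { elements: DemoRun[] };
  table?: { tableRows: { tableCells: { content: DemoBlock[] }[] }[] };
}

// Hand-written fixture (index values are illustrative, not copied from a real document):
// an empty paragraph, a 1x1 table, then the paragraph we want to find.
const demoBody: DemoBlock[] = [
  { paragraph: { elements: [{ startIndex: 1, textRun: { content: '\n' } }] } },
  { table: { tableRows: [{ tableCells: [{ content: [
    { paragraph: { elements: [{ startIndex: 5, textRun: { content: 'Q1\n' } }] } },
  ] }] }] } },
  { paragraph: { elements: [{ startIndex: 9, textRun: { content: 'Total revenue\n' } }] } },
];

// Flatten every text run in document order, descending into table cells.
function collectRuns(blocks: DemoBlock[]): DemoRun[] {
  const runs: DemoRun[] = [];
  for (const block of blocks) {
    if (block.paragraph) runs.push(...block.paragraph.elements);
    for (const row of block.table?.tableRows ?? []) {
      for (const cell of row.tableCells) runs.push(...collectRuns(cell.content));
    }
  }
  return runs;
}

// Translate an offset in the joined string into the API index of that character.
function toApiIndex(runs: DemoRun[], plainOffset: number): number | undefined {
  let consumed = 0;
  for (const run of runs) {
    const len = run.textRun?.content.length ?? 0;
    if (plainOffset < consumed + len) return run.startIndex + (plainOffset - consumed);
    consumed += len;
  }
  return undefined;
}

const demoRuns = collectRuns(demoBody);
const demoPlainText = demoRuns.map((r) => r.textRun?.content ?? '').join('');
const naiveIndex = demoPlainText.indexOf('Total'); // offset in the joined string: 4
const realIndex = toApiIndex(demoRuns, naiveIndex); // index the API expects: 9
console.log({ naiveIndex, realIndex });
```

In this fixture the gap of 5 comes from the index 0 that precedes the first paragraph plus the table, row, and cell markers and the closing position of the table, none of which appear in the joined string.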

The Solution: The "Shadow Index" Map

The only robust solution is to construct a Shadow Index. We will generate a plain-text version of the document for searching, but simultaneously generate an array that maps every character in that plain-text string back to its true API index.

The Algorithm

  1. Traverse the Document structure recursively (handling nested tables).
  2. Accumulate two data structures:
    • searchableText: A standard string of all text content concatenated.
    • indexMap: An array of integers. indexMap[i] holds the real Google Docs API index for the character at searchableText[i].
  3. Search using standard Regex on searchableText.
  4. Resolve the match indices using indexMap to get the payload for the API.

Implementation

We will use TypeScript to ensure type safety, specifically leveraging the googleapis types.

1. Types and Recursive Extraction

First, we need a recursive function to flatten the document. Note that we must respect the nesting of Tables -> TableRows -> TableCells -> Content.

import { docs_v1 } from 'googleapis';

interface TextLocation {
  text: string;
  mapping: number[]; // Maps string index -> Real Docs API index
}

/**
 * Recursively extracts text and maps it to the true Google Docs index.
 */
function extractContent(
  elements: docs_v1.Schema$StructuralElement[] = []
): TextLocation {
  let fullText = '';
  const mapping: number[] = [];

  for (const element of elements) {
    // 1. Handle Paragraphs (TextRuns, Inline Objects, etc.)
    if (element.paragraph) {
      // Named distinctly so it does not shadow the function's `elements` parameter
      const paragraphElements = element.paragraph.elements || [];
      for (const el of paragraphElements) {
        const startIndex = el.startIndex || 0;
        
        // Handle TextRuns
        if (el.textRun && el.textRun.content) {
          const content = el.textRun.content;
          
          for (let i = 0; i < content.length; i++) {
            fullText += content[i];
            // The API index increments linearly with the string
            mapping.push(startIndex + i);
          }
        }
        
        // Handle Inline Objects (Images, Chips)
        // These consume index space but contribute no text to search
        // We do strictly NOTHING to the fullText here, 
        // because we can't "search" for an image via text.
      }
    }

    // 2. Handle Tables (Recursive)
    else if (element.table && element.table.tableRows) {
      for (const row of element.table.tableRows) {
        if (row.tableCells) {
          for (const cell of row.tableCells) {
            // Recursive call for cell content
            const cellResult = extractContent(cell.content || []);
            fullText += cellResult.text;
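            // Push one by one: spreading a very large array into push()
            // can exceed the engine's argument limit on big documents.
            for (const apiIndex of cellResult.mapping) mapping.push(apiIndex);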
          }
        }
      }
    }
    
    // 3. Handle Section Breaks / TableOfContents (Optional based on need)
  }

  return { text: fullText, mapping };
}

2. The Search and Resolve Logic

Now that we have a 1:1 mapping between our searchable string and the API indices, finding the exact coordinates for a batchUpdate is trivial.

interface HighlightRequest {
  range: {
    startIndex: number;
    endIndex: number;
    segmentId?: string; // Important for headers/footers
  };
  color: {
    rgbColor: { red: number; green: number; blue: number };
  };
}

/**
 * Generates highlight requests for all occurrences of a search phrase.
 */
export function generateHighlightRequests(
  document: docs_v1.Schema$Document,
  searchPhrase: string
): docs_v1.Schema$Request[] {
  
  // An empty phrase would match at every position without advancing the regex.
  if (!searchPhrase || !document.body?.content) return [];

  // 1. Build the Shadow Index
  const { text, mapping } = extractContent(document.body.content);

  // 2. Perform Standard Regex Search
  // Escaping regex characters is critical for production stability
  const escapedPhrase = searchPhrase.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(escapedPhrase, 'gi');
  
  const requests: docs_v1.Schema$Request[] = [];
  let match;

  // 3. Iterate matches and resolve Real Indices
  while ((match = regex.exec(text)) !== null) {
    const matchStartStringIndex = match.index;
    const matchEndStringIndex = match.index + match[0].length;

    // Map back to Google Docs API indices.
    // Note: endIndex in the Docs API is exclusive, so we take the index of the
    // last matched character and add 1.
    const apiStartIndex = mapping[matchStartStringIndex];
    const lastCharApiIndex = mapping[matchEndStringIndex - 1];

    // Guard clause: skip if either end of the match has no mapped index.
    // Check before adding 1, because undefined + 1 is NaN, not undefined.
    if (apiStartIndex === undefined || lastCharApiIndex === undefined) continue;
    const apiEndIndex = lastCharApiIndex + 1;

    // 4. Construct the API Request
    requests.push({
      updateTextStyle: {
        range: {
          startIndex: apiStartIndex,
          endIndex: apiEndIndex,
        },
        textStyle: {
          backgroundColor: {
            color: {
              rgbColor: { red: 1, green: 0.9, blue: 0 }, // Yellow
            },
          },
        },
        fields: 'backgroundColor',
      },
    });
  }

  return requests;
}
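One case the start/end resolution above does not cover is a match whose characters are not contiguous in the API index, for example text on either side of an inline image or text that continues from one table cell into the next. A possible refinement, sketched here under that assumption, is to split each match into runs of consecutive indices and emit one range per run. The helper name toContiguousRanges and the IndexRange type are hypothetical and not part of the Docs API.

```typescript
interface IndexRange {
  startIndex: number;
  endIndex: number; // exclusive, matching the Docs API convention
}

/**
 * Splits the mapped indices for text[start, end) into ranges of
 * consecutive API indices. A jump in the mapping starts a new range.
 */
function toContiguousRanges(mapping: number[], start: number, end: number): IndexRange[] {
  const ranges: IndexRange[] = [];
  for (let i = start; i < end && i < mapping.length; i++) {
    const apiIndex = mapping[i];
    if (apiIndex === undefined) continue; // defensive: sparse mapping entry
    const last = ranges[ranges.length - 1];
    if (last && last.endIndex === apiIndex) {
      last.endIndex = apiIndex + 1; // still contiguous: extend the current range
    } else {
      ranges.push({ startIndex: apiIndex, endIndex: apiIndex + 1 }); // gap: start a new range
    }
  }
  return ranges;
}

// Hypothetical mapping where something occupies API indices 8 to 11 between two runs.
console.log(toContiguousRanges([5, 6, 7, 12, 13], 0, 5));
```

Each returned range could then become its own updateTextStyle request, so the highlight never has to span the structural element in the gap.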

3. Execution

To apply these changes, you send the generated requests array to the batchUpdate endpoint.

async function applyHighlights(
  docsClient: docs_v1.Docs, 
  documentId: string, 
  phrase: string
) {
  // 1. Fetch current document state
  const doc = await docsClient.documents.get({ documentId });

  // 2. Calculate indices locally
  const requests = generateHighlightRequests(doc.data, phrase);

  if (requests.length === 0) {
    console.log('No matches found.');
    return;
  }

  // 3. Push updates
  await docsClient.documents.batchUpdate({
    documentId,
    requestBody: {
      requests,
    },
  });
  
  console.log(`Applied ${requests.length} highlights.`);
}

Deep Dive: Handling Edge Cases

While the code above covers the common cases, production environments require handling subtle edge cases.

1. Unicode and Emoji Handling

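JavaScript strings are UTF-16, and the Google Docs API also measures indices in UTF-16 code units. This is a fortunate alignment. If a user types an emoji outside the Basic Multilingual Plane, it is stored as a surrogate pair that consumes 2 units in JS length and 2 units in the Docs API index (emoji built from several code points, such as ZWJ sequences, consume more in both). The mapping.push(startIndex + i) loop inside the TextRun handler naturally accounts for surrogate pairs because content.length and content[i] in JS work in code units, not code points. The alignment only holds while you stay in code units: iterating with for...of or Array.from walks code points and would desynchronize the map.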
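A small sketch of that alignment, using a hypothetical run whose content and startIndex are made up for illustration. It builds the per-code-unit map the same way the extractor does and contrasts it with a code point count, which comes out one short for each surrogate pair.

```typescript
// Hypothetical run: "Hi 😄!" (emoji written as its surrogate pair) starting at API index 10.
const emojiRun = { startIndex: 10, content: 'Hi \uD83D\uDE04!' };

// Same per-code-unit mapping as the extractor's inner loop.
const codeUnitMap: number[] = [];
for (let i = 0; i < emojiRun.content.length; i++) {
  codeUnitMap.push(emojiRun.startIndex + i);
}

// Counts code points by treating each high surrogate plus its partner as one.
function countCodePoints(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < s.length) i++; // skip the low surrogate
    count++;
  }
  return count;
}

const bangOffset = emojiRun.content.indexOf('!'); // 5: the emoji occupies offsets 3 and 4
const bangApiIndex = codeUnitMap[bangOffset];     // 15
console.log(emojiRun.content.length, countCodePoints(emojiRun.content), bangApiIndex);
```

A map built from the code point count would place the exclamation mark at 14 instead of 15, which is the off-by-one this section warns about.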

2. Tabs and Newlines

Google Docs usually represents tabs as \t and newlines as \n within the textRun.content, so the recursive extractor above picks them up with the rest of the run. Structural elements such as sectionBreak carry no textRun and therefore contribute nothing to the searchable text. Because every paragraph ends in \n, a phrase that wraps across paragraphs or table cells will not match a literal space. Ensure your searchPhrase accounts for this: it is often best to replace each run of whitespace in the phrase with \s+ in your Regex so that spaces, tabs, and newlines all match loosely.
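One way to apply that loose matching is to build the pattern word by word. The helper below is a sketch; buildLoosePattern is a hypothetical name, and it reuses the same escaping expression as generateHighlightRequests.

```typescript
/**
 * Builds a case-insensitive, global regex in which any run of whitespace
 * in the phrase matches any run of whitespace in the document text.
 */
function buildLoosePattern(phrase: string): RegExp {
  const words = phrase.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) {
    // An empty pattern would match everywhere, so reject it up front.
    throw new Error('Search phrase must contain at least one non-whitespace character.');
  }
  const pattern = words
    .map((word) => word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) // escape regex metacharacters
    .join('\\s+');
  return new RegExp(pattern, 'gi');
}

// "quarterly  report" now matches across a paragraph break.
console.log(buildLoosePattern('quarterly  report').test('The Quarterly\nreport is due'));
```

The returned regex carries the g flag, so it keeps state in lastIndex between calls; build a fresh one per search or reset lastIndex before reusing it.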

3. Read-Time vs. Write-Time Race Conditions

This implementation calculates indices based on a snapshot of the document. If the document is being edited collaboratively while your script runs:

  1. You fetch the doc (Version 100).
  2. User types 5 characters at the top of the doc (Version 101).
  3. You send a batchUpdate using indices calculated from Version 100.
  4. Result: Your highlight is shifted by 5 characters.

The Fix: Use the writeControl field of the batchUpdate request body to assert the required revision ID. If the document has moved past that revision, the API rejects the request with a 400 error instead of applying it, allowing you to retry the fetch-calculate-update cycle.

await docsClient.documents.batchUpdate({
  documentId,
  requestBody: {
    requests,
    writeControl: {
      requiredRevisionId: doc.data.revisionId // Enforce optimistic locking
    }
  },
});
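The retry cycle itself might look like the sketch below. It is written against a minimal interface, not the full googleapis client, so that it stands alone: MinimalDocsClient, DocSnapshot, updateWithRetry, and the buildRequests callback are all hypothetical names. It also assumes a rejected write surfaces as an error whose code is 400; check the error shape your client actually throws before relying on that.

```typescript
// Minimal slice of the client used here (assumption: mirrors the two calls shown above).
interface DocSnapshot { revisionId?: string | null }
interface MinimalDocsClient {
  documents: {
    get(params: { documentId: string }): Promise<{ data: DocSnapshot }>;
    batchUpdate(params: {
      documentId: string;
      requestBody: { requests: unknown[]; writeControl?: { requiredRevisionId?: string } };
    }): Promise<unknown>;
  };
}

/**
 * Fetches the document, builds requests from that snapshot, and writes them
 * with requiredRevisionId. Re-runs the whole cycle if the write is rejected
 * with a 400. Returns the number of requests applied.
 */
async function updateWithRetry(
  client: MinimalDocsClient,
  documentId: string,
  buildRequests: (doc: DocSnapshot) => unknown[],
  maxAttempts = 3
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = await client.documents.get({ documentId });
    const requests = buildRequests(doc.data);
    if (requests.length === 0) return 0;
    try {
      await client.documents.batchUpdate({
        documentId,
        requestBody: {
          requests,
          writeControl: { requiredRevisionId: doc.data.revisionId ?? undefined },
        },
      });
      return requests.length;
    } catch (err) {
      // Assumption: a stale revision shows up as code 400 (number or string).
      const code = (err as { code?: number | string } | null)?.code;
      if (Number(code) !== 400 || attempt === maxAttempts) throw err;
    }
  }
  throw new Error('maxAttempts must be at least 1');
}
```

Because a 400 can also mean a malformed request, the attempt limit keeps a genuinely bad payload from looping forever; the last error is rethrown once the limit is reached.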

Conclusion

The Google Docs API is powerful, but its treatment of the document as a linear timeline rather than a semantic tree creates complexity for text manipulation. You cannot rely on simple string indexing.

By implementing a Shadow Index—recursively mapping content to API indices—you can bridge the gap between human-readable text and the machine-readable integer map. This approach ensures your highlights, comments, and anchors land exactly where they belong, regardless of tables, images, or complex styling within the document.