
Preserving Bold & Bullet Styles When Replacing Text via Google Docs API

Few things are more frustrating in Google Workspace automation than watching your carefully crafted template break during execution. You set up a perfect legal contract or invoice template with bold headers, specific fonts, and indented bullet points. You use the standard replaceAllText request.

The result? The text updates, but the formatting vanishes. Bold placeholders become plain text, or worse, bullet points collapse into a single unreadable paragraph.

While replaceAllText is the convenient route, it is a blunt instrument. It lacks the nuance to respect the underlying textRun architecture of the Google Docs JSON model. To maintain strict control over typography and list hierarchies, developers must abandon the convenience method. Instead, they scan the document structure and send precise, index-based insertText and deleteContentRange requests through batchUpdate.

The Root Cause: How Google Docs "Sees" Text

To understand why styles disappear, you must understand the Google Docs JSON representation. A document is not a flat string of HTML; it is a tree of StructuralElement objects.

Within a paragraph, text is split into paragraph elements, most of which wrap a textRun: a span of text that shares a single textStyle. A new run begins whenever the styling changes.

Consider this visual text: {{CLIENT_NAME}} - Pending

In the JSON structure, this line is stored as three distinct paragraph elements, one per style change, each wrapping a textRun:

  1. startIndex: 10, textRun: { content: "{{CLIENT_NAME}}", textStyle: { bold: true } }
  2. startIndex: 25, textRun: { content: " - ", textStyle: { bold: false } }
  3. startIndex: 28, textRun: { content: "Pending", textStyle: { italic: true } }
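The sketch below makes that layout concrete. It models only the fields this article uses, as a simplified and assumed subset of the docs_v1 schema. In that subset, startIndex sits on each paragraph element, while content and textStyle live on its textRun. The hypothetical findInRuns helper converts a match inside one run into absolute document indices.

```typescript
// Simplified stand-ins for the Docs API fields used here (assumed subset of docs_v1).
interface TextStyle {
  bold?: boolean;
  italic?: boolean;
}
interface TextRun {
  content?: string;
  textStyle?: TextStyle;
}
interface ParagraphElement {
  startIndex?: number; // absolute document index where this element begins
  endIndex?: number; // exclusive
  textRun?: TextRun;
}

// Hypothetical helper: absolute [start, end) range of the first occurrence of
// `needle` that fits inside a single run, or null if no single run contains it.
function findInRuns(
  elements: ParagraphElement[],
  needle: string,
): { start: number; end: number } | null {
  for (const el of elements) {
    const text = el.textRun?.content;
    if (!text) continue;
    const offset = text.indexOf(needle);
    if (offset !== -1) {
      // Offset within the run + the element's own startIndex = document index.
      const start = (el.startIndex ?? 0) + offset;
      return { start, end: start + needle.length };
    }
  }
  return null;
}

// The "{{CLIENT_NAME}} - Pending" line from the list above.
const line: ParagraphElement[] = [
  { startIndex: 10, endIndex: 25, textRun: { content: "{{CLIENT_NAME}}", textStyle: { bold: true } } },
  { startIndex: 25, endIndex: 28, textRun: { content: " - ", textStyle: { bold: false } } },
  { startIndex: 28, endIndex: 35, textRun: { content: "Pending", textStyle: { italic: true } } },
];

console.log(findInRuns(line, "{{CLIENT_NAME}}")); // { start: 10, end: 25 }
```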

When you use replaceAllText, the API attempts to modify the content of these runs. However, if the replacement text contains newline characters (common in addresses or lists), or if the placeholder sits on the boundary of a style change, the API often resets the textStyle to the paragraph's default or creates a new run with default formatting.

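To preserve styles reliably, we use Index-Based Insertion. In most cases, the Docs API gives newly inserted text the style of the character immediately before the insertion point. If the new text is inserted inside the existing styled run, specifically at the placeholder's endIndex right after its last character, it inherits the style of that run. The placeholder is then deleted, leaving only the styled replacement.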

The Solution: Scan, Calculate, Batch Update

The reliable fix involves a three-step algorithmic approach:

  1. Scan the document to find the exact startIndex of every placeholder.
  2. Reverse sort the matches by startIndex. Processing from the end of the document toward the beginning means each edit only shifts text that comes after it, so the indices of the matches still waiting (all earlier in the document) stay valid.
  3. Execute a batchUpdate that performs an insertText followed by a deleteContentRange for each match.

The Implementation

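Below is a TypeScript implementation for Node.js using the googleapis library. It assumes you already have an authenticated auth client (for example, an OAuth2 client); the function builds the Docs client from it. Like the rest of this section, it scans top-level body paragraphs only.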

import { docs_v1, google } from 'googleapis';

// Interface for our replacement jobs
interface ReplacementJob {
  placeholder: string;
  replacementValue: string;
}

// Interface to track where matches are found
interface MatchLocation {
  placeholder: string;
  startIndex: number;
  endIndex: number;
  replacementValue: string;
}

/**
 * Robustly replaces text while preserving existing formatting (Bold, Italic, specific fonts).
 * 
 * @param docId - The Google Doc ID
 * @param replacements - Array of key-value pairs to replace
 * @param auth - Authenticated Google OAuth2 client
 */
export async function replaceWithStylePreservation(
  docId: string,
  replacements: ReplacementJob[],
  auth: any
) {
  const docs = google.docs({ version: 'v1', auth });

  // 1. Fetch the current document structure
  const doc = await docs.documents.get({ documentId: docId });
  const content = doc.data.body?.content;

  if (!content) throw new Error('Document body is empty');

  // 2. Scan document to find all occurrences of placeholders
  const matches: MatchLocation[] = [];

  // Helper to escape regex special characters
  const escapeRegExp = (string: string) => string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

  content.forEach((element) => {
    // We only care about Paragraphs for this example (ignoring Tables/Sections for brevity)
    if (element.paragraph && element.paragraph.elements) {
      element.paragraph.elements.forEach((el) => {
        const textRun = el.textRun;
        if (textRun && textRun.content) {
          replacements.forEach(({ placeholder, replacementValue }) => {
            const regex = new RegExp(escapeRegExp(placeholder), 'g');
            let match;
            while ((match = regex.exec(textRun.content!)) !== null) {
              // Absolute document index: startIndex lives on the ParagraphElement (el);
              // the TextRun itself only holds content and textStyle.
              const absoluteStartIndex = (el.startIndex ?? 0) + match.index;
              
              matches.push({
                placeholder,
                startIndex: absoluteStartIndex,
                endIndex: absoluteStartIndex + placeholder.length,
                replacementValue,
              });
            }
          });
        }
      });
    }
  });

  if (matches.length === 0) {
    console.log('No placeholders found.');
    return;
  }

  // 3. Sort matches in descending order based on startIndex.
  // CRITICAL: Modifying the document shifts indices. Working backwards prevents
  // the need to recalculate offsets for subsequent edits.
  matches.sort((a, b) => b.startIndex - a.startIndex);

  // 4. Construct the Batch Requests
  const requests: docs_v1.Schema$Request[] = [];

  matches.forEach((match) => {
    // Step A: Insert the new text at the placeholder's endIndex, directly after
    // its last character. Inserted text generally takes the style of the
    // character immediately before the insertion index, which here belongs to
    // the styled placeholder. An empty replacement has nothing to insert.
    if (match.replacementValue.length > 0) {
      requests.push({
        insertText: {
          text: match.replacementValue,
          location: {
            index: match.endIndex,
          },
        },
      });
    }

    // Step B: Delete the old placeholder.
    // The insertion happened at endIndex, after the placeholder, so the
    // placeholder still occupies [startIndex, endIndex) and can be deleted
    // without adjusting for the length of the inserted text.
    requests.push({
      deleteContentRange: {
        range: {
          startIndex: match.startIndex,
          endIndex: match.endIndex,
        },
      },
    });
  });

  // 5. Execute BatchUpdate
  await docs.documents.batchUpdate({
    documentId: docId,
    requestBody: {
      requests,
    },
  });

  console.log(`Successfully processed ${matches.length} replacements.`);
}

Deep Dive: Why This Works

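The success of this approach relies on how the insertText request assigns a style to newly inserted text.

According to the InsertTextRequest reference, the style of inserted text is determined automatically and, in most cases, matches the text immediately before the insertion index. In other words, characters inserted at index N usually pick up the textStyle of the character at N-1, not the one at N. Because the reference says "in most cases", treat this as a strong default rather than a guarantee.

So after identifying {{PLACEHOLDER}} (which is bold), insert "Actual Value" at the endIndex of {{PLACEHOLDER}}, directly after its last bold character. "Actual Value" then becomes part of that bold text run. Inserting at the placeholder's startIndex instead would copy the style of whatever precedes the placeholder, which is often plain text.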
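To see why the insertion point matters, here is a toy model in TypeScript. It is not the Docs API. It represents a line as an array of styled characters and applies the N-1 rule described above as if it always held. The names StyledChar, insertInheriting, and deleteRange are hypothetical. Replacing a bold {{NAME}} that follows the plain text "Dear " shows the difference: inserting at startIndex yields plain text, while inserting at endIndex keeps the bold.

```typescript
// Toy model, not the Docs API: one entry per character, each with its own bold flag.
interface StyledChar {
  ch: string;
  bold: boolean;
}

const styled = (text: string, bold: boolean): StyledChar[] =>
  text.split("").map((ch) => ({ ch, bold }));

// Assumed rule: inserted characters copy the style of the character at index - 1
// (falling back to the character at `index` when inserting at position 0).
function insertInheriting(line: StyledChar[], index: number, text: string): StyledChar[] {
  const source: StyledChar | undefined = index > 0 ? line[index - 1] : line[index];
  const bold = source ? source.bold : false;
  return [...line.slice(0, index), ...styled(text, bold), ...line.slice(index)];
}

// Delete the half-open range [start, end), like deleteContentRange.
const deleteRange = (line: StyledChar[], start: number, end: number): StyledChar[] => [
  ...line.slice(0, start),
  ...line.slice(end),
];

const plainText = (line: StyledChar[]) => line.map((c) => c.ch).join("");
const boldText = (line: StyledChar[]) => line.filter((c) => c.bold).map((c) => c.ch).join("");

const placeholder = "{{NAME}}";
const original = [...styled("Dear ", false), ...styled(placeholder, true)];
const start = 5; // placeholder's startIndex in this toy line
const end = start + placeholder.length; // 13, its endIndex
const value = "Ada";

// Insert at startIndex: the character before is the plain space, so "Ada" is plain.
// The placeholder has shifted right by value.length before it is deleted.
const atStart = deleteRange(
  insertInheriting(original, start, value),
  start + value.length,
  end + value.length,
);
// Insert at endIndex: the character before is the bold "}", so "Ada" is bold.
const atEnd = deleteRange(insertInheriting(original, end, value), start, end);

console.log(plainText(atStart), "| bold:", JSON.stringify(boldText(atStart))); // Dear Ada | bold: ""
console.log(plainText(atEnd), "| bold:", JSON.stringify(boldText(atEnd))); // Dear Ada | bold: "Ada"
```

Both variants produce the same visible text; only the inherited style differs, which is exactly the failure this article is about.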

The Reverse Sort (Step 3 in the code) is the unsung hero here. If you have a document with 10 placeholders and you replace the first one, the total character count of the document changes (unless the replacement length exactly matches the placeholder length). This would shift the absolute indices of the remaining 9 placeholders. By iterating backwards (bottom of the document to the top), we ensure that the indices we calculated during the scan phase remain valid for every operation in the batch.
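A plain-string simulation makes the ordering argument easy to check. The sketch below uses hypothetical names (Match, applyInOrder). It computes every index once against the unmodified text, as the scan phase does, and then applies the replacements either back to front or front to back.

```typescript
interface Match {
  startIndex: number;
  endIndex: number; // exclusive
  replacementValue: string;
}

// Apply replacements to a plain string using indices computed once, up front,
// against the unmodified text (as the scan phase does).
function applyInOrder(text: string, matches: Match[], order: "asc" | "desc"): string {
  const sorted = [...matches].sort((a, b) =>
    order === "desc" ? b.startIndex - a.startIndex : a.startIndex - b.startIndex,
  );
  let result = text;
  for (const m of sorted) {
    result = result.slice(0, m.startIndex) + m.replacementValue + result.slice(m.endIndex);
  }
  return result;
}

const template = "Hi {{A}}, meet {{B}}.";
const found: Match[] = [
  { startIndex: 3, endIndex: 8, replacementValue: "Alexandra" },
  { startIndex: 15, endIndex: 20, replacementValue: "Bo" },
];

console.log(applyInOrder(template, found, "desc")); // Hi Alexandra, meet Bo.
console.log(applyInOrder(template, found, "asc")); // Hi Alexandra, mBo{B}}.
```

Applied back to front, every index is still valid when it is used. Applied front to back, the longer first replacement shifts {{B}} four characters to the right, so the second edit lands in the wrong place.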

Handling Edge Cases

The code above covers the common case of placeholders in top-level body paragraphs, but real-world documents are messy. Here is how to handle the edge cases.

1. Placeholders Inside Tables

The code above iterates only over the top-level elements of body.content. However, tables are nested inside body.content, and each table cell contains its own list of structural elements. Fix: make the scanning function recursive. If element.table exists, iterate through table.tableRows, then each row's tableCells, then each cell's content, calling the scanner recursively.
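Here is a minimal sketch of that recursion. It assumes a simplified subset of the docs_v1 schema (paragraph.elements, table.tableRows, tableCells, and each cell's content). Fields are optional and nullable so that, assuming the generated googleapis types are structurally compatible, body.content could be passed in directly. The collectRuns name and the RunLocation shape are hypothetical. The function flattens every run into (startIndex, content) pairs that the existing regex loop can scan.

```typescript
// Assumed subset of the docs_v1 schema, loose enough to accept null fields.
interface TextRun { content?: string | null }
interface ParagraphElement { startIndex?: number | null; textRun?: TextRun | null }
interface Paragraph { elements?: ParagraphElement[] | null }
interface TableCell { content?: StructuralElement[] | null }
interface TableRow { tableCells?: TableCell[] | null }
interface Table { tableRows?: TableRow[] | null }
interface StructuralElement { paragraph?: Paragraph | null; table?: Table | null }

interface RunLocation { startIndex: number; content: string }

// Recursively collect every text run in document order, descending into
// table rows -> cells -> each cell's own structural elements.
function collectRuns(
  elements: StructuralElement[] | null | undefined,
  out: RunLocation[] = [],
): RunLocation[] {
  for (const element of elements ?? []) {
    for (const el of element.paragraph?.elements ?? []) {
      const content = el.textRun?.content;
      if (content) out.push({ startIndex: el.startIndex ?? 0, content });
    }
    for (const row of element.table?.tableRows ?? []) {
      for (const cell of row.tableCells ?? []) {
        collectRuns(cell.content, out);
      }
    }
  }
  return out;
}

// A body with one paragraph followed by a 1x1 table (indices are illustrative).
const body: StructuralElement[] = [
  { paragraph: { elements: [{ startIndex: 1, textRun: { content: "Intro {{A}}\n" } }] } },
  {
    table: {
      tableRows: [
        { tableCells: [{ content: [{ paragraph: { elements: [{ startIndex: 16, textRun: { content: "{{B}}\n" } }] } }] }] },
      ],
    },
  },
];

console.log(collectRuns(body).map((r) => r.startIndex)); // [ 1, 16 ]
```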

2. Partial Styling

Rarely, a user might style a placeholder inconsistently, such as {{USER_NAME}} where only NAME}} is bold. Result: the placeholder is split across two text runs. Because the code above runs the regex against one run at a time, it never finds a match. Fix: keep template placeholders either unformatted or uniformly formatted. Matching across run boundaries requires joining each paragraph's runs and mapping string offsets back to document indices, which adds complexity. Even then, the replacement can inherit only one of the two styles.
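If you do need to match across runs, one approach is sketched below (the helper findAcrossRuns is hypothetical). It joins a paragraph's run contents, searches the joined string, and maps each match's first and last characters back to document indices through the run that contains them. It assumes runs are visited in order and that a match does not span a non-text element such as an inline image. Under the N-1 rule from the Deep Dive, inserting at the resulting endIndex would give the replacement the style of the placeholder's last character.

```typescript
interface ParagraphElement {
  startIndex?: number | null;
  textRun?: { content?: string | null } | null;
}
interface DocRange { startIndex: number; endIndex: number } // endIndex exclusive

// Find `needle` in a paragraph even when it spans several text runs.
function findAcrossRuns(elements: ParagraphElement[], needle: string): DocRange[] {
  if (!needle) return [];
  // Remember where each run's text starts in the joined string and in the document.
  const segments: { offset: number; docStart: number }[] = [];
  let joined = "";
  for (const el of elements) {
    const text = el.textRun?.content;
    if (!text) continue;
    segments.push({ offset: joined.length, docStart: el.startIndex ?? 0 });
    joined += text;
  }
  // Map an offset in `joined` to a document index using the run that contains it.
  const toDocIndex = (off: number): number => {
    let docStart = 0;
    let segOffset = 0;
    for (const s of segments) {
      if (s.offset > off) break;
      docStart = s.docStart;
      segOffset = s.offset;
    }
    return docStart + (off - segOffset);
  };
  const ranges: DocRange[] = [];
  let hit = joined.indexOf(needle);
  while (hit !== -1) {
    // endIndex is one past the document index of the last matched character.
    ranges.push({ startIndex: toDocIndex(hit), endIndex: toDocIndex(hit + needle.length - 1) + 1 });
    hit = joined.indexOf(needle, hit + needle.length);
  }
  return ranges;
}

// "{{USER_NAME}}" split across two runs with different styles.
const split: ParagraphElement[] = [
  { startIndex: 1, textRun: { content: "Hello {{USER_" } },
  { startIndex: 14, textRun: { content: "NAME}}\n" } },
];

console.log(findAcrossRuns(split, "{{USER_NAME}}")); // [ { startIndex: 7, endIndex: 20 } ]
```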

3. Newlines in Replacements (Bullet Points)

If replacementValue contains \n and you are replacing text inside a bulleted list:

  • Standard Behavior: Each \n starts a new paragraph that copies the current paragraph's style, including its bullet, so the new lines usually become new list items (which is often desired).
  • Pitfall: If the placeholder was the only text in the bullet point, deleting it might collapse the list structure depending on the exact index of deletion.
  • Refinement: Ensure your deleteContentRange does not include the newline character at the end of the paragraph (which holds the bullet metadata). The calculation in the code above is safe because it calculates endIndex based strictly on the placeholder string length.
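A small guard can make that rule explicit. The hypothetical checkDeleteRange below takes a delete range and the paragraph's endIndex (for example, the endIndex of its StructuralElement). It rejects any range that is empty or that would reach the trailing newline at paragraphEndIndex - 1, which the note above identifies as holding the bullet metadata.

```typescript
interface DeleteRange {
  startIndex: number;
  endIndex: number; // exclusive, as in deleteContentRange
}

// Hypothetical guard: a paragraph ends with "\n" at paragraphEndIndex - 1.
// Reject any delete range that is empty or would remove that newline.
function checkDeleteRange(range: DeleteRange, paragraphEndIndex: number): DeleteRange {
  if (range.startIndex >= range.endIndex) {
    throw new Error(`Empty or inverted range [${range.startIndex}, ${range.endIndex})`);
  }
  if (range.endIndex > paragraphEndIndex - 1) {
    throw new Error(
      `Range [${range.startIndex}, ${range.endIndex}) would delete the paragraph's trailing newline`,
    );
  }
  return range;
}

// A bullet paragraph "{{ITEM}}\n" starting at index 40: the placeholder is [40, 48),
// the newline sits at 48, and the paragraph's endIndex is 49.
console.log(checkDeleteRange({ startIndex: 40, endIndex: 48 }, 49)); // { startIndex: 40, endIndex: 48 }
try {
  checkDeleteRange({ startIndex: 40, endIndex: 49 }, 49);
} catch (e) {
  console.log((e as Error).message); // Range [40, 49) would delete the paragraph's trailing newline
}
```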

Conclusion

The replaceAllText method is excellent for quick, unformatted string swaps. However, when you are generating professional documents where brand guidelines, typography, and layout integrity are non-negotiable, you cannot rely on it.

By parsing the JSON tree and manually constructing insert and delete requests in reverse order, you gain absolute control over the document generation process, ensuring your automated outputs look as polished as if they were hand-typed.