When a production database scales, few architectural choices cause as much silent degradation as standard OFFSET and LIMIT pagination. An endpoint that returns data in 20 milliseconds during testing can easily spike to multi-second response times in production when users navigate deep into a dataset.
This latency spike, known as the deep pagination performance problem, is a direct result of how relational database engines execute the OFFSET clause. Fixing it requires shifting the API architecture from offset-based retrieval to keyset (cursor) pagination.
The Mechanics of Deep Pagination Performance Degradation
To understand why a REST API fails under deep pagination, you must examine the database execution plan. Relational databases like PostgreSQL and MySQL do not maintain a physical index of row offsets.
When an API executes a query like SELECT * FROM transactions ORDER BY created_at DESC LIMIT 50 OFFSET 500000;, the database engine cannot mathematically jump to row 500,000. Instead, it must read, sort, and process 500,050 rows. After pulling these rows into memory, it discards the first 500,000 and returns the remaining 50.
As the offset grows, the database performs increasingly large sequential scans or index traversals. This wastes CPU cycles and pollutes the database buffer pool by evicting frequently accessed data in favor of rows that will be immediately discarded. The linear degradation of deep pagination performance eventually leads to API timeouts and degraded throughput for the entire system.
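The cost asymmetry is easy to see in a toy in-memory model (illustrative only; the function and its scan-count bookkeeping are invented for this sketch, not database internals): the work grows with offset + limit, while the useful output stays fixed at limit.

```typescript
// Toy model of OFFSET/LIMIT semantics: the engine must walk past every
// skipped row before it can return the rows the client actually wants.
function offsetPage<T>(
  sortedRows: T[],
  offset: number,
  limit: number
): { page: T[]; rowsScanned: number } {
  // Work performed: all skipped rows are still read and ordered first.
  const rowsScanned = Math.min(sortedRows.length, offset + limit);
  const page = sortedRows.slice(offset, offset + limit);
  return { page, rowsScanned };
}

const rows = Array.from({ length: 600000 }, (_, i) => i);
const shallow = offsetPage(rows, 0, 50);    // scans 50 rows
const deep = offsetPage(rows, 500000, 50);  // scans 500,050 rows for the same 50 results
```

The same 50-row payload costs ten thousand times more scanned rows on the deep page, which is exactly the curve production monitoring reveals.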
Implementing Cursor Pagination in a REST API
Cursor pagination (or keyset pagination) solves this by utilizing indexed columns as pointers. Instead of telling the database how many rows to skip, the API tells the database exactly where to start reading based on the last retrieved row.
To optimize API database queries, the REST API must exchange an opaque "cursor" token with the client. This cursor encapsulates the positional data of the last row the client saw.
Step 1: The Base64 Cursor Utility
It is common practice to encode the cursor as a Base64 string. This discourages clients from parsing or manipulating the underlying pagination state and allows the backend to change the cursor structure (e.g., adding multi-column sorting) without breaking client contracts. Note that Base64 is an encoding, not encryption; treat the cursor as opaque, not secret.
// utils/cursor.ts
export interface CursorData {
  createdAt: string;
  id: string;
}

export function encodeCursor(data: CursorData): string {
  const jsonString = JSON.stringify(data);
  return Buffer.from(jsonString).toString('base64');
}

export function decodeCursor(cursor: string): CursorData | null {
  try {
    const jsonString = Buffer.from(cursor, 'base64').toString('utf-8');
    return JSON.parse(jsonString) as CursorData;
  } catch (error) {
    return null; // Invalid cursor
  }
}
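A quick round trip (with the two functions inlined so the snippet runs standalone; the sample values are invented) shows that the cursor is nothing more than encoded positional data, and that garbage input fails safely:

```typescript
interface CursorData {
  createdAt: string;
  id: string;
}

function encodeCursor(data: CursorData): string {
  return Buffer.from(JSON.stringify(data)).toString('base64');
}

function decodeCursor(cursor: string): CursorData | null {
  try {
    return JSON.parse(Buffer.from(cursor, 'base64').toString('utf-8')) as CursorData;
  } catch {
    return null;
  }
}

const original: CursorData = { createdAt: '2023-10-25T10:00:00.000Z', id: 'txn_123' };
const token = encodeCursor(original);
const decoded = decodeCursor(token);               // recovers { createdAt, id }
const invalid = decodeCursor('not-a-real-cursor'); // null, not an exception
```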
Step 2: The TypeScript/Express Implementation
In this example, we build an Express endpoint that fetches transactions. We will sort by created_at descending. Because created_at is rarely perfectly unique, we use the primary key id as a deterministic tie-breaker.
// controllers/transactionController.ts
import { Request, Response } from 'express';
import { Pool } from 'pg';
import { encodeCursor, decodeCursor } from '../utils/cursor';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function getTransactions(req: Request, res: Response) {
  // Clamp the page size to protect the database from oversized requests.
  const limit = Math.min(parseInt(req.query.limit as string, 10) || 50, 100);
  const cursorQuery = req.query.cursor as string | undefined;

  let queryText = `
    SELECT id, amount, status, created_at
    FROM transactions
  `;
  const queryParams: any[] = [limit];

  if (cursorQuery) {
    const decoded = decodeCursor(cursorQuery);
    if (!decoded) {
      return res.status(400).json({ error: 'Invalid cursor format' });
    }
    // Tuple comparison: resume strictly after the last row the client saw.
    queryText += ` WHERE (created_at, id) < ($2, $3) `;
    queryParams.push(decoded.createdAt, decoded.id);
  }

  queryText += ` ORDER BY created_at DESC, id DESC LIMIT $1`;

  try {
    const { rows } = await pool.query(queryText, queryParams);

    let nextCursor: string | null = null;
    // A full page implies more rows may exist. (When the total row count is
    // an exact multiple of the limit, the client makes one final request
    // that returns an empty page.)
    if (rows.length === limit) {
      const lastRow = rows[rows.length - 1];
      nextCursor = encodeCursor({
        createdAt: lastRow.created_at.toISOString(),
        id: lastRow.id,
      });
    }

    return res.json({
      data: rows,
      meta: {
        next_cursor: nextCursor,
        has_more: nextCursor !== null,
      },
    });
  } catch (error) {
    console.error('Database query failed:', error);
    return res.status(500).json({ error: 'Internal server error' });
  }
}
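From the client's perspective, consuming such an endpoint is a simple loop: keep requesting with the returned next_cursor until it comes back null. The sketch below simulates the server with an in-memory keyset paginator (a toy stand-in for the SQL above, with invented names and data, not the real endpoint):

```typescript
interface Txn {
  createdAt: string;
  id: string;
}

// Toy stand-in for the endpoint: keyset-paginates an in-memory,
// descending-sorted list using the same tuple logic as the SQL query.
function fetchPage(all: Txn[], cursor: Txn | null, limit: number) {
  let candidates = all;
  if (cursor !== null) {
    const cur = cursor; // const binding so the narrowing survives the closure
    candidates = all.filter(
      (r) =>
        r.createdAt < cur.createdAt ||
        (r.createdAt === cur.createdAt && r.id < cur.id)
    );
  }
  const data = candidates.slice(0, limit);
  const nextCursor = data.length === limit ? data[data.length - 1] : null;
  return { data, nextCursor };
}

// Client loop: keep following next_cursor until the server returns null.
const all: Txn[] = Array.from({ length: 7 }, (_, i) => ({
  createdAt: '2023-10-25T10:00:00Z',
  id: String(9 - i), // ids '9'..'3', already in descending order
}));

const seen: string[] = [];
let cursor: Txn | null = null;
do {
  const page = fetchPage(all, cursor, 3);
  seen.push(...page.data.map((r) => r.id));
  cursor = page.nextCursor;
} while (cursor !== null);
// seen contains every id exactly once, in descending order
```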
Why Keyset Pagination Optimizes API Database Queries
The transition from OFFSET to a WHERE clause fundamentally changes the database execution plan. When a cursor pagination REST API provides specific starting values, the database leverages B-Tree indexes to perform an index seek.
In a B-Tree index, finding a specific entry (such as created_at = '2023-10-25T10:00:00Z') operates in O(log N) time. Once the database locates the entry matching the cursor, it simply reads the next 50 entries in order. The time required to fetch a page stays effectively constant, whether the client requests the first page or the ten-thousandth.
Furthermore, cursor pagination eliminates data drift. In offset pagination, if new rows are inserted at the top of the dataset between API calls, the entire window shifts down, so the client sees duplicate records across pages (and concurrent deletes cause records to be skipped). Because a cursor is anchored to the column values of a specific row rather than to a positional offset, concurrent inserts and deletes do not alter the pagination boundary.
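The drift problem is easy to demonstrate with a toy in-memory feed (the numbers stand in for row ids; this is an illustration, not database behavior verbatim): after a concurrent insert, the offset-based page 2 repeats a row from page 1, while the keyset page 2 does not.

```typescript
// Newest-first feed of row ids.
const feed = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1];

// Page 1 under either scheme: the first three rows.
const page1 = feed.slice(0, 3);           // [10, 9, 8]

// A concurrent insert lands at the top of the feed between requests.
feed.unshift(11);

// Offset pagination: every position shifted, so row 8 appears again.
const page2Offset = feed.slice(3, 6);     // [8, 7, 6]

// Keyset pagination: anchored to the last seen value, so no repeat.
const lastSeen = page1[page1.length - 1]; // 8
const page2Keyset = feed.filter((id) => id < lastSeen).slice(0, 3); // [7, 6, 5]
```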
Handling Non-Unique Sort Columns and Determinism
A critical aspect of REST API performance tuning is ensuring strict determinism in your queries. A common pitfall when implementing cursors is sorting by a non-unique column, such as a status or a truncated timestamp.
If you paginate using WHERE created_at < $1 ORDER BY created_at DESC, and 100 records share the exact same timestamp, the database does not guarantee the order of those 100 records. A cursor pointing to the 50th record might skip the remaining 50 records entirely on the next request.
The Tuple Comparison Solution
To ensure strict ordering, the ORDER BY clause must always terminate with a globally unique, immutable column (usually the primary key). The WHERE clause must match this structure using a Row Value Constructor (tuple comparison).
In PostgreSQL and modern MySQL, the syntax (created_at, id) < ($2, $3) accurately handles ties. The database evaluates this as:
- Is created_at strictly less than $2? If so, include it.
- If created_at equals $2, is the id strictly less than $3? If so, include it.
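The same evaluation can be sketched as a TypeScript predicate (a conceptual model of the per-row check, assuming ISO-8601 timestamps so lexicographic string comparison matches chronological order; the helper name is invented):

```typescript
interface RowKey {
  createdAt: string;
  id: string;
}

// Models (row.createdAt, row.id) < (cursor.createdAt, cursor.id):
// the row sorts strictly after the cursor under DESC, DESC ordering.
function belongsOnNextPage(row: RowKey, cursor: RowKey): boolean {
  if (row.createdAt !== cursor.createdAt) {
    return row.createdAt < cursor.createdAt; // primary sort column decides
  }
  return row.id < cursor.id; // tie broken by the unique primary key
}

const cursor: RowKey = { createdAt: '2023-10-25T10:00:00Z', id: '500' };
belongsOnNextPage({ createdAt: '2023-10-25T09:59:59Z', id: '900' }, cursor); // true: older timestamp
belongsOnNextPage({ createdAt: '2023-10-25T10:00:00Z', id: '499' }, cursor); // true: tie, smaller id
belongsOnNextPage({ createdAt: '2023-10-25T10:00:00Z', id: '501' }, cursor); // false: already returned
```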
Required Indexing Strategy
For the database to execute this tuple comparison optimally, you must create a composite index that matches the exact sort direction of the query. Separate single-column indexes generally cannot satisfy the tuple comparison efficiently and can degrade back toward a scan.
-- Optimal composite index for the cursor query
CREATE INDEX idx_transactions_cursor
ON transactions (created_at DESC, id DESC);
By defining the index with the exact sort direction (DESC), the database engine can scan the index directly without needing an expensive in-memory sort operation before applying the limit.
Architectural Decision Framework
While cursor pagination offers immense performance benefits, it is not a silver bullet for every scenario. It restricts UI design, as clients cannot jump arbitrarily to "Page 45"—they must traverse pages sequentially.
Offset pagination remains acceptable for small, static datasets or admin dashboards where the total row count is aggressively limited and arbitrary page jumping is a strict business requirement. However, for infinite scrolling feeds, mobile applications, and large-scale public APIs, implementing cursor pagination is a mandatory system design pattern to ensure sustained performance and database stability.