
UUIDv4 vs UUIDv7: Fixing B-Tree Fragmentation in PostgreSQL

If you are using standard uuid_generate_v4() or gen_random_uuid() as a Primary Key in PostgreSQL, you have likely architected a performance time bomb.

It usually detonates when the table size exceeds available RAM. In the early days of an application, with 100,000 rows, your entire index fits in shared_buffers (Postgres's page cache). Writes are fast because they are effectively in-memory operations.

However, as you scale to 50M+ rows, insert latency stops being flat and you hit a "random I/O cliff." This post analyzes why the B-Tree fails with random inputs and how to solve it using UUIDv7 (RFC 9562).

The Root Cause: B-Tree Mechanics and Random I/O

PostgreSQL uses B-Tree indexes for Primary Keys by default. B-Trees are balanced tree structures optimized for sorted data. They strive to keep the tree balanced to ensure O(log n) lookup times.

The Problem with Randomness (UUIDv4)

A UUIDv4 is 128 bits of pseudo-randomness. When you insert a new row with a random key, Postgres cannot simply append it to the end of the index. It must:

  1. Traverse the B-Tree to find the specific leaf page where this random value belongs, so that sort order is maintained.
  2. Load that page from disk into shared_buffers (if it's not already there).
  3. Insert the key.

The Performance Penalty

When your index size exceeds RAM, the probability that the specific leaf page required for an insert is currently in memory approaches zero.

  1. Cache Thrashing: Every INSERT triggers a random read I/O to fetch the page from the disk before the write can occur. Your Write IOPS are bottlenecked by your Read IOPS.
  2. Page Splitting & WAL Bloat: Inserting into the middle of full pages forces page splits. Postgres has to move half the tuples to a new page. This generates massive amounts of Write Ahead Log (WAL) data, increasing replication lag and disk pressure.
  3. Fragmentation: Random inserts leave pages partially full (often around 50-60% fill factor after splits), bloating the physical size of the index on disk and reducing cache efficiency.
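
You can see the third effect directly with the pgstattuple extension. A minimal check, assuming an existing UUIDv4-keyed table whose primary key index is named orders_pkey (adjust the index name to your schema):

-- Requires the pgstattuple extension (ships with PostgreSQL contrib)
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- avg_leaf_density: how full each leaf page is (%)
-- leaf_fragmentation: % of leaf pages that are physically out of logical order
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('orders_pkey');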

The Solution: UUIDv7 (Time-Ordered)

UUIDv7, standardized in RFC 9562 (May 2024), solves this by embedding a Unix timestamp in the high bits of the ID.

Structure of UUIDv7:

  • 48 bits: Unix Timestamp in milliseconds.
  • 4 bits: Version (0111).
  • 12 bits: Pseudo-random data (or counter for sub-ms precision).
  • 2 bits: Variant (10).
  • 62 bits: Pseudo-random data.

Because the high bits are time-based, UUIDv7 values are roughly monotonic (k-sorted): new IDs are almost always numerically greater than previously generated ones.
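
To make the layout concrete, the embedded timestamp can be read back out of the first 48 bits. A sketch, assuming a uuid column id already populated with v7 values (table and column names are illustrative):

-- The first 12 hex digits of a UUIDv7 encode the Unix timestamp in milliseconds
SELECT id,
       to_timestamp(
           ('x' || substring(replace(id::text, '-', '') from 1 for 12))::bit(48)::bigint / 1000.0
       ) AS embedded_timestamp
FROM orders
LIMIT 3;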

Why This Fixes B-Tree Performance

  1. Sequential Writes: New keys go to the "right edge" of the B-Tree.
  2. Cache Locality: The leaf page receiving the insert is the most recently created one, meaning it is almost certainly hot in shared_buffers.
  3. Reduced Page Splits: Pages fill up to the index fillfactor (90% by default for B-Trees) before a new page is allocated, instead of being left half-empty by mid-page splits. The index stays significantly smaller on disk.
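
The cache-locality claim above is observable in pg_statio_user_indexes: an index that constantly misses shared_buffers shows high idx_blks_read relative to idx_blks_hit. A quick way to check (this counts only shared_buffers hits, not the OS page cache):

SELECT indexrelname,
       idx_blks_hit,
       idx_blks_read,
       round(idx_blks_hit::numeric / nullif(idx_blks_hit + idx_blks_read, 0), 4) AS hit_ratio
FROM pg_statio_user_indexes
ORDER BY idx_blks_read DESC;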

Implementation

Scenario A: PostgreSQL 18+ (Native Support)

If you are running PostgreSQL 18 or higher, uuidv7() is built in. You do not need extensions.

CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT uuidv7(),
    user_id UUID NOT NULL,
    total_amount NUMERIC(10, 2) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Example Insert
INSERT INTO orders (user_id, total_amount) 
VALUES (gen_random_uuid(), 99.99) 
RETURNING id;
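
If you want to measure the fill-factor difference yourself, here is a rough benchmark sketch. The table names are illustrative and exact sizes will vary, but the v4 index typically comes out noticeably larger after the same number of inserts:

CREATE TABLE bench_v4 (id UUID PRIMARY KEY DEFAULT gen_random_uuid(), n INT);
CREATE TABLE bench_v7 (id UUID PRIMARY KEY DEFAULT uuidv7(), n INT);

INSERT INTO bench_v4 (n) SELECT g FROM generate_series(1, 1000000) g;
INSERT INTO bench_v7 (n) SELECT g FROM generate_series(1, 1000000) g;

-- Compare the on-disk size of the two primary key indexes
SELECT pg_size_pretty(pg_relation_size('bench_v4_pkey')) AS v4_pk_index,
       pg_size_pretty(pg_relation_size('bench_v7_pkey')) AS v7_pk_index;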

Scenario B: PostgreSQL < 18 (The Polyfill)

For production systems on PG 13 through 17, you can implement a robust UUIDv7 generator using PL/pgSQL and pgcrypto's gen_random_bytes().

This function strictly adheres to the bit-layout defined in the RFC.

-- gen_random_bytes() is provided by pgcrypto. Unlike gen_random_uuid()
-- (native since PG 13), random byte generation still requires the extension.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE OR REPLACE FUNCTION generate_uuid_v7()
RETURNS uuid
AS $$
DECLARE
    v_time double precision := null;
    v_unix_t_ms bytea;
    v_rand_a bytea;
    v_rand_b bytea;
    v_output bytea;
BEGIN
    -- 1. Get current time in milliseconds
    v_time := EXTRACT(EPOCH FROM clock_timestamp()) * 1000;
    
    -- 2. Convert time to 48-bit Big Endian bytea
    -- We cast to bigint first, then to bytea
    v_unix_t_ms := substring(
        int8send(floor(v_time)::bigint) 
        from 3 for 6
    );

    -- 3. Generate random bits
    v_rand_a := gen_random_bytes(2); -- Need 12 bits, getting 16
    v_rand_b := gen_random_bytes(8); -- Need 62 bits, getting 64

    -- 4. Construct the UUID
    -- Layout:
    -- unix_ts_ms (48 bits)
    -- ver (4 bits) - Version 7
    -- rand_a (12 bits)
    -- var (2 bits) - Variant 1
    -- rand_b (62 bits)

    v_output := v_unix_t_ms ||
                -- Masking for Version 7 (0111) in the high nibble of the 7th byte
                set_byte(
                    substring(v_rand_a from 1 for 2),
                    0,
                    (get_byte(substring(v_rand_a from 1 for 2), 0) & 15) | 112
                ) ||
                -- Masking for Variant 1 (10) in the high bits of the 9th byte
                set_byte(
                    v_rand_b,
                    0,
                    (get_byte(v_rand_b, 0) & 63) | 128
                );

    RETURN encode(v_output, 'hex')::uuid;
END
$$ LANGUAGE plpgsql VOLATILE PARALLEL SAFE;

-- Usage in Table Definition
CREATE TABLE audit_logs (
    id UUID PRIMARY KEY DEFAULT generate_uuid_v7(),
    event_name TEXT NOT NULL,
    payload JSONB
);
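
A quick sanity check for the polyfill: in the dashed text form of a UUID, character 15 is the version nibble, which should read 7.

WITH sample AS (SELECT generate_uuid_v7() AS u)
SELECT u,
       substring(u::text from 15 for 1) AS version_nibble  -- expect '7'
FROM sample;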

Client-Side Generation (Node.js/TypeScript)

Often, it is preferable to generate IDs in the application layer, so the ID is known before the database round-trip completes. Use the standard uuid library (version 10+ supports v7).

import { v7 as uuidv7 } from 'uuid';
import { db } from './db'; // hypothetical db client

interface User {
  id: string;
  email: string;
}

export async function createUser(email: string): Promise<User> {
  // Generate monotonic ID application-side
  const id = uuidv7();
  
  await db.query(
    'INSERT INTO users (id, email) VALUES ($1, $2)',
    [id, email]
  );
  
  return { id, email };
}

Migration Strategy: Can I switch existing tables?

You cannot magically convert existing UUIDv4 data to UUIDv7, but you can mix them in the same column if the column type is UUID.

However, the performance benefits only apply to the new data. The B-Tree will contain a chaotic mix of random v4 keys (scattered everywhere) and ordered v7 keys (appending to the right).

For a complete fix on an existing massive table:

  1. Add a new column: new_id with type UUID.
  2. Backfill: Populate new_id with v7 UUIDs. To preserve index clustering correlation, you may have to derive the embedded timestamp from the existing created_at column instead of the current clock (a sketch of this follows below).
  3. Swap: Promote new_id to Primary Key.

Note: the Primary Key swap itself is a heavy, blocking operation. Use standard zero-downtime migration patterns (triggers + dual writes).
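
For step 2, one option is a variant of the polyfill that embeds a caller-supplied timestamp instead of clock_timestamp(). The function below is a hypothetical helper, not part of the polyfill above, and the table, column, and batch size are illustrative:

-- Hypothetical helper: same bit layout as generate_uuid_v7(), but the 48-bit
-- timestamp comes from the caller, so backfilled IDs sort like created_at.
CREATE OR REPLACE FUNCTION generate_uuid_v7_at(p_ts TIMESTAMPTZ)
RETURNS uuid
AS $$
DECLARE
    v_unix_t_ms bytea;
    v_rand_a bytea := gen_random_bytes(2);
    v_rand_b bytea := gen_random_bytes(8);
BEGIN
    v_unix_t_ms := substring(
        int8send((EXTRACT(EPOCH FROM p_ts) * 1000)::bigint)
        from 3 for 6
    );

    RETURN encode(
        v_unix_t_ms ||
        set_byte(v_rand_a, 0, (get_byte(v_rand_a, 0) & 15) | 112) ||  -- version 7
        set_byte(v_rand_b, 0, (get_byte(v_rand_b, 0) & 63) | 128),    -- variant 10
        'hex')::uuid;
END
$$ LANGUAGE plpgsql VOLATILE;

-- Backfill in batches to keep locks and WAL bursts small
UPDATE orders
SET new_id = generate_uuid_v7_at(created_at)
WHERE id IN (
    SELECT id FROM orders WHERE new_id IS NULL LIMIT 10000
);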

Conclusion

UUIDv4 allows for decentralized ID generation but sacrifices database write performance due to index fragmentation. BIGSERIAL offers performance but leaks business intelligence (enumerability) and complicates distributed systems.

UUIDv7 is the architectural standard for modern high-scale systems. It provides the locality of reference required for high-throughput B-Tree insertion while maintaining the global uniqueness and stateless generation properties of UUIDs.

If you are designing a schema today, use UUIDv7.