It’s the distributed system nightmare every backend engineer encounters eventually.
Your application successfully commits a record to the database (e.g., a new user sign-up). Immediately after, it attempts to publish an event to Apache Kafka to trigger downstream services (e.g., send a welcome email). But the network blips. The message broker is unreachable. The application throws an error, but the database transaction is already committed.
You now have a dual-write inconsistency. The user exists in your primary database, but the rest of your microservices architecture has no idea. The system is out of sync, and manual reconciliation is painful.
This guide details how to solve this problem using the Transactional Outbox pattern with PostgreSQL, Kafka, and Debezium.
The Anatomy of the Dual-Write Problem
Before fixing it, we must understand why the naive approach fails. In a monolithic architecture, you might rely on a single ACID transaction. In microservices, you are dealing with two distinct infrastructure components that cannot share a single transaction context without resorting to heavyweight protocols such as Two-Phase Commit (XA), which carry significant performance and availability penalties.
Consider this standard, yet flawed, TypeScript pseudo-code:
// ❌ THE ANTI-PATTERN
async function registerUser(user: UserDto) {
// Step 1: Write to PostgreSQL
const newUser = await db.users.create(user);
// Step 2: Publish to Kafka
// IF THIS FAILS, your data is inconsistent.
await kafkaProducer.send({
topic: 'user-events',
messages: [{ value: JSON.stringify(newUser) }]
});
return newUser;
}
If the database write succeeds but the Kafka publish fails, downstream services never learn about the new user: the event is simply lost. If you reverse the order (publish first, then write to the DB) and the DB write fails, downstream services act on "phantom data" that was never persisted.
The Solution: Transactional Outbox Pattern
The Transactional Outbox pattern guarantees that a message is sent if and only if the database transaction completes successfully.
Instead of sending the message directly to Kafka, we write the message to a specifically designed "Outbox" table within the same database transaction as the business data.
- Start a transaction.
- Insert the user into the users table.
- Insert the event into the outbox table.
- Commit the transaction.
Because SQL transactions are atomic, either both writes happen, or neither does.
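At the SQL level, the unit of work is nothing more than two INSERTs bracketed by BEGIN and COMMIT. A minimal sketch, using the users and outbox tables defined in Step 1 below (the sample values and the placeholder UUID are purely illustrative):

BEGIN;

-- Business write
INSERT INTO users (email, full_name)
VALUES ('ada@example.com', 'Ada Lovelace');

-- Event write, in the same transaction (aggregate_id would be the new user's UUID)
INSERT INTO outbox (aggregate_type, aggregate_id, type, payload)
VALUES ('USER', '<new-user-uuid>', 'USER_CREATED', '{"email": "ada@example.com"}');

COMMIT;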
We then use a Change Data Capture (CDC) tool—Debezium—to read the database transaction log (Write-Ahead Log) and stream the contents of the outbox table to Kafka.
Step 1: Database Schema Design
We need a dedicated table to store the events. We use PostgreSQL JSONB for the payload to allow schema flexibility for different event types.
-- 1. The Business Table
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
full_name VARCHAR(255) NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- 2. The Outbox Table
CREATE TABLE outbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
aggregate_type VARCHAR(255) NOT NULL, -- e.g., 'USER'
aggregate_id VARCHAR(255) NOT NULL, -- e.g., The User UUID
type VARCHAR(255) NOT NULL, -- e.g., 'USER_CREATED'
payload JSONB NOT NULL, -- The actual event data
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
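One optional addition: if you adopt the time-based cleanup job described in the pitfalls section later, an index on created_at keeps that periodic DELETE from scanning the whole table.

-- Optional: speeds up the periodic cleanup job shown later
CREATE INDEX idx_outbox_created_at ON outbox (created_at);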
Step 2: Writing the Transactional Code
Here is a rigorous implementation using Node.js and the pg library. We ensure that the user creation and the event recording happen inside a single BEGIN / COMMIT block.
import { Pool } from 'pg';
import { v4 as uuidv4 } from 'uuid';
const pool = new Pool({ /* connection config */ });
interface CreateUserRequest {
email: string;
fullName: string;
}
export async function createUserWithOutbox(data: CreateUserRequest) {
const client = await pool.connect();
try {
// 1. Start the Transaction
await client.query('BEGIN');
// 2. Insert Business Data
const userId = uuidv4();
const insertUserQuery = `
INSERT INTO users (id, email, full_name)
VALUES ($1, $2, $3)
RETURNING *;
`;
const userResult = await client.query(insertUserQuery, [
userId,
data.email,
data.fullName
]);
const newUser = userResult.rows[0];
// 3. Insert Outbox Event (In the SAME transaction)
const eventId = uuidv4();
const eventPayload = {
id: newUser.id,
email: newUser.email,
source: 'user-service'
};
const insertOutboxQuery = `
INSERT INTO outbox (id, aggregate_type, aggregate_id, type, payload)
VALUES ($1, $2, $3, $4, $5);
`;
await client.query(insertOutboxQuery, [
eventId,
'USER', // Aggregate Type
newUser.id, // Aggregate ID (for Kafka partitioning)
'USER_CREATED', // Event Type
JSON.stringify(eventPayload)
]);
// 4. Commit the Transaction
await client.query('COMMIT');
return newUser;
} catch (err) {
// 5. Rollback on ANY failure
await client.query('ROLLBACK');
console.error('Transaction failed, rolling back:', err);
throw err;
} finally {
client.release();
}
}
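For completeness, a call-site sketch (the email and name are placeholder values):

async function main() {
  const user = await createUserWithOutbox({
    email: 'ada@example.com',
    fullName: 'Ada Lovelace'
  });
  // Both the users row and the USER_CREATED outbox row are now committed atomically.
  // Note that no Kafka call has happened yet -- that is Debezium's job.
  console.log('Created user', user.id);
}

main().catch(console.error);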
At this stage, we have achieved atomicity. The event is safely stored in Postgres. Now we need to get it to Kafka.
Step 3: Configuring Debezium (CDC)
Instead of writing a custom poller to query the outbox table (which adds latency and load), we use Debezium running on Kafka Connect. Debezium taps into PostgreSQL's Write-Ahead Log (WAL). It sees the changes as they happen at the database level and streams them.
Prerequisite: Postgres Configuration
Your postgresql.conf must have wal_level set to logical:
wal_level = logical
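You can verify and change this from psql rather than editing the file by hand; note that changing wal_level requires a server restart to take effect.

-- Check the current value
SHOW wal_level;

-- Change it without touching postgresql.conf (a restart is still required)
ALTER SYSTEM SET wal_level = 'logical';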
Debezium Connector Configuration
Submit the following JSON configuration to your Kafka Connect cluster. This tells Debezium to watch the outbox table and stream inserts to a Kafka topic.
{
"name": "outbox-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "postgres-db",
"database.port": "5432",
"database.user": "dbuser",
"database.password": "dbpassword",
"database.dbname": "microservices_db",
"topic.prefix": "dbserver1",
"plugin.name": "pgoutput",
"table.include.list": "public.outbox",
"transforms": "outbox",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
"transforms.outbox.route.topic.replacement": "${routedByValue}",
"transforms.outbox.table.field.event.key": "aggregate_id",
"transforms.outbox.table.field.event.payload": "payload",
"transforms.outbox.route.by.field": "type"
}
}
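Kafka Connect exposes a REST API for registering connectors. A small sketch of submitting the configuration above using Node 18's built-in fetch, assuming Connect is reachable at http://localhost:8083 and the JSON is saved as outbox-connector.json (adjust both for your environment):

import { readFile } from 'node:fs/promises';

async function registerConnector() {
  const config = await readFile('./outbox-connector.json', 'utf-8');

  // POST /connectors creates a new named connector (or rejects a duplicate)
  const res = await fetch('http://localhost:8083/connectors', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: config
  });

  if (!res.ok) {
    throw new Error(`Connector registration failed: ${res.status} ${await res.text()}`);
  }
}

registerConnector().catch(console.error);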
Understanding the Configuration
- table.include.list: We filter specifically for the outbox table. We don't want to stream the raw users table, because that would expose internal database schema to external consumers.
- transforms.outbox: This is the Debezium Outbox Event Router SMT (Single Message Transform). It is crucial:
  - It takes the payload column from your table and makes it the actual Kafka message body.
  - It takes the aggregate_id and makes it the Kafka record key (ensuring ordering by user ID).
  - It routes the message to a specific topic based on the type column (or a static mapping).
Deep Dive: Why This Works
The "magic" here relies on the durability of the database log.
When you issue a COMMIT in PostgreSQL, the database flushes the transaction details to the WAL on disk before acknowledging success to the client. Even if the database crashes immediately after, the WAL contains the record of the outbox insert.
Debezium acts as a replication client. It creates a Replication Slot in Postgres. Postgres remembers exactly where Debezium left off in the WAL. If Kafka goes down, or Debezium crashes, the pointer remains. Once the infrastructure recovers, Debezium resumes reading the WAL from that exact position. No events are lost.
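You can observe this bookkeeping directly in Postgres. The slot Debezium creates (named debezium by default, configurable via the connector's slot.name property) shows up in pg_replication_slots along with the last WAL position it has confirmed:

-- Inspect the replication slot and its confirmed WAL position
SELECT slot_name, plugin, active, confirmed_flush_lsn
FROM pg_replication_slots;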
Common Pitfalls and Edge Cases
While robust, this architecture introduces new challenges you must handle.
1. At-Least-Once Delivery & Idempotency
Debezium guarantees at-least-once delivery. If Kafka Connect crashes or restarts before it commits its offsets, Debezium may re-send events that were already published and processed.
The Fix: Your downstream consumers must be idempotent.
// Consumer Logic Example
async function handleUserCreated(event: { id: string; email: string }) {
// Check if we've already processed this message ID
if (await isMessageProcessed(event.id)) {
return; // Skip duplicate
}
await sendWelcomeEmail(event.email);
await markMessageProcessed(event.id);
}
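The isMessageProcessed / markMessageProcessed helpers are left abstract above. One common implementation is a small deduplication table keyed by the message ID, written with ON CONFLICT DO NOTHING so a repeated "mark" is harmless. A sketch under those assumptions (the table name, columns, and shared pg pool are illustrative, not part of the pattern):

-- One row per message we have already handled
CREATE TABLE processed_messages (
  message_id VARCHAR(255) PRIMARY KEY,
  processed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

async function isMessageProcessed(messageId: string): Promise<boolean> {
  const result = await pool.query(
    'SELECT 1 FROM processed_messages WHERE message_id = $1',
    [messageId]
  );
  return (result.rowCount ?? 0) > 0;
}

async function markMessageProcessed(messageId: string): Promise<void> {
  // ON CONFLICT makes a duplicate "mark" a no-op instead of an error
  await pool.query(
    'INSERT INTO processed_messages (message_id) VALUES ($1) ON CONFLICT DO NOTHING',
    [messageId]
  );
}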
2. Outbox Table Growth
Since you are inserting a row for every event, the outbox table will grow indefinitely.
The Fix: Implement a cleanup strategy. Because Debezium captures the inserts from the WAL rather than from the table itself, deleting rows later does not lose events; keeping them for a short retention window simply makes debugging and reprocessing easier.
- Partitioning: Partition the table by time and drop old partitions.
- Vacuum Process: A separate background job that deletes rows older than X minutes/hours.
-- Simple cleanup job (run periodically via cron or pg_cron)
DELETE FROM outbox WHERE created_at < NOW() - INTERVAL '1 hour';
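If the pg_cron extension is available, the same statement can be scheduled inside the database itself; the job name and 15-minute cadence below are arbitrary choices:

-- Requires the pg_cron extension
SELECT cron.schedule(
  'outbox-cleanup',
  '*/15 * * * *',
  $$DELETE FROM outbox WHERE created_at < NOW() - INTERVAL '1 hour'$$
);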
3. Message Ordering
Kafka only guarantees ordering within a partition.
The Fix: Ensure you populate the aggregate_id correctly in your SQL insert. In the Debezium config, we mapped aggregate_id to the Kafka message key. If all events for User A have the same key, they land in the same partition and are processed sequentially.
Conclusion
The Dual-Write problem is a fundamental challenge in microservices, but it is solvable. By leveraging the Transactional Outbox pattern, you decouple your service's availability from the message broker's availability while ensuring strict data consistency.
You move the complexity from the application code (retries, error handling, 2PC) to the infrastructure (Debezium, Kafka Connect), resulting in a system that is resilient, scalable, and easier to reason about.