Skip to main content

Beyond WAL: Optimizing SQLite for High-Concurrency Writes in Node.js

 You migrated your SQLite database to Write-Ahead Logging (WAL) mode. You expected the promised concurrency gains—readers not blocking writers, and snapshot isolation. Yet, as your Node.js application scales and write volume spikes, your logs are filling with SQLITE_BUSY errors.

Increasing the busy_timeout is a band-aid, not a fix. In high-throughput Node.js environments, specifically those operating at the Edge or in microservices, relying solely on SQLite's internal locking mechanism leads to contention.

To solve this, we must move beyond basic PRAGMA settings and implement a deterministic write strategy. This post details the root cause of WAL bottlenecks in asynchronous runtimes and provides a production-grade serialization pattern using better-sqlite3.

The Anatomy of a Bottleneck: Why WAL Fails Under Load

To fix the problem, you must understand the locking model. In default rollback mode, a write locks the entire database file. WAL improves this by appending changes to a separate -wal file.

However, WAL does not grant unlimited concurrent writes.

1. The Single Writer Constraint

SQLite allows multiple simultaneous readers but strictly one simultaneous writer.

When a write transaction begins, it acquires a RESERVED lock. It eventually promotes this to an EXCLUSIVE lock to commit. While WAL makes this period brief, it is not instantaneous.

2. The Asynchronous Mismatch

Node.js is event-driven and non-blocking. If your API receives 100 simultaneous POST requests, Node.js will attempt to initiate 100 SQLite write transactions almost purely in parallel.

Since SQLite can only process one write at a time, 99 of those requests immediately hit a lock. SQLite attempts to wait (defined by busy_timeout), but if the transaction volume exceeds the throughput of the disk I/O, requests time out.

3. The Checkpoint Stall

The WAL file cannot grow indefinitely. Periodically, SQLite must run a "Checkpoint" to merge WAL pages back into the main .db file. During a passive checkpoint, if readers are active, the WAL cannot be fully truncated.

However, under heavy write pressure, the WAL file grows large, slowing down directory lookups. If a RESTART or FULL checkpoint is triggered, it may block all writers until it completes.

Phase 1: Aggressive Pragma Tuning

Before changing application architecture, we must optimize the database engine configuration. The defaults are designed for safety on unreliable hardware, not performance on managed infrastructure.

Apply the following PRAGMA statements immediately upon connection.

journal_mode = WAL

Enables Write-Ahead Logging. This is the baseline requirement.

synchronous = NORMAL

By default (FULL), SQLite calls fsync() after every transaction commit to ensure data survives a power loss. Setting this to NORMAL in WAL mode allows the OS to handle the flush.

  • Risk: You might lose the specific transactions in the WAL if the OS crashes or power fails.
  • Safety: You will not corrupt the database if the application crashes.
  • Gain: Massive write performance boost because we are not waiting for physical disk spin/flash.

busy_timeout = 5000

Set a fallback timeout (in milliseconds). While our code optimization aims to avoid relying on this, it is a necessary safety net for external interruptions (like backups).

temp_store = MEMORY

Forces temporary tables and indices to live in RAM, reducing disk I/O.

Phase 2: Application-Side Serialization

The most effective way to eliminate SQLITE_BUSY in Node.js is to stop fighting the database's single-writer constraint and embrace it.

Instead of letting 100 async requests fight for the lock, we implement a Synchronous Write Queue.

We will use better-sqlite3. Unlike sqlite3 (which is asynchronous), better-sqlite3 is synchronous. In the context of SQLite, synchronous is better. Since the database is a local file, the context switching overhead of async/await often exceeds the I/O time for small writes.

The Implementation

Below is a robust Database class that implements the Singleton pattern, enforces correct pragmas, and utilizes transaction batching.

import Database from 'better-sqlite3';
import type { Database as SQLiteDB, Transaction } from 'better-sqlite3';
import path from 'node:path';

interface DBConfig {
  path: string;
  timeout?: number;
}

export class SQLiteManager {
  private static instance: SQLiteManager;
  private db: SQLiteDB;

  private constructor(config: DBConfig) {
    // 1. Initialize with better-sqlite3
    // 'verbose' can be added for logging in dev
    this.db = new Database(config.path, {
      timeout: config.timeout ?? 5000, 
    });

    this.configurePragmas();
  }

  public static getInstance(config: DBConfig): SQLiteManager {
    if (!SQLiteManager.instance) {
      SQLiteManager.instance = new SQLiteManager(config);
    }
    return SQLiteManager.instance;
  }

  private configurePragmas(): void {
    // 2. Performance Tuning
    this.db.pragma('journal_mode = WAL');
    this.db.pragma('synchronous = NORMAL');
    this.db.pragma('temp_store = MEMORY');
    
    // Optional: Increases memory map size to keep DB in RAM (e.g., 256MB)
    // Adjust based on your available system memory
    this.db.pragma('mmap_size = 268435456');
  }

  /**
   * Executes a read-only query.
   * Since WAL allows concurrent reads, we don't need to queue this.
   */
  public query<T = any>(sql: string, params: any[] = []): T[] {
    try {
      const stmt = this.db.prepare(sql);
      return stmt.all(params) as T[];
    } catch (error) {
      console.error('Read Error:', error);
      throw error;
    }
  }

  /**
   * Executes a write operation immediately.
   * In better-sqlite3, this is synchronous and blocks the event loop.
   * For short writes (< 5ms), this is more efficient than async overhead.
   */
  public execute(sql: string, params: any[] = []):  Database.RunResult {
    try {
      const stmt = this.db.prepare(sql);
      return stmt.run(params);
    } catch (error) {
      console.error('Write Error:', error);
      throw error;
    }
  }

  /**
   * WRAPPED TRANSACTION
   * This is critical for data integrity. 
   * It ensures multiple writes succeed or fail together.
   */
  public transaction<T>(
    action: (db: SQLiteDB) => T
  ): T {
    const txn = this.db.transaction((fn: (db: SQLiteDB) => T) => {
      return fn(this.db);
    });
    
    // Immediate mode acquires the write lock right away,
    // preventing deadlocks where a read upgrades to a write.
    return txn.immediate(action);
  }
  
  public close(): void {
    this.db.close();
  }
}

// Usage Example
const dbManager = SQLiteManager.getInstance({ 
  path: path.resolve(process.cwd(), 'data.db') 
});

// A High-Concurrency Write Handler
export function recordEvent(eventType: string, payload: Record<string, any>) {
  // Because this is synchronous, Node.js will complete this function 
  // before processing the next event in the loop. 
  // The "Queue" is effectively Node's own call stack.
  
  dbManager.transaction((db) => {
    const insert = db.prepare(
      'INSERT INTO events (type, payload, created_at) VALUES (?, ?, ?)'
    );
    insert.run(eventType, JSON.stringify(payload), Date.now());
    
    // Additional logic that must be part of the same transaction
    const updateStats = db.prepare(
      'UPDATE stats SET count = count + 1 WHERE type = ?'
    );
    updateStats.run(eventType);
  });
}

Why This Fixes SQLITE_BUSY

1. Synchronous Execution

By using better-sqlite3, we leverage Node.js's single-threaded nature to our advantage. When recordEvent runs, it blocks the event loop. No other request can attempt a write until this function finishes.

You have effectively moved the lock queue from the filesystem (SQLite trying to lock the file) to the application memory (Node waiting to execute the next function). Memory queuing is exponentially faster than disk locking loops.

2. Immediate Transactions

Notice txn.immediate. Standard SQLite transactions start as deferred (reading allowed). If you read and then write, you have to upgrade the lock, which causes deadlocks if two connections try to upgrade simultaneously. IMMEDIATE mode grabs the write lock instantly.

3. Reduced System Call Overhead

better-sqlite3 bypasses the C++ to JavaScript asynchronous bridge. For simple queries, this can be 10x faster than sqlite3. Faster execution means the lock is held for less time, increasing overall throughput.

Advanced Scenario: The Dedicated Writer Worker

If your write operations are heavy (e.g., heavy JSON parsing, constraint checking) and blocking the main event loop is causing latency on your HTTP read endpoints, you should offload writes to a Worker Thread.

This architecture ensures your API remains responsive while maintaining a single, serialized writer.

// writer-worker.ts
import { parentPort } from 'node:worker_threads';
import { SQLiteManager } from './db-manager';

const db = SQLiteManager.getInstance({ path: './data.db' });

if (parentPort) {
  parentPort.on('message', (task) => {
    try {
      if (task.type === 'WRITE') {
        const result = db.execute(task.sql, task.params);
        parentPort?.postMessage({ id: task.id, status: 'success', result });
      }
    } catch (err) {
      parentPort?.postMessage({ 
        id: task.id, 
        status: 'error', 
        error: err instanceof Error ? err.message : 'Unknown error' 
      });
    }
  });
}

In your main thread, you would push messages to this worker. The worker processes them one by one (sequentially), guaranteeing no SQLITE_BUSY contention.

Common Pitfalls and Edge Cases

1. The Network File System (NFS) Trap

Never run SQLite over NFS, SMB, or EFS (AWS). File locking protocols over networks are notoriously unreliable. SQLite requires strictly correct file locking. If you need a shared database, use a client-server DB (Postgres) or block replication tools like Litestream.

2. Long-Running Transactions

The serialization strategy fails if a single transaction takes 2 seconds. Keep transactions atomic and small. Do data processing in JavaScript before opening the transaction block.

3. Vacuuming

WAL files can grow indefinitely if not checkpointed. While PRAGMA wal_autocheckpoint handles this generally, heavy loads might require a separate process or a scheduled cron job to run PRAGMA wal_checkpoint(TRUNCATE) during low-traffic windows to reclaim disk space.

Conclusion

SQLITE_BUSY is not a sign that SQLite is "too weak" for production; it is a signal that your access pattern conflicts with the engine's locking model.

By switching to journal_mode=WAL, relaxing flush safety with synchronous=NORMAL, and most importantly, aligning your Node.js application architecture to serialize writes via better-sqlite3, you can achieve thousands of writes per second on a single instance without contention.