Memory leaks in garbage-collected languages are annoying. Memory leaks in Haskell are existential threats.
The problem is rarely that you forgot to free memory. The problem is that you told the runtime not to calculate it yet. In production, this manifests as the sawtooth pattern of doom: memory usage climbs steadily, the Garbage Collector (GC) works harder and harder to traverse a growing graph of unevaluated computations (thunks), and eventually, the application pauses for seconds at a time before the OOM killer intervenes.
Traditional profiling (compiling with -prof and running with +RTS -p -hc) is invasive. It requires recompilation, changes runtime characteristics, and often distorts the very runtime behavior you are trying to observe.
In 2025, the standard for diagnosing these issues in production is ghc-debug. This tool allows you to snapshot the heap of a running executable, analyze the closure graph, and pinpoint exactly which unevaluated thunk is retaining gigabytes of memory.
The Root Cause: The Haystack of Thunks
To fix a space leak, you must understand the STG (Spineless Tagless G-machine) representation of your data.
When you write let x = a + b, GHC does not compute x. It allocates a Thunk on the heap. This thunk is a closure containing:
- A code pointer (to the + function).
- Pointers to the environment (the variables a and b).
If x is never evaluated to Weak Head Normal Form (WHNF), that thunk remains. If x is part of a long-running recursive structure (like a state accumulator), you don't just have one thunk; you have a linked list of thunks, each pointing to the previous one.
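The canonical small-scale version of this chain is a lazy left fold over an accumulator; the contrast below with foldl' from Data.List is illustrative rather than taken from the service we debug later.
import Data.List (foldl')

-- foldl builds one (+) thunk per element before any addition happens:
-- (((0 + 1) + 2) + 3) + ...
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- foldl' forces the accumulator to WHNF at every step, so it runs in constant space
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0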
The Space Leak: A thunk takes up small space (pointer + overhead). However, a thunk keeps its environment alive. If a tiny thunk refers to a 500MB ByteString that you thought you discarded, the GC cannot collect that ByteString. This is a "retainer" leak.
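As an illustration of the retainer pattern (a hedged sketch; fileSummary and its workload are hypothetical):
import qualified Data.ByteString as BS

fileSummary :: FilePath -> IO Int
fileSummary path = do
  contents <- BS.readFile path        -- potentially hundreds of megabytes
  let summary = BS.length contents    -- a thunk whose environment captures 'contents'
  -- If 'summary' escapes unevaluated, the whole ByteString stays reachable.
  -- Forcing it to WHNF with ($!) lets the GC reclaim 'contents'.
  pure $! summary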
The Scenario: The "Strict" Map Trap
A common misconception is that using Data.Map.Strict eliminates space leaks. It forces the keys, the spine, and the values only to weak head normal form: the outermost constructor of each stored value is evaluated, but the fields inside that constructor stay lazy unless the value type itself enforces strictness.
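To see the trap in isolation, here is a small self-contained probe (Metric, probeValue, and probeField are names invented for this illustration):
import qualified Data.Map.Strict as M
import Control.Exception (evaluate)

data Metric = Metric { hits :: Int }   -- lazy field, just like UserMetric below

-- Data.Map.Strict forces the inserted *value* to WHNF, so an undefined value explodes
probeValue :: IO ()
probeValue = () <$ evaluate (M.insert "k" (undefined :: Int) M.empty)
-- throws Prelude.undefined when run

-- ...but an undefined *field* hides behind the Metric constructor and survives
probeField :: IO ()
probeField = () <$ evaluate (M.insert "k" (Metric undefined) M.empty)
-- completes; the field is still an unevaluated thunk inside the map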
Consider this production service tracking metrics.
The Leaky Application
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}
module Main where
import GHC.Generics (Generic)
import qualified Data.Map.Strict as M
import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import Data.IORef
import GHC.Debug.Stub (withGhcDebug) -- Dependency: ghc-debug-stub
-- THE PROBLEM DATA TYPE
-- Even though we put this in a Strict Map, the fields 'count' and 'score'
-- are lazy by default in Haskell.
data UserMetric = UserMetric
  { count :: Int
  , score :: Double
  } deriving (Show, Generic)
type State = M.Map String UserMetric
updateMetric :: UserMetric -> UserMetric
updateMetric (UserMetric c s) =
  -- These additions create thunks because UserMetric's fields are lazy
  UserMetric (c + 1) (s + 1.5)
main :: IO ()
main = withGhcDebug $ do
  putStrLn "Starting server with ghc-debug enabled..."
  ref <- newIORef M.empty
  -- Simulate a high-throughput event loop
  forever $ do
    -- modifyIORef' forces the new Map to WHNF, so the leak we are studying
    -- lives in the UserMetric fields, not in a chain of pending map updates
    modifyIORef' ref $ \m ->
      M.insertWith (\_ old -> updateMetric old) "user_123" (UserMetric 1 1.0) m
    -- Artificial delay to allow us to attach the debugger
    threadDelay 1000
If you run this, the heap grows indefinitely. The UserMetric values inside the map are not fully evaluated; they are building a chain of (1 + 1 + ...) thunks.
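Before reaching for a debugger, you can confirm the growth itself with a quick sanity check; the sketch below uses GHC.Stats (the program must be run with +RTS -T for these statistics to be populated):
import GHC.Stats (getRTSStats, gc, gcdetails_live_bytes)

-- Prints the number of live bytes after the most recent GC.
-- In the leaky version this figure climbs steadily; after the fix it stays flat.
reportLiveBytes :: IO ()
reportLiveBytes = do
  stats <- getRTSStats
  putStrLn $ "live bytes: " ++ show (gcdetails_live_bytes (gc stats))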
The Fix: Analysis with ghc-debug
Instead of guessing, we prove the leak. We use ghc-debug-client to connect to the running process via a socket.
Step 1: Running the Analysis
Create a separate analysis script (e.g., Debugger.hs). This script connects to the socket exposed by withGhcDebug in the main app.
module Main where

import GHC.Debug.Client
import GHC.Debug.Retainers
import GHC.Debug.Snapshot
import qualified Data.List as L

main :: IO ()
main = withDebugProbe "ghc-debug-socket" $ \trace -> do
  putStrLn "Connected to application..."
  -- 1. Pause the application and take a snapshot of the heap
  snapshot <- requestSnapshot trace
  -- 2. Analyze the heap graph
  let graph = snapshotGraph snapshot
  putStrLn $ "Heap size: " ++ show (heapGraphSize graph) ++ " objects"
  -- 3. Search for UserMetric closures:
  --    we look for constructors matching our data type name
  let metrics = findClosure (== "UserMetric") graph
  case metrics of
    [] -> putStrLn "No UserMetric objects found."
    (c:_) -> do
      putStrLn $ "Found " ++ show (length metrics) ++ " UserMetric objects."
      -- 4. Check whether they are thunks or evaluated values; in a leak
      --    scenario we expect thunk closures or deep retention chains.
      -- 5. Find what is retaining these objects (the root of the evil)
      retainers <- findRetainers trace c
      putStrLn "Retainer stack trace for the first metric:"
      mapM_ print retainers
      -- 6. Census: count thunks specifically. This is the smoking gun.
      s <- census2LevelClosureType trace
      let thunkCount = L.lookup "Thunk" s
      putStrLn $ "Total thunks in heap: " ++ show thunkCount
Note: You run the leaky app in one terminal and this debugger script in another; they communicate over the Unix socket ghc-debug-socket.
The output evidence
When running the debugger against the leaky app, you will see the "Thunk" count rising linearly with the loop count. The UserMetric objects are technically in WHNF (the constructor UserMetric exists), but the fields inside them point to Thunk closures rather than Int or Double primitives.
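You can reproduce that shape locally in GHCi with :sprint, which prints unevaluated fields as underscores (the session below is illustrative):
ghci> let m = UserMetric (1 + 1) (2.0 + 0.5)
ghci> m `seq` ()
()
ghci> :sprint m
m = UserMetric _ _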
The Solution: StrictData
We have identified that UserMetric is retaining thunks in its fields. We must enforce strictness at the data definition level.
While you can add strictness annotations (!Int) to individual fields, the robust solution for data structures intended for state accumulation is the StrictData language extension, which makes every field declared in the module strict by default: fields are evaluated to WHNF as soon as the constructor is applied.
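For reference, under StrictData the two declarations below are equivalent (UserMetricA and UserMetricB are illustrative names):
{-# LANGUAGE StrictData #-}

-- With StrictData enabled, every field in the module is implicitly strict...
data UserMetricA = UserMetricA { countA :: Int, scoreA :: Double }

-- ...which is the same as writing the strictness annotations by hand:
data UserMetricB = UserMetricB { countB :: !Int, scoreB :: !Double }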
The Corrected Code
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE StrictData #-} -- <--- THE FIX
module Main where
import GHC.Generics (Generic)
import qualified Data.Map.Strict as M
import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import Data.IORef
import GHC.Debug.Stub (withGhcDebug)
-- With StrictData, the 'count' and 'score' fields are forced to WHNF as soon as
-- the constructor is applied (and GHC may unbox them when compiling with -O).
data UserMetric = UserMetric
  { count :: Int
  , score :: Double
  } deriving (Show, Generic)
type State = M.Map String UserMetric
-- No changes needed here, but now this triggers evaluation
updateMetric :: UserMetric -> UserMetric
updateMetric (UserMetric c s) =
  UserMetric (c + 1) (s + 1.5)
main :: IO ()
main = withGhcDebug $ do
  putStrLn "Starting optimized server..."
  ref <- newIORef M.empty
  forever $ do
    -- modifyIORef' forces the new Map to WHNF; insertWith on Data.Map.Strict
    -- then forces the combined value, and StrictData forces its fields,
    -- so the whole update is evaluated before the next iteration
    modifyIORef' ref $ \m ->
      M.insertWith (\_ old -> updateMetric old) "user_123" (UserMetric 1 1.0) m
    threadDelay 1000
Why This Works
Before (Lazy Fields)
- M.insertWith calls updateMetric.
- updateMetric returns UserMetric (thunk_1) (thunk_2).
- Map.Strict evaluates the UserMetric constructor (WHNF).
- The fields remain pointers to unevaluated additions.
- Heap: Map -> UserMetric -> Thunk (+) -> Thunk (+) -> ...
After (StrictData)
- M.insertWith calls updateMetric.
- Because UserMetric has strict fields, constructing UserMetric (c+1) (s+1.5) forces the additions immediately.
- The thunks are consumed, the math is done, and the raw bits are stored (or pointers to evaluated primitives).
- Heap: Map -> UserMetric -> Int# / Double#
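If you want to guarantee the unboxed layout rather than rely on the optimizer, combine the strict fields with UNPACK pragmas; a sketch (with -O, GHC usually unboxes small strict fields on its own):
data UserMetric = UserMetric
  { count :: {-# UNPACK #-} !Int     -- stored inline as an unboxed Int#
  , score :: {-# UNPACK #-} !Double  -- stored inline as an unboxed Double#
  } deriving (Show)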
Conclusion
Space leaks are not mysterious ghosts; they are simply a disconnect between your mental model of data flow and the STG machine's execution strategy.
Don't blindly sprinkle bang patterns (!) hoping for the best. Use ghc-debug to visualize the heap graph. Once you see the chain of thunks retaining your memory, the fix becomes obvious: enforce strictness at the data definition boundary using StrictData or explicit bang patterns, ensuring your long-lived state contains values, not promises.