The most insidious failure mode in Haskell production systems is the slow-burning memory leak. Your application runs perfectly for days, but the Resident Set Size (RSS) creeps upward until the OOM killer terminates the process. Standard heap profiling (-hT) often changes the runtime characteristics enough to hide the bug (the "Heisenbug" effect) or requires restarting the process, destroying the state you need to inspect.
The modern solution is ghc-debug. This toolset allows you to connect to a running Haskell process, inspect the heap graph programmatically, and identify thunk buildup without stopping the world for extended periods or recompiling with heavy instrumentation.
The Root Cause: Thunks and WHNF
Haskell’s memory leaks are rarely "leaks" in the C/C++ sense (unfreed memory). They are almost always unwanted retention.
Because Haskell is lazy, an expression like acc + 1 is not evaluated immediately. It allocates a "thunk" (a closure representing the computation). If you store this thunk in a long-lived data structure (like a Map or State monad) without forcing it, the runtime builds a linked list of closures:
-- What you think you have:
10000
-- What you actually have in the heap:
((((0 + 1) + 1) + 1) ... + 1)
Data structures often only evaluate values to Weak Head Normal Form (WHNF). For a data constructor, WHNF means the constructor is known, but the fields inside might still be thunks. If your strictness annotations only force the outer layer, the inner thunks persist, retaining references to old data and growing the heap.
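As a minimal illustration (using a throwaway Pair type that is not part of the service code below), forcing a value with seq only exposes the outermost constructor; the fields stay suspended:
-- A throwaway example type, separate from the article's Metrics code.
data Pair = Pair Int Int

p :: Pair
p = Pair (1 + 2) (3 + 4)   -- both fields are allocated as thunks

q :: Pair
q = p `seq` p
-- seq evaluates p to WHNF only: the Pair constructor is now known,
-- but (1 + 2) and (3 + 4) remain unevaluated inside its fields.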
The Setup: A Leaky Application
Let's create a realistic scenario: a long-running service aggregating metrics. We will implement a Metrics type that looks correct but contains a subtle laziness bug.
1. The Leaky Service (Main.hs)
This application listens on a socket (simulated here via a loop) and updates a map of user metrics.
Dependencies: ghc-debug-stub, containers, text
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE NumericUnderscores #-}
module Main where

import GHC.Debug.Stub (withGhcDebug)
import Control.Concurrent (threadDelay)
import qualified Data.Map.Strict as M
import Data.Text (Text)

-- THE BUG: Fields are lazy by default.
-- Even though we use Data.Map.Strict, it only forces the 'Metrics'
-- constructor to WHNF, leaving 'requestCount' as a growing thunk.
data Metrics = Metrics
  { requestCount :: Int
  , lastActive   :: Int
  } deriving (Show)

type State = M.Map Text Metrics

main :: IO ()
main = withGhcDebug $ do
  putStrLn "Starting server with ghc-debug enabled..."
  loop M.empty 0

loop :: State -> Int -> IO ()
loop state tick = do
  -- Simulate work: 10ms delay
  threadDelay 10_000
  let user = "user_1"
  -- Update state
  let newState = M.alter (updateMetrics tick) user state
  -- Print stats every 1000 ticks so we know it's alive
  if tick `mod` 1000 == 0
    then putStrLn $ "Tick: " <> show tick <> " | Heap growing..."
    else pure ()
  loop newState (tick + 1)

updateMetrics :: Int -> Maybe Metrics -> Maybe Metrics
updateMetrics tick Nothing =
  Just $ Metrics 1 tick
updateMetrics tick (Just (Metrics count _)) =
  -- This expression builds a thunk: (count + 1)
  Just $ Metrics (count + 1) tick
Run this application. It will begin consuming memory slowly but surely.
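Before attaching any debugger, a quick sanity check can confirm that the live heap itself (not just RSS) is growing. Here is a minimal sketch using GHC.Stats from base; it assumes the process is started with +RTS -T so the runtime actually collects statistics, and the helper name is ours:
import GHC.Stats (getRTSStats, getRTSStatsEnabled, RTSStats(..), GCDetails(..))

-- Prints the live heap size measured at the most recent GC.
-- The program must be run with +RTS -T, otherwise stats are not collected.
printLiveBytes :: IO ()
printLiveBytes = do
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      stats <- getRTSStats
      putStrLn $ "Live bytes: " <> show (gcdetails_live_bytes (gc stats))
    else putStrLn "RTS stats disabled; start the program with +RTS -T"
Calling this every few thousand ticks from the loop shows the number climbing steadily. It confirms that something is growing, but not what; for that we turn to ghc-debug.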
The Diagnosis: Inspecting with ghc-debug
Instead of guessing, we use ghc-debug-client to take a snapshot of the heap from a separate process. We will write a debugger script to find what is dominating memory.
2. The Debugger Script (Debugger.hs)
This script connects to the socket exposed by withGhcDebug, requests a heap graph, and performs a census.
Dependencies: ghc-debug-client, ghc-debug-common
module Main where

-- NOTE: the ghc-debug-client API has shifted between releases; the calls
-- below follow its general shape but may need small adjustments for the
-- exact version you have installed.
import GHC.Debug.Client
import GHC.Debug.Retainers
import GHC.Debug.Profile
import Control.Monad.IO.Class (liftIO)
import qualified Data.List as L
import qualified Data.Map as Map

socketPath :: FilePath
socketPath = "/tmp/ghc-debug" -- Default location created by the stub

main :: IO ()
main = withDebuggeeConnect socketPath $ \d -> do
  putStrLn "Connected to application..."

  -- 1. Pause the application so the heap graph stays stable while we traverse it
  pause d

  run d $ do
    liftIO $ putStrLn "Requesting heap census..."

    -- Cache heap blocks locally; this speeds up traversal
    _ <- precacheBlocks

    -- 2. Perform a census by closure type (constructor name) starting from
    --    the GC roots. This downloads only the subset of the heap graph
    --    needed for profiling.
    roots <- gcRoots
    c <- census2LevelClosureType roots

    -- 3. Print the top 10 heap objects by size
    let topObjects = take 10 $ L.sortOn (negate . countSize . snd) (Map.toList c)
    liftIO $ putStrLn "\n=== TOP HEAP OBJECTS (Count, Size) ==="
    liftIO $ mapM_ printObject topObjects

    -- 4. Advanced: find what is retaining the suspicious closures.
    --    If we see a huge Int count, we want to know WHO points to them.
    --    (In a real session you would target whichever closure type
    --    dominated the census in step 3.)
    liftIO $ putStrLn "\n=== RETAINER ANALYSIS ==="
    -- Find up to 5 paths from the roots to boxed Int ("I#") closures,
    -- which sit at the end of our thunk chains.
    retainers <- findRetainersOfConstructor (Just 5) roots "I#"
    liftIO $ displayRetainers retainers

  -- Let the application carry on running
  resume d

-- The census stats types ('Count' and 'countSize' here) are sketched in the
-- style of the library; check them against your ghc-debug-client version.
printObject :: (String, Count) -> IO ()
printObject (name, Count n size) =
  putStrLn $ name <> ": " <> show n <> " objects, " <> show size <> " bytes"

-- Helper to pretty-print retainer stacks
displayRetainers :: [[ClosurePtr]] -> IO ()
displayRetainers [] = putStrLn "No retainers found."
displayRetainers (r:_) = do
  putStrLn "Example path to a leaking object:"
  -- A real implementation would decode the ClosurePtrs to names,
  -- visualising the chain: Root -> Map -> Metrics -> Int (thunk)
  print r
3. Analysis Results
Running Debugger.hs while Main.hs is running produces output similar to this:
=== TOP HEAP OBJECTS (Count, Size) ===
Int: 50000 objects, 800000 bytes
Metrics: 1 objects, 24 bytes
...
Interpretation:
- We see a massive count of Int objects.
- In a strict application, Int values are usually unboxed or shared. Seeing tens of thousands of distinct Int-sized closures almost always means they are unevaluated thunks rather than plain I# boxes.
- The retainer analysis (conceptual output) reveals a chain: Root -> ... -> Map -> Metrics -> Int.
The Metrics object exists, but inside it, the Int field is pointing to a massive chain of computations rather than a raw number.
The Fix: Strict Fields
The issue is that Data.Map.Strict only evaluates the Metrics value to WHNF:
- Metrics (count + 1) tick is evaluated.
- The constructor Metrics is applied.
- A reference to the unevaluated (count + 1) is stored in the first field.
- The addition itself is never performed.
To fix this, we must strictly enforce evaluation of the fields inside the data structure.
The Corrected Code
Modify the data definition in Main.hs:
-- FIX: Add strictness annotations (!) to the fields
data Metrics = Metrics
  { requestCount :: !Int -- <--- Strict field
  , lastActive   :: !Int -- <--- Strict field
  } deriving (Show)
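If annotating individual fields feels easy to forget, the StrictData language extension makes every field declared in the module strict by default. A sketch of the same type under that extension:
{-# LANGUAGE StrictData #-}

-- With StrictData, these fields behave exactly as if they carried
-- explicit (!) annotations.
data Metrics = Metrics
  { requestCount :: Int
  , lastActive   :: Int
  } deriving (Show)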
Alternatively, if you cannot modify the type definition (e.g., it comes from a library), you must force evaluation before insertion:
-- Alternative fix using DeepSeq.
-- Note: ($!!) needs an NFData instance for Metrics (e.g. derived via
-- Generic, or supplied by the library that defines the type).
import Control.DeepSeq (($!!))

updateMetrics :: Int -> Maybe Metrics -> Maybe Metrics
updateMetrics tick Nothing =
  Just $ Metrics 1 tick
updateMetrics tick (Just (Metrics count _)) =
  -- ($!!) fully evaluates its argument to normal form before wrapping it in Just
  Just $!! Metrics (count + 1) tick
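For completeness, if you do control the type, one common way to obtain the required NFData instance is generic derivation; a minimal sketch:
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic)
import Control.DeepSeq (NFData)

data Metrics = Metrics
  { requestCount :: Int
  , lastActive   :: Int
  } deriving (Show, Generic)

-- The empty instance uses the Generic-based default implementation of rnf.
instance NFData Metrics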
Why This Works
When you add the strictness annotation (!Int), GHC changes how Metrics is built and, with optimisation, how it is laid out in memory.
- Without bangs: Metrics contains a pointer to a heap object, which could be an evaluated Int or a thunk like (1 + 1).
- With bangs: when Metrics is constructed, the runtime must evaluate the field arguments to WHNF immediately. Since an Int's WHNF is the number itself, the addition executes on the spot.
- Unpacking: with -O2, GHC will likely go further and "unpack" the Int directly into the Metrics object's payload, removing the pointer entirely and reducing memory usage further (see the sketch below).
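If you would rather not rely on the optimiser's judgement, you can request the unpacking explicitly with an UNPACK pragma on the strict fields; a sketch:
-- UNPACK asks GHC to store the raw Int# payload inline in the Metrics
-- constructor instead of behind a pointer (honoured when compiling with -O).
data Metrics = Metrics
  { requestCount :: {-# UNPACK #-} !Int
  , lastActive   :: {-# UNPACK #-} !Int
  } deriving (Show)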
Conclusion
Memory leaks in Haskell are almost always unforced thunks hiding in long-lived data structures. Tools like ghc-debug allow you to surgically identify these leaks in running processes without the guesswork of traditional heap profiling.
- Instrument your entry point with withGhcDebug.
- Snapshot the heap with a custom client script.
- Identify high-count closures (usually Int, Maybe, or list nodes).
- Trace retainers to find the data structure holding them.
- Make fields strict with bang annotations to prevent thunk buildup.