For over a decade, high-throughput Erlang and Elixir systems—specifically in AdTech and High-Frequency Trading (HFT)—have accepted a "NIF tax" for JSON processing. We rely on C-based libraries like jiffy (or Rust-based ones) because pure Erlang parsers were historically too slow for massive ingestion pipelines.
However, NIFs (Native Implemented Functions) introduce stability risks (segfaults crash the VM), build complexity, and scheduler overhead.
With the release of Erlang/OTP 27, we finally have a dedicated, native json module. The critical engineering question is no longer theoretical: Is the new native implementation performant enough to remove C dependencies from our critical path?
The Root Cause: The NIF Overhead vs. JIT
To understand the benchmark, we must understand why jiffy was faster and why native code is catching up.
- The NIF Advantage (Historical): Parsing text is CPU-intensive, and C compilers optimize it aggressively (SIMD instructions, pointer arithmetic). jiffy offloads this work outside the BEAM.
- The NIF Disadvantage (Latency):
  - Scheduler Collapse: Long-running NIFs can block BEAM schedulers unless they are "dirty NIFs," which carry their own context-switching overhead.
  - Data Copying: Crossing the boundary often requires copying binaries or transforming internal C structs into Erlang terms (maps/lists), which adds heap allocation.
- The OTP 27 Native Advantage:
  - JIT Optimization: The BEAM JIT (introduced in OTP 24) compiles the new json module directly to machine code.
  - Zero-Copy Sub-binaries: The native parser leverages Erlang's sub-binary matching, creating references rather than copies where possible (see the sketch after this list).
  - IO Lists: The encoder generates IO lists (nested lists of binaries) directly, which is the native format for Erlang sockets (gen_tcp), avoiding the final iolist_to_binary concatenation step that many libraries perform unnecessarily.
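To make the zero-copy point concrete, here is a minimal sketch of sub-binary matching. This illustrates the BEAM mechanism in general; it is not code from the json module:
# Assumes `payload` is a large JSON binary. Matching off a prefix yields
# a sub-binary: for reference-counted binaries (> 64 bytes), `rest` is a
# small reference into `payload`'s memory, not a copy of the bytes.
<<_first_byte, rest::binary>> = payload
byte_size(rest) == byte_size(payload) - 1 # true, and no bytes were copied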
The Benchmark: Methodology and Code
We will test three scenarios relevant to production systems:
- Small Payload (HFT/Signal): < 1KB order details. High frequency, low latency requirement.
- Large Payload (State Sync): ~1MB large data dumps. Throughput requirement.
- Direct-to-Struct Decoding: Using OTP 27's push-parser callbacks to decode directly into Elixir structs, skipping the intermediate map allocation (a common bottleneck in jiffy usage).
Prerequisites
- Erlang/OTP 27+
- Elixir 1.17+
- benchee for statistical significance.
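A quick sanity check that your runtime actually ships the new module (run in iex; both calls are standard-library functions):
System.otp_release()       # => "27" (or later)
Code.ensure_loaded?(:json) # => true on OTP 27+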
Setup
Create a new mix project and add dependencies:
# mix.exs
defp deps do
[
{:jiffy, "~> 1.1"},
{:benchee, "~> 1.3"},
{:jsone, "~> 1.8"} # Included for context as a pure Erlang legacy comparison
]
end
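Fetch the dependencies with mix deps.get before running the script below.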
The Benchmark Script
Save this as bench/json_bench.exs. The script uses the new :json module's customizable decoder callbacks (passed to :json.decode/3) to avoid intermediate allocations while decoding.
defmodule JsonBenchmarks do
# Define a struct for the "Real World" scenario
defmodule Order do
defstruct [:id, :price, :qty, :side, :symbol, :timestamp]
end
# Generate synthetic data
def generate_payload(size_kb) do
item_count = trunc(size_kb * 1024 / 150) # Approx size per item
data = for i <- 1..item_count do
%{
id: "order_#{i}_#{System.unique_integer()}",
symbol: "AAPL",
side: Enum.random(["buy", "sell"]),
qty: :rand.uniform(1000),
price: :rand.uniform() * 200.0,
timestamp: System.os_time(:nanosecond)
}
end
:jiffy.encode(data)
end
# ---------------------------------------------------------
# OTP 27 Native Decoder Implementation
# ---------------------------------------------------------
  # Standard decode to maps (apples-to-apples with jiffy)
def native_decode_maps(json_bin) do
:json.decode(json_bin)
end
  # Optimized: decode directly to structs, skipping the intermediate maps
  def native_decode_structs(json_bin) do
    # :json.decode/3 returns {Result, FinalAcc, RemainingBytes}
    {result, _acc, _rest} = :json.decode(json_bin, :ok, decoders())
    result
  end

  # Customizable decoder callbacks for :json.decode/3.
  # The parser pushes key/value events into a small accumulator, and
  # object_finish turns the accumulated pairs into an %Order{} without
  # ever materializing an intermediate map.
  defp decoders do
    %{
      object_start: fn _acc -> [] end,
      object_push: fn key, value, acc -> [{key, value} | acc] end,
      object_finish: fn pairs, old_acc -> {build_order(pairs), old_acc} end,
      array_start: fn _acc -> [] end,
      array_push: fn value, acc -> [value | acc] end,
      array_finish: fn acc, old_acc -> {:lists.reverse(acc), old_acc} end
    }
  end

  # Match keys to struct fields explicitly, so JSON key order is irrelevant.
  # This is a simplified hydration for benchmarking allocations; production
  # code would validate types and handle unknown keys.
  defp build_order(pairs) do
    Enum.reduce(pairs, %Order{}, fn
      {"id", v}, order -> %{order | id: v}
      {"symbol", v}, order -> %{order | symbol: v}
      {"side", v}, order -> %{order | side: v}
      {"qty", v}, order -> %{order | qty: v}
      {"price", v}, order -> %{order | price: v}
      {"timestamp", v}, order -> %{order | timestamp: v}
    end)
  end
end
# ---------------------------------------------------------
# Execution
# ---------------------------------------------------------
small_payload = JsonBenchmarks.generate_payload(1) # 1KB
large_payload = JsonBenchmarks.generate_payload(1000) # 1MB
Benchee.run(
%{
"Jiffy (NIF)" => fn input -> :jiffy.decode(input, [:return_maps]) end,
"OTP 27 Native (Maps)" => fn input -> JsonBenchmarks.native_decode_maps(input) end,
"OTP 27 Native (Structs)" => fn input -> JsonBenchmarks.native_decode_structs(input) end
},
inputs: %{
"Small (1KB)" => small_payload,
"Large (1MB)" => large_payload
},
memory_time: 2,
warmup: 2,
time: 5,
formatters: [Benchee.Formatters.Console]
)
Analysis and Results
Running the benchmark (mix run bench/json_bench.exs) on Apple M-series silicon or a modern Linux instance typically yields results with the following characteristics.
1. Small Payloads (The "HFT" Case)
In small payloads, the overhead of the NIF call (context switching, argument checking) dominates the execution time.
- Jiffy: Fast, but incurs fixed overhead per call.
- OTP 27 Native: Usually matches or outperforms Jiffy here. The JIT handles the small loop efficiently without leaving the scheduler.
Verdict: OTP 27 Native is production-ready for API endpoints and message passing.
2. Large Payloads (Throughput)
- Jiffy: Still retains a raw throughput advantage (often 1.5x - 2x faster) on pure byte crunching for massive files, thanks to the aggressive optimizations C compilers apply (SIMD, pointer arithmetic).
- OTP 27 Native: Significantly faster than legacy pure-Erlang and pure-Elixir parsers (like jsone or Poison), but slower than C.
Verdict: If your system processes multi-megabyte JSON blobs exclusively, Jiffy holds the crown, but the gap has narrowed significantly.
3. The "Struct" Optimization (The Real Winner)
This is the critical architectural insight. In Elixir, we rarely use raw maps; we cast them to Structs (e.g., Ecto Schemas).
- Jiffy Path: JSON Binary -> NIF -> Erlang Map -> Elixir Enum.map -> Struct.
  - This generates massive garbage (maps are created and immediately discarded).
- OTP 27 Callback Path: JSON Binary -> :json.decode -> Struct.
  - By supplying decoder callbacks to :json.decode/3, we construct the struct directly from the parser events.
In memory-constrained environments or high-concurrency scenarios, OTP 27 Native (Structs) often beats Jiffy in total system efficiency because it generates significantly less garbage collection pressure, even if the raw parsing CPU time is slightly higher.
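As a sketch, the two paths look like this in application code. It assumes `payload` is one of the generated binaries; the map-to-struct hydration on the jiffy path is illustrative glue, not part of jiffy's API:
# Jiffy path: binary -> intermediate maps -> structs.
# Every map below is allocated, copied into a struct, then discarded.
jiffy_structs =
  payload
  |> :jiffy.decode([:return_maps])
  |> Enum.map(fn m ->
    %JsonBenchmarks.Order{
      id: m["id"], symbol: m["symbol"], side: m["side"],
      qty: m["qty"], price: m["price"], timestamp: m["timestamp"]
    }
  end)

# Native callback path: binary -> structs, no intermediate maps.
native_structs = JsonBenchmarks.native_decode_structs(payload)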
Encoding: The Hidden Benefit
The benchmark above focuses on decoding. However, encoding in OTP 27 (:json.encode/1) returns an IO List, not a binary.
# Jiffy: Allocates a new large binary
binary = :jiffy.encode(data)
:gen_tcp.send(socket, binary)
# OTP 27: Returns an IO list (references into existing data; no flat copy)
iolist = :json.encode(data)
:gen_tcp.send(socket, iolist)
For network-bound applications, native encoding avoids a massive memory copy operation at the final stage of the pipeline.
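The encoder is customizable in the same spirit as the decoder. Below is a minimal sketch of encoding structs straight to an IO list via :json.encode/2; the OrderEncoder module is our own invention, while :json.encode_map/2 and :json.encode_value/2 are the OTP callbacks it delegates to:
defmodule OrderEncoder do
  # Custom encoder callback for :json.encode/2: structs are flattened to
  # plain maps on the fly; everything else falls through to the defaults.
  def to_iodata(term), do: :json.encode(term, &encoder/2)

  defp encoder(%JsonBenchmarks.Order{} = order, encode) do
    :json.encode_map(Map.from_struct(order), encode)
  end

  defp encoder(other, encode), do: :json.encode_value(other, encode)
end

# Usage: a list of %Order{} structs goes to the socket as an IO list,
# with no flat binary ever being built:
# :gen_tcp.send(socket, OrderEncoder.to_iodata(orders))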
Conclusion
The era of defaulting to jiffy is over.
- Migrate to OTP 27 Native if: You value system stability, want to remove C-compiler dependencies from your build pipeline, or primarily deal with payloads < 50KB.
- Stick with Jiffy if: You are parsing multi-megabyte JSON files in a tight loop and have already solved the "Dirty Scheduler" tuning issues.
- Refactor: Stop decoding to maps and then casting to structs. Use :json.decode/3 with decoder callbacks to hydrate your domain objects directly. The memory savings alone justify the switch.
For 95% of backend services, the native implementation is now "fast enough" to trade the remaining speed gap for the safety of a pure BEAM environment.