Honest benchmarks: where my STL loses to std::

I reimplemented 28 STL containers in C++23 and benchmarked every one against its std:: equivalent. Some lost. I left the losses in the README and in the charts, because a container library that only reports its wins is telling you which workloads it picked, not how fast it is.

The one number: a ratio against std

Every benchmark reduces to a single comparable quantity, the ratio of my median time to the standard library's median time on the same workload:

r = \frac{t_{\text{my-stl}}}{t_{\text{std}}}, \qquad r < 1 \;\Rightarrow\; \text{faster}, \quad r > 1 \;\Rightarrow\; \text{slower}

Lower is better. A ratio of 1.0 means I matched the standard library; a ratio of 0.5 means I ran in half the time; a ratio of 1.2 means I am 20% slower. The chart anchors on that 1.0 line in amber so a win and a loss read at a glance.

workloadmy-stl / std

flat_set build + find (vs std::set)43.6 ns vs 189.9 ns
0.23x
flat_map build + find (vs std::map)50.2 ns vs 133.0 ns
0.38x
deque push_back + pop_front (vs std::deque)1.72 ns vs 2.89 ns
0.60x
vector push_back, no reserve (vs std::vector)1.61 ns vs 2.52 ns
0.64x
map build + find (vs std::map)239.4 ns vs 213.4 ns
1.12x
set build + find (vs std::set)216.5 ns vs 190.6 ns
1.14x
unordered_map emplace (vs std::unordered_map)48.9 ns vs 41.3 ns
1.18x

ratio = my-stl / std median ns/op, lower is better; amber rule is std at 1.0x. 200,000 ops/case, median of 5, Apple M2, Apple clang 17, -O3 -DNDEBUG. Source: public/benchmarks/my-stl.json (SHA 5e209d4).

Where the wins are real, and why

The big wins are not me out-engineering the standard library. They are a different data structure winning on a workload that suits it. My flat_map stores entries in a sorted contiguous array, so a build-then-find workload hits cache the way a node-based std::map cannot. The result is 0.378× of std::map (about 2.6x faster), and flat_set reaches 0.230× of std::set.

Where I lose, kept in

My tree-based map and set and my hash table run 12 to 18% slower than the system library: 1.122× for the map, 1.136x for the set, 1.183x for unordered_map. That is the gap between a clean from-scratch red-black tree and a standard library that has had decades of tuning poured into node layout, allocator behavior, and rebalancing constants. Coming within 18% of that is a result I am happy to report honestly, which is more useful than a cherry-picked headline.

Built on each other, like a real library

The containers self-host instead of duplicating storage logic, which is the structural point of the project. map and set are built on the same red-black tree; lru-cache is a list plus an unordered-map; unordered-map self-hosts on the repo's own vector and forward-list for buckets and chains. One real data structure backs many containers, the way a standard library is actually layered.

map, set: built on the repo's red-black tree (the 670-line core data structure).
flat_map, flat_set: built on vector: sorted contiguous storage, the cache win above.
lru-cache: built on list + unordered-map for $O(1)$ get and evict.

Making the numbers reproducible, not asserted

The numbers in the README are derived artifacts, not hand-typed claims. A Python pipeline turns raw benchmark runs into a committed CSV and JSON summary plus the light and dark SVG charts, and the published numbers match the raw data files exactly. The harness gets the microbenchmark hygiene right where most homemade ones go wrong:

do_not_optimize via an inline-asm memory clobber, so the compiler cannot delete the work being measured.
a warmup pass plus median-of-N sampling, so one cold run does not skew the result.
every run records CPU, compiler, git SHA, and dirty state, so a number is always traceable to the exact build that produced it.

meta = collect_meta()   # cpu, compiler, git sha + dirty, cmake flags
# results.json -> bench_summary.csv -> ratio-{light,dark}.svg
# README numbers are generated from this, never typed

scripts/bench/run.py: every result carries the build that made it

Across 28 containers and 55 Catch2 test cases on a 3-OS CI matrix, the honest version is the whole deliverable. The code and the raw benchmark data are at github.com/ShreeChaturvedi/my-stl.

The one number: a ratio against std

Every benchmark reduces to a single comparable quantity, the ratio of my median time to the standard library's median time on the same workload:

r = \frac{t_{\text{my-stl}}}{t_{\text{std}}}, \qquad r < 1 \;\Rightarrow\; \text{faster}, \quad r > 1 \;\Rightarrow\; \text{slower}

ratio

workloadmy-stl / std

flat_set build + find (vs std::set)43.6 ns vs 189.9 ns
0.23x
flat_map build + find (vs std::map)50.2 ns vs 133.0 ns
0.38x
deque push_back + pop_front (vs std::deque)1.72 ns vs 2.89 ns
0.60x
vector push_back, no reserve (vs std::vector)1.61 ns vs 2.52 ns
0.64x
map build + find (vs std::map)239.4 ns vs 213.4 ns
1.12x
set build + find (vs std::set)216.5 ns vs 190.6 ns
1.14x
unordered_map emplace (vs std::unordered_map)48.9 ns vs 41.3 ns
1.18x

Where the wins are real, and why

Where I lose, kept in

Built on each other, like a real library

map, set

built on the repo's red-black tree (the 670-line core data structure).

flat_map, flat_set

built on vector: sorted contiguous storage, the cache win above.

lru-cache

built on list + unordered-map for

O(1)

get and evict.

Making the numbers reproducible, not asserted

do_not_optimize via an inline-asm memory clobber, so the compiler cannot delete the work being measured.

a warmup pass plus median-of-N sampling, so one cold run does not skew the result.

every run records CPU, compiler, git SHA, and dirty state, so a number is always traceable to the exact build that produced it.

meta = collect_meta()   # cpu, compiler, git sha + dirty, cmake flags
# results.json -> bench_summary.csv -> ratio-{light,dark}.svg
# README numbers are generated from this, never typed

scripts/bench/run.py: every result carries the build that made it

Across 28 containers and 55 Catch2 test cases on a 3-OS CI matrix, the honest version is the whole deliverable. The code and the raw benchmark data are at github.com/ShreeChaturvedi/my-stl.

The one number: a ratio against std

Where the wins are real, and why

Where I lose, kept in

Built on each other, like a real library

Making the numbers reproducible, not asserted

Onemoment

The one number: a ratio against std

Where the wins are real, and why

Where I lose, kept in

Built on each other, like a real library

Making the numbers reproducible, not asserted