I reimplemented 28 STL containers in C++23 and benchmarked every one against its std:: equivalent. Some lost. I left the losses in the README and in the charts, because a container library that only reports its wins is telling you which workloads it picked, not how fast it is.
The one number: a ratio against std
Every benchmark reduces to a single comparable quantity, the ratio of my median time to the standard library's median time on the same workload:
Lower is better. A ratio of 1.0 means I matched the standard library; a ratio of 0.5 means I ran in half the time; a ratio of 1.2 means I am 20% slower. The chart anchors on that 1.0 line in amber so a win and a loss read at a glance.
- flat_set build + find (vs std::set)43.6 ns vs 189.9 ns0.23x
- flat_map build + find (vs std::map)50.2 ns vs 133.0 ns0.38x
- deque push_back + pop_front (vs std::deque)1.72 ns vs 2.89 ns0.60x
- vector push_back, no reserve (vs std::vector)1.61 ns vs 2.52 ns0.64x
- map build + find (vs std::map)239.4 ns vs 213.4 ns1.12x
- set build + find (vs std::set)216.5 ns vs 190.6 ns1.14x
- unordered_map emplace (vs std::unordered_map)48.9 ns vs 41.3 ns1.18x
Where the wins are real, and why
The big wins are not me out-engineering the standard library. They are a different data structure winning on a workload that suits it. My flat_map stores entries in a sorted contiguous array, so a build-then-find workload hits cache the way a node-based std::map cannot. The result is 0.378× of std::map (about 2.6x faster), and flat_set reaches 0.230× of std::set.
Where I lose, kept in
My tree-based map and set and my hash table run 12 to 18% slower than the system library: 1.122× for the map, 1.136x for the set, 1.183x for unordered_map. That is the gap between a clean from-scratch red-black tree and a standard library that has had decades of tuning poured into node layout, allocator behavior, and rebalancing constants. Coming within 18% of that is a result I am happy to report honestly, which is more useful than a cherry-picked headline.
Built on each other, like a real library
The containers self-host instead of duplicating storage logic, which is the structural point of the project. map and set are built on the same red-black tree; lru-cache is a list plus an unordered-map; unordered-map self-hosts on the repo's own vector and forward-list for buckets and chains. One real data structure backs many containers, the way a standard library is actually layered.
- map, set
- built on the repo's red-black tree (the 670-line core data structure).
- flat_map, flat_set
- built on
vector: sorted contiguous storage, the cache win above. - lru-cache
- built on
list+unordered-mapfor get and evict.
Making the numbers reproducible, not asserted
The numbers in the README are derived artifacts, not hand-typed claims. A Python pipeline turns raw benchmark runs into a committed CSV and JSON summary plus the light and dark SVG charts, and the published numbers match the raw data files exactly. The harness gets the microbenchmark hygiene right where most homemade ones go wrong:
do_not_optimizevia an inline-asm memory clobber, so the compiler cannot delete the work being measured.- a warmup pass plus median-of-N sampling, so one cold run does not skew the result.
- every run records CPU, compiler, git SHA, and dirty state, so a number is always traceable to the exact build that produced it.
meta = collect_meta() # cpu, compiler, git sha + dirty, cmake flags
# results.json -> bench_summary.csv -> ratio-{light,dark}.svg
# README numbers are generated from this, never typedAcross 28 containers and 55 Catch2 test cases on a 3-OS CI matrix, the honest version is the whole deliverable. The code and the raw benchmark data are at github.com/ShreeChaturvedi/my-stl.