JH-Toolkit v1.4.1
An engineering-oriented C++20 toolkit with duck-typed concepts, static design, async coroutines, and semantic containers — header-only, RTTI-free, and concurrency-friendly.
flat_multimap.h File Reference

Flat ordered multimap container. More...

#include <vector>
#include <algorithm>
#include <memory>
#include <utility>
#include <type_traits>
#include <cstddef>
#include <stdexcept>
#include "jh/conceptual/tuple_like.h"
#include "jh/conceptual/container_traits.h"

Go to the source code of this file.

Classes

class  jh::flat_multimap< K, V, Alloc >
 Flat ordered multimap implemented as a sorted contiguous container. More...

Detailed Description

Flat ordered multimap container.

Author
JeongHan-Bae <mastropseudo@gmail.com>

Design Rationale — Why flat_multimap Exists

jh::flat_multimap<K,V> is not an extension of ordered_map to support multiple values per key, nor is it a tree-based multimap variant. It exists to serve a fundamentally different access pattern.

In practical systems, a multimap is rarely used merely because "a key can have multiple values". Instead, the defining operations are:

  • efficient range queries for a single key (equal_range),
  • batch processing of all values associated with a key,
  • bulk erasure of all entries for a key (erase(key)).

These operations are structurally hostile to tree-based containers that maintain balance via rotations. In a contiguous AVL tree, removing a range of equivalent keys would require repeated node removal, each potentially triggering rebalancing and rotations. The resulting cost is not only higher, but also less predictable.

For this reason, the ordered_* family deliberately does not provide bulk-erasure or multimap-style range deletion APIs. Their design goal is stable, per-element operations with bounded rebalancing cost — a goal that conflicts with multimap semantics.

Flat Multimap: Turning an Algorithm into a Container

flat_multimap embraces a different principle:

Store elements contiguously, keep them stably sorted by key, and express multimap semantics as range operations over a flat sequence.

Internally, the container is little more than a std::vector<std::pair<K,V>> maintained in sorted order. Multimap operations are implemented using binary search (lower_bound, upper_bound) and contiguous range operations on the underlying storage.

This effectively packages the "sorted vector + binary search" algorithm into a first-class container, with explicit semantics for:

  • contiguous storage of equivalent keys,
  • range-oriented lookup,
  • batch erasure with a single compaction step.

In workloads such as in-memory tables, indexing layers, routing maps, or subscription registries, this model aligns naturally with the dominant access patterns: scan, group, and rebuild.

Why Not Extend the AVL-Based ordered_map?

While it is theoretically possible to add multimap support to a contiguous AVL structure, doing so would compromise its core invariants:

  • range deletion would trigger repeated rotations,
  • erase(key) would devolve into multiple independent erase operations,
  • cost predictability would degrade under multimap workloads.

Rather than overloading the ordered_* containers with semantics they are ill-suited for, flat_multimap provides a purpose-built structure whose behavior is transparent and intentionally biased toward bulk and range-oriented operations.

Why There Is No flat_multiset

A multiset variant would add little semantic value. A sorted sequence of keys with duplicates is already fully expressible as a std::vector<K> with std::stable_sort.

Unlike flat_multimap, which must expose key-value association and range-based deletion, a hypothetical flat_multiset would amount to a thin wrapper over an existing algorithm without providing meaningful additional structure or guarantees.

About Performance

Benchmarks (Apple Silicon M3, LLVM clang++20, 2025) compare jh::flat_multimap against std::multimap under representative workloads. Measurements were taken at both -O0 and -O3 -march=native, with consistent trends observed across optimization levels.

Operation               | std::multimap           | jh::flat_multimap   | Notes
------------------------|-------------------------|---------------------|------------------------------------
Random insert (5k)      | faster                  | slower              | tree insertion avoids bulk movement
Ordered insert (5k)     | slower, higher variance | consistently faster | contiguous insertion dominates
Bulk construction (50k) | allocator-dominated     | ≈ 2-4× faster       | single append + stable sort
Random lookup           | slightly faster         | slightly slower     | binary search vs pointer traversal
Iteration               | pointer chasing         | ≈ 50-90× faster     | sequential memory traversal
Erase by key            | faster                  | slower              | range compaction cost

Observed Behavior

  • Large-scale construction using bulk_insert consistently outperforms node-based multimap insertion at 50k elements and beyond.
  • Sequential iteration shows a decisive advantage for flat_multimap, often exceeding one order of magnitude.
  • Lookup and erase operations are generally slower than std::multimap under low-pressure workloads, reflecting the cost of binary search and contiguous range compaction.
    Under high-density datasets (e.g. ≥ 1M elements with dense key distributions), lookup performance converges toward that of std::multimap, as contiguous memory layout and cacheline locality increasingly dominate pointer-based traversal costs.
  • Performance characteristics remain stable across optimization levels, indicating that results are dominated by memory layout and access patterns rather than compiler-specific optimizations.

Design Summary

The final design of flat_multimap follows the same fundamental philosophy as the ordered_* family: prioritizing cache locality, high hit rates, and contiguous memory layout over optimal asymptotic performance for individual operations.

For small datasets, incremental insertion via insert or emplace is sufficient and convenient. As the container grows beyond a few thousand elements (typically around 5k entries), bulk-oriented construction is strongly recommended to preserve predictable performance characteristics.

Compared to std::multimap, flat_multimap deliberately trades a portion of lookup and erase performance (on the order of ~30% under low-pressure workloads) in exchange for:

  • significantly improved L1/L2 cache hit rates,
  • fully contiguous storage with no allocator-induced fragmentation,
  • more stable performance at larger scales.

While pointer-based containers often perform well in microbenchmarks executed in isolation, such measurements typically assume an idealized environment with no long-term allocator pressure or memory fragmentation. In real-world systems, contiguous containers tend to exhibit more stable behavior over time due to their compact layout and predictable access patterns.

flat_multimap is therefore intended as a locality-optimized, range-oriented structure for large in-memory datasets, rather than as a drop-in replacement for tree-based multimaps in all scenarios.