Technology

B+ Tree Index

Anvil DB uses B+ trees as the foundational index structure for property lookups, unique constraints, composite keys, full-text search, and spatial queries. The implementation is fully generic, supports bulk loading, concurrent access, and leaf-linked range scans.

Architecture

The B+ tree is a balanced tree structure where all data resides in leaf nodes, and internal nodes contain only separator keys for navigation. Leaf nodes are linked in a forward chain, enabling efficient range scans without tree traversal.

data structures (bptree.rs)rust
pub struct BPlusTree<K: Ord + Clone, V: Clone> {
    order: usize,              // Branching factor (min 3)
    root: Option<usize>,       // Root node ID
    nodes: Vec<Node<K, V>>,    // All nodes in the tree
    len: usize,                // Total key-value pairs
}

enum Node<K, V> {
    Internal(InternalNode<K>),  // Separator keys + child pointers
    Leaf(LeafNode<K, V>),       // Data keys + values + next_leaf
}

struct LeafNode<K, V> {
    keys: Vec<K>,
    values: Vec<V>,
    next_leaf: Option<usize>,   // Forward pointer for range scans
}

Operations

Point Lookup

Single-key lookup navigates from root to leaf using binary search at each internal node, then binary searches the leaf's key array. Complexity: O(log n).

Insert with Auto-Split

Insertion finds the correct leaf and adds the key-value pair. When a leaf exceeds capacity (order - 1 keys), it splits at the midpoint: the right half becomes a new leaf and the median key is promoted to the parent. Internal nodes split the same way, propagating up to the root if needed.

split logictext
Insert key into leaf
  ↓ leaf.keys.len() > max_keys (order - 1)?
  ↓ YES: split at midpoint
     Left leaf: keys[..mid], values[..mid]
     Right leaf: keys[mid..], values[mid..]
     Promote keys[mid] to parent
  ↓ Parent overflow? Split parent recursively
  ↓ Root splits? Create new root with single key

Delete with Merge/Redistribute

Deletion removes the key from its leaf. If the leaf underflows below the minimum (ceil((order-1)/2) keys), the tree first attempts to redistribute entries from a sibling. If redistribution isn't possible (sibling also at minimum), the nodes merge. Merges can cascade up to the root, reducing tree height.

Range Scans

Range queries navigate to the start leaf, then follow the next_leaf linked list until the upper bound is reached. No tree traversal is needed after finding the start position.

OperationMethodDescription
Full scaniter()All entries in sorted order via leaf chain
From keyrange_from(&K)Entries where key >= start
To keyrange_to(&K)Entries where key <= end
Betweenrange(&K, &K)Closed interval [start, end]
Prefixprefix_scan(&K)Keys sharing a common prefix (String, Vec<u8>)

Bulk Loading

Pre-sorted data can be loaded bottom-up in a single pass, bypassing the insert-split cycle entirely. Leaf nodes are packed to capacity and linked, then internal levels are built from the leaf boundaries. This is significantly faster than repeated inserts for large datasets.

bulk loadrust
// Build tree from 10,000 sorted entries in one pass
let tree = BPlusTree::bulk_load(
    64, // order
    sorted_data.into_iter(),
);

Capacity Rules

All capacity limits are derived from the configurable order parameter (minimum 3). The order determines the maximum fan-out of internal nodes.

ParameterFormulaExample (order=64)
Max keys per nodeorder - 163
Min leaf keysceil((order - 1) / 2)32
Min internal keysceil(order / 2) - 131
Max children per internalorder64
Root minimum1 key (no minimum)1

Concurrent Access

The ConcurrentBPlusTree wrapper provides thread-safe access via Arc<RwLock>. Multiple readers can access the tree concurrently; writers acquire an exclusive lock. Custom read/write closures are supported for complex operations.

concurrent wrapper (concurrent_bptree.rs)rust
pub struct ConcurrentBPlusTree<K, V> {
    inner: Arc<RwLock<BPlusTree<K, V>>>,
}

// Multiple concurrent readers
tree.get(&key);           // shared read lock
tree.range_to_vec(&a, &b); // shared read lock

// Exclusive writer
tree.insert(key, value);  // exclusive write lock
tree.remove(&key);        // exclusive write lock

Index Types Built on B+ Tree

The index manager uses B+ trees as the underlying structure for all index types in Anvil DB.

Index TypeKeyValueUse Case
PropertyPropertyValueVec<NodeId>Fast property lookups: WHERE n.name = 'Alice'
UniquePropertyValueNodeIdUniqueness constraints: one node per key
Composite(PropertyValue, PropertyValue)Vec<NodeId>Multi-property lookups
Full-textString (term)Vec<(NodeId, f32)>Text search with TF-IDF scoring
SpatialMorton code (u64)Vec<NodeId>Geospatial queries via Z-order curve
Label scanLabelIdVec<NodeId>Fast label-based node retrieval
Relationship(RelTypeId, NodeId)Vec<RelId>Relationship traversal acceleration

Prefix Scanning

The B+ tree supports prefix-based range scans for String and Vec<u8> keys via the PrefixBound trait. The trait computes the exclusive upper bound of a prefix range by incrementing the last non-0xFF byte.

prefix scanrust
// Find all keys starting with "usr_"
for (key, val) in tree.prefix_scan(&"usr_".to_string()) {
    // yields: "usr_001", "usr_002", "usr_alice", ...
}

// Internally computes: range("usr_", "usr]")
// where upper bound is "usr_" with last byte incremented

Test Coverage

The B+ tree implementation is covered by 51 tests across 2 modules, verifying correctness under sequential inserts, reverse inserts, large-scale operations (10,000 keys), and concurrent access.

ModuleTestsCoverage
bptree.rs43Insert, delete, split, merge, range scan, prefix scan, bulk load, edge cases
concurrent_bptree.rs8Concurrent readers, writers, mixed access, closure-based operations

Key Test Scenarios

TestDescription
large_sequential_insert1,000 sequential keys with verified sorted iteration
large_reverse_insert1,000 reverse-order keys triggering repeated splits
bulk_load_large10,000 entries loaded bottom-up in one pass
delete_causes_height_reductionDeletions collapse tree height via cascading merges
leaf_merge_on_deleteDeletion triggers leaf merge when below minimum
leaf_redistribute_on_deleteDeletion triggers redistribution from sibling
concurrent_readers_and_writer8 readers + 1 writer operating on 200 keys simultaneously
concurrent_writers4 writer threads inserting 1,000 keys in non-overlapping ranges
interleaved_insert_delete50 keys with alternating insert/delete operations
prefix_scan_stringsPrefix-based range scan on string keys

Design Decisions

Generic Types

Keys require Ord + Clone, values require Clone. This allows the same B+ tree to back property indexes (PropertyValue keys), unique indexes (single NodeId values), and full-text indexes (term → score vectors).

Page-Ready Layout

Nodes are stored in a contiguous Vec with usize IDs, designed to map directly to a page-based storage manager. Each node corresponds to one fixed-size page on disk.

Leaf Linking

All leaves form a singly-linked list via next_leaf pointers. Range scans follow this chain after a single root-to-leaf traversal, avoiding repeated tree navigation.

No Unsafe Code

The entire implementation uses safe Rust with no unsafe blocks. Memory safety is guaranteed by the compiler, eliminating buffer overflow and use-after-free vulnerabilities common in C/C++ B+ tree implementations.