MVCC Transactions

Technology

Multiversion Concurrency Control

Anvil DB uses MVCC (Multiversion Concurrency Control) to allow concurrent readers and writers without blocking. Each write creates a new version stamped with the transaction's commit timestamp; readers see a consistent snapshot at their start time. The system supports three isolation levels, record-level locking, deadlock detection, and savepoints for partial rollback.

Version Chains

Each record maintains an ordered chain of versions keyed by timestamp. Readers traverse the chain to find the latest version visible at their snapshot timestamp. Deleted records are represented as tombstone versions (data = None).

version chain (mvcc.rs)rust
pub struct VersionChain<V> {
    versions: BTreeMap<Timestamp, Version<V>>,
}

struct Version<V> {
    timestamp: Timestamp,   // Commit timestamp
    data: Option<V>,        // None = tombstone (deleted)
}

// Read: find latest version <= read_timestamp
chain.read(read_timestamp) -> Option<&V>

// Conflict detection: any version since timestamp?
chain.has_version_since(timestamp) -> bool

// GC: remove old versions, keeping one before cutoff
chain.gc(oldest_active_timestamp)

version visibilitytext
Record "Alice" version chain:
  T=10: {name: "Alice", age: 30}    ← written by tx1
  T=25: {name: "Alice", age: 31}    ← written by tx5
  T=40: None (tombstone)             ← deleted by tx8

Reader at T=20 sees: {name: "Alice", age: 30}  (T=10)
Reader at T=30 sees: {name: "Alice", age: 31}  (T=25)
Reader at T=50 sees: (deleted)                   (T=40)

MVCC Store

The MvccStore maps keys to version chains and provides snapshot-consistent scan and garbage collection.

store (mvcc.rs)rust
pub struct MvccStore<K, V> {
    records: BTreeMap<K, VersionChain<V>>,
}

// Snapshot scan: iterate visible keys at timestamp
store.scan(read_timestamp) -> Vec<(&K, &V)>

// Write-write conflict detection
chain.has_version_since(my_start_timestamp)

// GC + tombstone cleanup
store.gc_with_tombstone_cleanup(oldest_active_timestamp)

Isolation Levels

Level	Snapshot Behavior	Validation
Read Committed	Timestamp advances per read — sees latest committed data at each statement	None
Repeatable Read	Fixed snapshot from transaction start — consistent view throughout	None
Serializable	Fixed snapshot + read/write set tracking — validates no conflicts at commit	Checks if any key in read_set was written after start_timestamp

snapshot (isolation.rs)rust
pub struct Snapshot {
    read_timestamp: Timestamp,
    isolation: IsolationLevel,
    read_set: HashSet<Vec<u8>>,   // Serializable only
    write_set: HashSet<Vec<u8>>,  // Serializable only
}

// Read Committed: advance timestamp per read
snapshot.advance_timestamp(new_timestamp);

// Serializable: validate at commit time
snapshot.validate(store, start_timestamp)
  -> Result<(), ValidationError>  // Fails if read_set was modified

Record-Level Locking

Writes acquire record-level locks via a lock table. Shared locks allow concurrent readers; exclusive locks block all other access. Lock upgrade from shared to exclusive is supported when the transaction is the sole holder.

lock table (lock_table.rs)rust
pub enum LockMode { Shared, Exclusive }

pub enum EntityType { Node, Relationship, Property, Label }

pub struct LockKey {
    entity_type: EntityType,
    id: u64,
}

pub struct LockTable {
    locks: HashMap<LockKey, LockEntry>,
    lock_timeout: Duration,  // Default 30 seconds
}

Compatibility	Shared Held	Exclusive Held
Request Shared	Granted	Blocked
Request Exclusive	Blocked (unless sole holder → upgrade)	Blocked

Lock acquisition supports configurable timeouts with exponential backoff. If a lock cannot be acquired within the timeout, the transaction receives a LockError::Timeout.

Deadlock Detection

A wait-for graph tracks which transactions are waiting on which other transactions. Cycle detection via depth-first search identifies deadlocks. The youngest transaction in the cycle (highest TxId) is chosen as the victim and aborted.

wait-for graph (deadlock.rs)rust
pub struct WaitForGraph {
    edges: HashMap<TxId, HashSet<TxId>>,  // waiter → holders
}

// Proactive: check before waiting
graph.would_deadlock(waiter, holder) -> Option<Vec<TxId>>

// Reactive: find all existing cycles
graph.detect_deadlocks() -> Vec<Vec<TxId>>

// Victim selection: abort youngest (highest TxId)
graph.choose_victim(&cycle) -> TxId

deadlock exampletext
tx1 waits on tx2 (tx2 holds lock on node 5)
tx2 waits on tx3 (tx3 holds lock on node 10)
tx3 waits on tx1 (tx1 holds lock on node 15)

Wait-for graph: tx1 → tx2 → tx3 → tx1 (CYCLE!)
Victim: tx3 (highest TxId = youngest)
Result: tx3 aborted, tx1 and tx2 proceed

Savepoints

Savepoints enable partial rollback within a transaction. An undo log records every mutation with before-images. Rolling back to a savepoint reverses operations in reverse order and removes nested savepoints created after it.

savepoint manager (savepoint.rs)rust
pub enum UndoEntry {
    Insert { key: Vec<u8> },                        // Undo: delete
    Update { key: Vec<u8>, before_image: Vec<u8> }, // Undo: restore
    Delete { key: Vec<u8>, before_image: Vec<u8> }, // Undo: re-insert
}

pub struct SavepointManager {
    undo_log: Vec<UndoEntry>,
    savepoints: HashMap<SavepointId, usize>,  // ID → log position
    savepoint_stack: Vec<SavepointId>,         // Creation order
}

// Create savepoint at current undo log position
manager.create_savepoint() -> SavepointId

// Rollback: returns undo entries in reverse order
manager.rollback_to(savepoint_id) -> Vec<UndoEntry>

// Full rollback of entire transaction
manager.rollback_all() -> Vec<UndoEntry>

Transaction Lifecycle

transaction manager (manager.rs)rust
pub struct TransactionManager {
    next_tx_id: AtomicU64,       // Atomic ID counter
    next_timestamp: AtomicU64,   // Atomic timestamp counter
    active: HashMap<TxId, Transaction>,
    completed: HashMap<TxId, Transaction>,
}

// Lifecycle
let tx_id = manager.begin();           // Active
let ts = manager.commit(tx_id)?;       // Active → Committed
manager.rollback(tx_id)?;               // Active → Aborted

// GC: oldest active timestamp for version cleanup
manager.oldest_active_timestamp() -> Option<Timestamp>

State	Transition	Effect
Active	begin() / begin_read_only()	Allocates TxId and start timestamp
Committed	commit(tx_id)	Assigns commit timestamp, moves to completed
Aborted	rollback(tx_id)	Moves to completed, locks released

Test Coverage

The MVCC subsystem is covered by 98 tests across 7 modules.

Module	Tests	Coverage
mvcc.rs	19	Version chains, snapshot reads, conflict detection, GC, tombstone cleanup
lock_table.rs	20	Shared/exclusive locks, upgrade, timeout, release, multi-key independence
deadlock.rs	13	Wait-for graph, cycle detection (DFS), 3-way deadlocks, victim selection
savepoint.rs	12	Create, rollback, nested savepoints, release, undo entry types
isolation.rs	11	Read committed, repeatable read, serializable validation, conflict detection
manager.rs	13	Begin, commit, rollback, sequential IDs, oldest active timestamp, purge
transaction.rs	10	State transitions, double-commit rejection, read-only flag

Design

Non-Blocking Reads

Readers never block writers and writers never block readers. Each reader sees a consistent snapshot via version chain traversal, while writers create new versions without modifying existing ones.

Write-Write Conflict Detection

Before committing, a transaction checks if any record it wrote was also written by another transaction after its start. This prevents lost updates without requiring read locks.

Garbage Collection

Old versions are cleaned up when no active transaction can see them. The oldest active timestamp serves as the GC watermark. Tombstones are removed when no transaction can see the deleted record.

Youngest Victim

Deadlock resolution always aborts the youngest transaction (highest TxId), minimizing wasted work since younger transactions have done less processing.