Technology

LRU Page Cache

Anvil DB uses a Least Recently Used (LRU) page cache as the buffer manager between the storage engine and disk. All page reads and writes pass through the cache, which provides O(1) lookup, promotion, and eviction backed by a HashMap index and a doubly-linked list stored in a flat Vec.

Architecture

The cache maps page IDs to fixed-size byte buffers. A doubly-linked list stored in a flat Vec tracks access order: the head is the least-recently-used page (eviction candidate) and the tail is the most-recently-used.

core structures (page_cache.rs)

```rust
pub struct PageCache {
    page_size: PageSize,             // 4 KB or 8 KB
    capacity: usize,                 // Max pages in cache
    index: HashMap<PageId, usize>,   // O(1) page lookup
    entries: Vec<Entry>,             // Flat-vec doubly-linked list
    lru_head: Option<usize>,         // Least recently used (evict first)
    lru_tail: Option<usize>,         // Most recently used
}

struct Entry {
    page: Page,                      // The cached page data
    prev: Option<usize>,             // Previous in LRU order
    next: Option<usize>,             // Next in LRU order
}
```
data flow

```text
Read request: PageId
    |
    v
HashMap lookup ──── HIT ──── move_to_tail() ──── return &Page
    |
    MISS
    |
    v
Load from disk ──── insert() ──── cache full?
                                    |
                              YES: evict_lru() ──── flush dirty page to disk
                                    |
                              NO: append to tail
```

Operations

| Operation | Complexity | Description |
| --- | --- | --- |
| get(id) | O(1) | Look up a page by ID and promote it to most-recently-used |
| get_mut(id) | O(1) | Mutable lookup; promotes to MRU and enables dirty tracking |
| insert(page) | O(1) | Add a page to the cache; evicts the LRU entry if at capacity |
| remove(id) | O(1) | Remove a specific page and unlink it from the LRU list |
| contains(id) | O(1) | Check whether a page is cached |
| dirty_pages() | O(n) | Iterate over all dirty pages for checkpoint flushing |

LRU Promotion

Every get() and get_mut() call promotes the accessed page to the tail (most-recently-used position) via move_to_tail(), which unlinks the entry from its current position and relinks it at the end. This ensures frequently accessed pages stay in cache while cold pages drift toward the head for eviction.
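
The unlink/relink bookkeeping behind this promotion can be sketched as a small, self-contained toy. The Node, LruList, and method names below are illustrative assumptions based on the structures described above, not the actual Anvil DB code; the page payload is replaced by a bare key to keep the sketch short.

```rust
// A minimal sketch of flat-vec LRU promotion. Names and bodies are
// assumptions for illustration; only the list bookkeeping is shown.
struct Node {
    key: u64,            // Stand-in for the cached page's PageId
    prev: Option<usize>, // Index of the previous (colder) entry
    next: Option<usize>, // Index of the next (warmer) entry
}

struct LruList {
    nodes: Vec<Node>,
    head: Option<usize>, // Least recently used (eviction candidate)
    tail: Option<usize>, // Most recently used
}

impl LruList {
    fn new() -> Self {
        LruList { nodes: Vec::new(), head: None, tail: None }
    }

    // Append a new node at the tail (MRU position); O(1).
    fn push_tail(&mut self, key: u64) -> usize {
        let idx = self.nodes.len();
        self.nodes.push(Node { key, prev: self.tail, next: None });
        match self.tail {
            Some(t) => self.nodes[t].next = Some(idx),
            None => self.head = Some(idx), // List was empty
        }
        self.tail = Some(idx);
        idx
    }

    // Unlink entry `i` from its current position and relink it at the
    // tail; this is the promotion performed on every get()/get_mut().
    fn move_to_tail(&mut self, i: usize) {
        if self.tail == Some(i) {
            return; // Already MRU
        }
        // Unlink: splice prev and next around `i`.
        let (prev, next) = (self.nodes[i].prev, self.nodes[i].next);
        match prev {
            Some(p) => self.nodes[p].next = next,
            None => self.head = next, // `i` was the head
        }
        if let Some(n) = next {
            self.nodes[n].prev = prev;
        }
        // Relink at the tail (MRU position).
        let old_tail = self.tail.expect("list is non-empty here");
        self.nodes[old_tail].next = Some(i);
        self.nodes[i].prev = Some(old_tail);
        self.nodes[i].next = None;
        self.tail = Some(i);
    }

    // LRU order from coldest to hottest, for inspection.
    fn order(&self) -> Vec<u64> {
        let mut out = Vec::new();
        let mut cur = self.head;
        while let Some(i) = cur {
            out.push(self.nodes[i].key);
            cur = self.nodes[i].next;
        }
        out
    }
}

fn main() {
    let mut list = LruList::new();
    let a = list.push_tail(10);
    list.push_tail(20);
    list.push_tail(30);
    assert_eq!(list.order(), vec![10, 20, 30]);

    list.move_to_tail(a); // Access page 10: promote to MRU
    assert_eq!(list.order(), vec![20, 30, 10]); // 20 now evicts first
    println!("{:?}", list.order());
}
```

Because nodes are never swapped within the Vec, only the prev/next indices change, so the HashMap entries that point at Vec positions stay valid across promotions.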

Eviction

When insert() is called on a full cache, the head entry (least-recently-used) is evicted and returned as an Evicted struct. The caller is responsible for flushing the evicted page to disk if its dirty flag is set.

eviction result

```rust
pub struct Evicted {
    pub page: Page,  // Caller must flush if page.is_dirty()
}

// Insert returns the evicted page when cache is full
let evicted: Option<Evicted> = cache.insert(new_page);
if let Some(e) = evicted {
    if e.page.is_dirty() {
        disk.write_page(&e.page)?;  // Flush to disk
    }
}
```

Page Structure

Each page is a fixed-size byte buffer with an ID, dirty flag, and read/write accessors. Writing to a page automatically sets the dirty flag, which is checked during eviction and checkpoint flushing.

page (page.rs)

```rust
pub struct Page {
    id: PageId,          // u64 page identifier
    data: Box<[u8]>,     // Heap-allocated byte buffer
    dirty: bool,         // Set on write, cleared on flush
}

// Writing automatically marks dirty
page.write(offset, &data);  // dirty = true
page.clear_dirty();         // After flushing to disk
```

| Setting | Value | Description |
| --- | --- | --- |
| PAGE_SIZE_4K | 4,096 bytes | Default page size for general use |
| PAGE_SIZE_8K | 8,192 bytes | Larger pages for bulk workloads |
| Minimum | 512 bytes | Page size must be a power of two |
| Dirty tracking | Boolean flag | Set on write(), cleared on flush |
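
The minimum-size and power-of-two constraints above amount to a standard bit check. The helper below (is_valid_page_size, a hypothetical name, not part of the actual API) sketches it:

```rust
// Hypothetical helper illustrating the documented page-size rules:
// at least 512 bytes and a power of two. Not the actual Anvil DB API.
const MIN_PAGE_SIZE: usize = 512;

fn is_valid_page_size(n: usize) -> bool {
    // A power of two has exactly one bit set, so n & (n - 1) == 0.
    // The minimum check also rules out n == 0.
    n >= MIN_PAGE_SIZE && n & (n - 1) == 0
}

fn main() {
    assert!(is_valid_page_size(4096));   // PAGE_SIZE_4K
    assert!(is_valid_page_size(8192));   // PAGE_SIZE_8K
    assert!(is_valid_page_size(512));    // Minimum allowed
    assert!(!is_valid_page_size(256));   // Below minimum
    assert!(!is_valid_page_size(6000));  // Not a power of two
    println!("page size checks passed");
}
```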

Free Page Management

Deleted pages are recycled via a FreeList that maintains a LIFO stack of reusable page IDs and a high-water mark for sequential allocation. New pages are allocated from the free list first; only when the list is empty does the high-water mark advance.

free list (freelist.rs)

```rust
pub struct FreeList {
    free: Vec<PageId>,       // Recycled page IDs (LIFO)
    next_page_id: u64,       // High-water mark
}

// Allocation: reuse a freed page or bump the high-water mark
let id = freelist.allocate();

// Deallocation: return the page for reuse
freelist.free(page_id);
```
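
A minimal runnable sketch of this allocate/free behavior, with method bodies assumed from the documented semantics (the real implementation may differ):

```rust
// Toy FreeList matching the documented semantics: a LIFO stack of
// recycled IDs plus a high-water mark. Bodies are assumptions.
pub struct FreeList {
    free: Vec<u64>,    // Recycled page IDs (LIFO stack)
    next_page_id: u64, // High-water mark
}

impl FreeList {
    fn new() -> Self {
        FreeList { free: Vec::new(), next_page_id: 0 }
    }

    // Reuse the most recently freed page if any; otherwise advance
    // the high-water mark to mint a fresh ID.
    fn allocate(&mut self) -> u64 {
        if let Some(id) = self.free.pop() {
            id
        } else {
            let id = self.next_page_id;
            self.next_page_id += 1;
            id
        }
    }

    // Return a page ID to the free stack for later reuse.
    fn free(&mut self, id: u64) {
        self.free.push(id);
    }
}

fn main() {
    let mut fl = FreeList::new();
    let a = fl.allocate(); // Fresh ID 0: HWM advances
    let b = fl.allocate(); // Fresh ID 1: HWM advances
    fl.free(a);
    fl.free(b);
    // LIFO: the last page freed (b) is reused first.
    assert_eq!(fl.allocate(), b);
    assert_eq!(fl.allocate(), a);
    // Free list empty again, so the HWM mints ID 2.
    assert_eq!(fl.allocate(), 2);
    println!("freelist checks passed");
}
```

LIFO reuse favors recently touched page IDs, which tend to sit in warmer OS and hardware caches than IDs freed long ago.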

Implementation Details

Flat-Vec Linked List

The LRU list uses a Vec of entries with prev/next indices instead of heap-allocated nodes. This provides a cache-friendly memory layout, avoids pointer chasing, and replaces per-node heap allocation with plain Vec indexing.

HashMap Index

A HashMap maps PageId to Vec index for O(1) page lookup. Combined with the O(1) LRU reordering, every core cache operation (get, get_mut, insert, remove, contains) is constant time regardless of cache size; only the dirty_pages() scan is O(n).

Dirty Page Tracking

Pages track their dirty state via a boolean flag set automatically on any write() call. The dirty_pages() iterator yields all modified pages for batch flushing during checkpoints.
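
Such an iterator can be built as a filter over the cached entries. The types and signature below are assumptions for illustration, not the actual API:

```rust
// Sketch of dirty-page iteration for checkpoint flushing, assuming a
// Page with the dirty flag shown earlier. Signature is an assumption.
struct Page {
    id: u64,
    dirty: bool,
}

struct PageCache {
    entries: Vec<Page>,
}

impl PageCache {
    // Yield only the pages whose dirty flag is set; an O(n) scan.
    fn dirty_pages(&self) -> impl Iterator<Item = &Page> {
        self.entries.iter().filter(|p| p.dirty)
    }
}

fn main() {
    let cache = PageCache {
        entries: vec![
            Page { id: 1, dirty: false },
            Page { id: 2, dirty: true },
            Page { id: 3, dirty: true },
        ],
    };
    let dirty: Vec<u64> = cache.dirty_pages().map(|p| p.id).collect();
    assert_eq!(dirty, vec![2, 3]); // Only modified pages get flushed
    println!("{:?}", dirty);
}
```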

No Unsafe Code

The entire page cache implementation uses safe Rust. Memory safety for page buffers is enforced by the type system, eliminating buffer overflows and use-after-free vulnerabilities.

Test Coverage

The page cache and related components are covered by 28 tests across 3 modules.

| Module | Tests | Coverage |
| --- | --- | --- |
| page_cache.rs | 10 | Insert, get, eviction, LRU promotion, dirty tracking, replace-in-place |
| page.rs | 12 | Page allocation, read/write, dirty flag, byte offset calculation, page size validation |
| freelist.rs | 6 | Sequential allocation, free/reuse LIFO, high-water mark, free page listing |

Key Test Scenarios

| Test | Description |
| --- | --- |
| evicts_lru_when_full | Inserting into a full cache evicts the least-recently-used page |
| access_promotes_to_mru | get() promotes a page to most-recently-used, preventing eviction |
| get_mut_promotes | Mutable access also promotes, ensuring hot pages stay cached |
| insert_existing_replaces_in_place | Re-inserting the same PageId replaces the page without eviction |
| dirty_pages_iterator | dirty_pages() correctly filters to only modified pages |
| evicted_dirty_page_is_reported | An evicted page preserves its dirty flag so the caller can flush it |
| free_and_reuse | Freed pages are reused in LIFO order before new pages are allocated |