01

What Actually Happens When You INSERT a Row?

One innocent line of code. Five layers cooperating in milliseconds. Let's pull back the curtain.

One line of code. A lot happening underneath.

You write this. It looks harmless:

CODE
db.Insert("Users", row);
WHAT IT LOOKS LIKE

"Hey database, stick this row into the Users table. Thanks."

And that's all you said. Easy. Boring, even.

Under the hood, it's anything but boring. Picture a database engine as a busy hospital ER. Your row is a new patient. It has to pass through triage, get logged in the official intake book before anyone touches anything permanent, and only then get filed into the ward roster. If the power flickers mid-admission, the intake book tells us exactly who was officially checked in.

?
Why you should care

When you tell an AI "add a column" or "make this query faster," the AI needs to know which layer of the database your change belongs in. If you can name the layers, you can steer the AI — instead of getting generic, half-right code.

The 5 layers, at a glance

A database isn't one thing. It's a stack of specialized layers, each solving one narrow problem. Meet the cast:

API
The Public API

The front door. db.Insert(...), db.CreateTable(...) — the handful of methods you actually call as a developer.

TBL
Table

Knows the shape of your data (the schema) and enforces rules like "no duplicate IDs" and "this column can't be empty."

TX
Transaction + WAL

The intake log. The write-ahead log (WAL) records every change on disk before the real data is modified — so a crash can never lose a committed write.

IDX
B+ Tree Index

A fancy sorted filing cabinet. The B+ tree is what lets you find row #4,829,117 in a few milliseconds instead of reading millions of rows one by one.

PG
Page-based Storage

The actual bytes on disk, organized into fixed-size blocks called pages. This is where your data physically lives.

!
Separation of concerns

Each layer has one job. Durability is the WAL's job. Sort order is the index's job. Disk I/O is the storage engine's job. Confusing these is the #1 bug in amateur database code — and the reason you should care which layer you're touching.

Meet the public API

This is the entire conversation you have with the database from your application code. Everything else in this course is what happens behind these lines.

CODE
using var db = new Database("mydata.mde", cacheSize: 100, useMemoryMappedFile: false);

var columns = new List<ColumnDefinition>
{
    new ColumnDefinition("Id", DataType.Int, false),
    new ColumnDefinition("Name", DataType.String),
    new ColumnDefinition("Age", DataType.Int)
};

var table = db.CreateTable("Users", columns, primaryKeyColumn: "Id");

var row = new DataRow(table.Schema);
row["Id"] = 1;
row["Name"] = "Alice";
row["Age"] = 30;

db.Insert("Users", row);
PLAIN ENGLISH

Open (or create) a database file called mydata.mde. Cache 100 pages in memory for speed. using makes sure the file closes cleanly when we're done.

Sketch out what a "User" looks like: three fields.

Id is an integer, and the false means it can't be empty.

Name is text. Age is an integer. Simple stuff.

Create the Users table with Id as its primary key — the unique identifier for each row.

Make an empty row that fits the table's shape.

Fill it in: Alice, ID 1, age 30.

Hand it to the database. This is the one innocent line we started with.

Notice there are really only four verbs you need to know at this level: open a database, define columns, make a row, insert it. Everything else this course teaches is what the database does for you behind those four verbs.

The journey of one row

Click Next Step to watch your row travel through the layers. Each stop has exactly one job.

Interactive journey: Your Code → Database → Table → WAL → B+ Tree.

Prefer to read it at your own pace? Here are the same seven steps as cards:

1
Your code calls db.Insert

An API call from your application, passing the table name and a row object.

2
Database finds the Table

The Database object keeps a dictionary of tables by name and hands control to the right Table instance (sketched just after these cards).

3
Table validates the row

Does every column match its expected type? Are non-null fields actually present? Is the primary key unique?

4
Row is serialized to bytes

Serializing turns the row object into a compact byte array — the only form disks and logs actually understand.

5
Bytes appended to the WAL

This is the durability point. Once the WAL entry is on disk, the write is officially survivable — even if the power dies right now, recovery can replay it.

6
Bytes inserted into the B+ Tree

The index places the new entry in sorted order by primary key, so future lookups by ID are fast.

7
Success returns to you

Control flows back up the stack. Your single line of code finishes. Alice is now safely in the database.
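Step 2 is worth a tiny sketch of its own. Conceptually, the Database object just looks up the table by name and delegates. This is a minimal illustration, not the engine's actual method; the _tables field name and error message are assumptions.

CODE
// Hypothetical sketch of step 2: Database finds the Table and delegates.
// Assumes a private Dictionary<string, Table> field called _tables.
public void Insert(string tableName, DataRow row)
{
    if (!_tables.TryGetValue(tableName, out var table))
        throw new InvalidOperationException($"Table '{tableName}' does not exist.");

    table.Insert(row);   // hands control to Table.Insert (steps 3 through 6)
}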

Zooming in: the real Table.Insert code

Here's the actual C# method that orchestrates steps 3 through 6. This is roughly 30 lines of C# from MiniDatabaseEngine/Table.cs. Don't be intimidated — we'll translate each line.

CODE
public void Insert(DataRow row, Transaction.Transaction? transaction = null)
{
    _lock.EnterWriteLock();
    try
    {
        var key = GetPrimaryKey(row);
        ValidateRowForWrite(row);

        var keyExists = _index.Search(key) != null;
        var keyPendingInTransaction = transaction?.HasBufferedValueForKey(_schema.TableName, key, GetPrimaryKeyDataType()) ?? false;
        if (keyExists || keyPendingInTransaction)
            throw new InvalidOperationException($"Duplicate primary key value '{key}'.");

        var serialized = SerializeRow(row);

        if (transaction != null)
        {
            transaction.LogInsert(_schema.TableName, key, serialized);
            _nextRowId++;
            return;
        }

        _index.Insert(key, serialized);
        _nextRowId++;
    }
    finally
    {
        _lock.ExitWriteLock();
    }
}
PLAIN ENGLISH

Public method Insert takes a row, and optionally a transaction (if you're batching multiple writes together).

Grab an exclusive write lock. No other thread can modify this table until we're done.

try starts a block that guarantees cleanup happens even if something fails.

Pull the primary key value out of the row (e.g., 1 for Alice).

Validate: does every field match the schema? Missing required fields? Wrong types? Fail loudly if so.

Is this key already in the index? Search returns a value if yes, null if not.

Is this key pending in an open transaction that hasn't committed yet? The ?. safely skips if there's no transaction. The ?? false defaults to "no" when the answer is missing.

Either way — already committed OR pending — reject as a duplicate.

Throw a clear error with the offending key value. No silent failures.

Turn the row object into a byte array, ready to be stored.

If we're inside a transaction, don't touch the real index yet…

…instead, park the change in the transaction's WAL buffer and bump the row ID counter. We'll apply it on commit.

Otherwise — no transaction — write it straight into the B+ Tree index now.

finally always runs, success or failure.

Release the write lock. Other threads can now proceed.

Notice how many layers this one method coordinates: lock manager, schema validator, index, serializer, transaction buffer. That orchestration is the whole reason this course exists.

Test your intuition

No memorization. Each question is a real scenario you might hit when debugging or talking to an AI about a database. Think it through, then pick.

Users report duplicate-key errors even after rolling back a transaction. Which layer's check is most likely firing?

You want to add compression for stored rows — smaller files, faster disk reads. Which layer should own the compression logic?

A write "succeeded" (your code got a success return), but the row was gone after a server crash. What's the most likely cause?

Put these four events in the order they happen during an insert:
(a) bytes persisted in a page on disk   (b) row serialized to bytes   (c) primary key extracted   (d) WAL entry appended

Up next: Module 2 — Pages & Extents, the storage floor plan. We'll zoom all the way down to the bottom of the stack and see how the database physically lays out bytes on disk. Once you know the floor plan, the rest of the building makes sense.

02

Pages & Extents

The storage floor plan — how bytes on disk become something a database can find.

Disks don't care about rows

Imagine a giant self-storage warehouse. Every single unit is exactly the same size. That's how a disk sees your database — not as tables and rows, but as one long line of identical storage units called pages.

The storage engine's whole job is translation: take your messy, variable-sized rows and cram them into these rigid, fixed-size units so the disk can move them around efficiently.

P

Page — 4096 bytes

The atom of storage. Everything on disk is a page.

E

Extent — 8 pages = 32 KB

A neighborhood of pages. Fetched as a group for locality.

Dirty flag

One bit that says "this page has changes that haven't hit disk yet."

L

LRU cache

Keeps hot pages in RAM. Evicts the one nobody touched in the longest time.

💡
Why 4KB?

Most SSDs and OS page caches work in 4KB blocks. Matching the page size to the OS block size means one database page = one disk I/O — no wasted reads.

Anatomy of a page

A page is absurdly simple: an ID, a fixed-size array of raw bytes, and a flag. That's it. The ID is what makes the math work — it tells the engine exactly where on disk this unit lives.

CODE
public class Page
{
    public const int PageSize = 4096; // 4KB pages
    public int PageId { get; set; }
    public byte[] Data { get; set; }
    public bool IsDirty { get; set; }
    
    public Page(int pageId)
    {
        PageId = pageId;
        Data = new byte[PageSize];
        IsDirty = false;
    }
PLAIN ENGLISH

Every page is exactly 4096 bytes — no exceptions. This matches the size the disk itself prefers to move around.

Each page has an ID number. That ID is secretly a map coordinate — it tells us where the page lives on disk.

The Data array is the actual 4096-byte storage unit. Whatever the database wants to remember goes here.

IsDirty starts as false — the page matches what's on disk exactly.

The constructor just hands you a fresh, empty page with the ID you asked for — a blank storage unit, freshly swept.

Because every page is the same size, finding page number N on disk is a single multiplication — no searching, no index lookup, just arithmetic.

The file, viewed as a ribbon of pages:

P0 → offset 0 · P1 → 4096 · P2 → 8192 · P3 → 12288 · P4 → 16384 · P5 → 20480 · P6 → 24576 · P7 → 28672

file offset = pageId × 4096

Want page 17? Skip 17 × 4096 = 69,632 bytes into the file. Done.
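Here is that arithmetic as code. A minimal sketch, assuming a plain FileStream over the database file and .NET 7+ for ReadExactly; this is not the engine's actual StorageEngine method.

CODE
// Sketch only: fetch page N by seeking to pageId * PageSize.
public static Page ReadPage(FileStream file, int pageId)
{
    var page = new Page(pageId);
    file.Seek((long)pageId * Page.PageSize, SeekOrigin.Begin);  // page 17 -> offset 69,632
    file.ReadExactly(page.Data, 0, Page.PageSize);              // fill the 4096-byte buffer
    return page;
}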

Extents — bundling the moving truck

The warehouse moving truck is expensive to dispatch. Running 8 separate trips for 8 adjacent units is wasteful — so we send it once and have it pick up the whole block in a single trip. That block is called an extent.

8 pages sitting next to each other. One trip. That's the whole idea.

CODE
public class Extent
{
    public const int PagesPerExtent = 8;
    
    public int ExtentId { get; set; }
    public Page[] Pages { get; set; }
    public bool IsDirty => Pages.Any(p => p.IsDirty);
    
    public Extent(int extentId)
    {
        ExtentId = extentId;
        Pages = new Page[PagesPerExtent];
        
        for (int i = 0; i < PagesPerExtent; i++)
        {
            int pageId = extentId * PagesPerExtent + i;
            Pages[i] = new Page(pageId);
        }
    }
PLAIN ENGLISH

Every extent holds exactly 8 pages. Always. No more, no less.

The extent gets its own ID and a little shelf with 8 slots for pages.

"IsDirty" for an extent just asks: does any page in here need saving?

The constructor fills each of the 8 slots with a fresh page.

Here's the trick: extentId × 8 + i gives each page its global ID. So page 17 lives in extent 2, slot 1 (since 2 × 8 + 1 = 17). No lookup table — just multiplication.
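The reverse trip is just as cheap. A small illustrative helper (not part of the engine's API) that maps a global page ID back to its extent and slot:

CODE
// Illustrative only: locate a page inside the extent grid.
public static (int extentId, int slot) LocatePage(int pageId)
{
    int extentId = pageId / Extent.PagesPerExtent;   // 17 / 8 = 2
    int slot     = pageId % Extent.PagesPerExtent;   // 17 % 8 = 1
    return (extentId, slot);
}
// LocatePage(17) returns (2, 1): extent 2, slot 1, since 2 * 8 + 1 = 17.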

Dirty pages are lazy

A page can be modified in memory dozens of times and only written to disk once, when something asks for a flush. That's why databases are fast — disk writes are batched.

Try it yourself: which extent?

Remember the formula: extentId = pageId ÷ 8. Drag each page to the extent it belongs in. The math is the whole game.

Pages to place: 0 · 7 · 8 · 15 · 23

Extent 0 — pages 0 through 7

Extent 1 — pages 8 through 15

Extent 2 — pages 16 through 23

The front-desk clipboard — LRU cache

Walking to the warehouse for every lookup is exhausting. So the front desk keeps a clipboard of the units opened most recently — that's the cache. When it runs out of space, it evicts whoever was touched the longest time ago. That rule has a name: LRU (Least Recently Used).

CODE
public void Put(int pageId, Page page)
{
    lock (_lockObject)
    {
        if (_cache.TryGetValue(pageId, out var node))
        {
            node.Page = page;
            MoveToHead(node);
            return;
        }

        var newNode = new CacheNode(page);
        _cache[pageId] = newNode;
        AddToHead(newNode);

        if (_cache.Count > _capacity)
        {
            var removed = RemoveTail();
            if (removed != null)
                _cache.TryRemove(removed.Page.PageId, out _);
        }
    }
}
PLAIN ENGLISH

Lock the clipboard first — two people writing on it at once would be chaos.

If this page is already on the clipboard, just refresh it and move its entry to the top (most recent).

Otherwise, write a brand-new entry at the top of the clipboard.

If the clipboard is overflowing past its capacity...

...pluck off the bottom entry — that's the one nobody has touched in the longest time. Forget it exists.

The "top" and "bottom" of the clipboard are tracked by a doubly-linked list, which is why moving something to the top is almost free. An even fancier variant — a memory-mapped file — lets the OS handle caching for you, but that's a topic for another day.

Put it to work

Three questions to stretch the ideas. No trick questions — just situations you might hit in real conversations with an AI coding assistant.

If the cache holds 100 pages at 4 KB each, how much RAM does it use?

You've written 10,000 rows but haven't called Flush(). The power cuts out. What survives?

Why does the cache evict the least recently used page, rather than the least frequently used one?

Next up: module 3, The B+ Tree. Now that bytes have a place to live, how do we find one specific row among millions?

03

The B+ Tree — Finding a Needle Fast

How databases turn "search 10 million rows" into about 23 comparisons.

The phone book with index tabs

Scanning 10 million rows to find user #47,392,185 would take forever. A B+ tree turns that search into about 23 comparisons — like binary-searching a physical phone book, one flip at a time.

Picture a phone book with thick index tabs glued to the edge of the pages. The tabs don't contain phone numbers — they just say "flip here for S" or "flip here for T." The actual phone numbers live on the alphabetized pages at the end, and those pages are stapled in order — once you land on "Smith" you can just keep reading forward to "Thompson, Turner, Vance."

That's a B+ tree. The tabs are internal nodes. The pages are leaf nodes. And the staples are what make range queries fast.

                  [ 30 | 60 ]                 internal node (the tabs)
                 /     |     \
[5, 15, 25] <--> [35, 45] <--> [65, 75, 90]   leaf nodes (the stapled pages) -- linked forward & back
!
Why B+ (not B)?

In a plain B-tree, values live at every level. In a B+ tree, values live only in leaves, and leaves are linked. That makes range scans — WHERE age BETWEEN 20 AND 30 — a straight walk instead of a full traversal.

Two kinds of nodes, two different jobs

Every node holds keys. But leaf nodes also hold the actual values and have Next/Previous pointers to their siblings. Internal nodes hold only keys and pointers to children — they're pure signposts.

CODE
public abstract class BPlusTreeNode
{
    public bool IsLeaf { get; protected set; }
    public BPlusTreeNode? Parent { get; set; }
    public List<object> Keys { get; protected set; }
    
    protected BPlusTreeNode(bool isLeaf)
    {
        IsLeaf = isLeaf;
        Keys = new List<object>();
    }
    
    public abstract int KeyCount { get; }
}

public class BPlusTreeInternalNode : BPlusTreeNode
{
    public List<BPlusTreeNode> Children { get; private set; }
    
    public BPlusTreeInternalNode() : base(false)
    {
        Children = new List<BPlusTreeNode>();
    }
    
    public override int KeyCount => Keys.Count;
}

public class BPlusTreeLeafNode : BPlusTreeNode
{
    public List<object?> Values { get; private set; }
    public BPlusTreeLeafNode? Next { get; set; }
    public BPlusTreeLeafNode? Previous { get; set; }
    
    public BPlusTreeLeafNode() : base(true)
    {
        Values = new List<object?>();
    }
    
    public override int KeyCount => Keys.Count;
}
PLAIN ENGLISH

Start with a shared blueprint: every node, leaf or not, has a list of keys and knows who its parent is.

The KeyCount is abstract, meaning each kind of node fills it in its own way.

Internal nodes add one thing: a list of children. They're the signposts — they know where to send you, not what the answer is.

Leaf nodes are where the real data lives: a Values list that lines up with the keys one-to-one.

Plus Next and Previous pointers to the sibling leaves — this is the "stapled pages" trick. A doubly linked chain of leaves.

Two operations that justify the whole design

A B+ tree earns its keep with two moves: jump straight to one key (Search) or collect every key in a span (Range). Same tree. Very different mechanics.

Search: climb down the tabs to the right page

CODE
public object? Search(object key)
{
    lock (_lockObject)
    {
        var leaf = FindLeafNode(key);
        int index = FindKeyIndex(leaf.Keys, key);
        
        if (index < leaf.Keys.Count && _comparer.Compare(leaf.Keys[index], key) == 0)
        {
            return ((BPlusTreeLeafNode)leaf).Values[index];
        }
        
        return null;
    }
}
PLAIN ENGLISH

Grab the lock so two threads don't read the tree while someone else is rearranging it.

FindLeafNode walks from the root, picking the right child at each signpost until it reaches a leaf.

Inside that leaf, binary search finds where the key would sit among the sorted keys.

Ask the comparer: is the key at that spot actually equal to what we wanted? If yes, return the matching value.

If not, this key isn't in the tree — return null.

Range: find the start, then walk sideways

CODE
public IEnumerable<KeyValuePair<object, object?>> Range(object? minKey, object? maxKey)
{
    lock (_lockObject)
    {
        var results = new List<KeyValuePair<object, object?>>();
        var leaf = minKey != null ? FindLeafNode(minKey) : GetFirstLeaf();
        
        while (leaf != null)
        {
            for (int i = 0; i < leaf.Keys.Count; i++)
            {
                var key = leaf.Keys[i];
                
                if (minKey != null && _comparer.Compare(key, minKey) < 0)
                    continue;
                    
                if (maxKey != null && _comparer.Compare(key, maxKey) > 0)
                    return results;
                    
                results.Add(new KeyValuePair<object, object?>(key, leaf.Values[i]));
            }
            
            leaf = leaf.Next;
        }
        
        return results;
    }
}
PLAIN ENGLISH

Return an IEnumerable of key-value pairs between minKey and maxKey.

Find the leaf where minKey lives (or the leftmost leaf if no lower bound). This is the "flip to S" step.

Now walk forward through the linked leaves — no going back to the root. This is the whole reason the tree is linked.

Skip anything below minKey. If we pass maxKey, we're done — bail out.

Otherwise, collect the pair and keep going.

Move to the next leaf via the Next pointer. That's the staple between pages doing its job.

i
Fanout = how wide each node is.

This tree uses order-4 nodes (3 keys, 4 children). Real databases use order-100+. Higher fanout means a shallower tree, which means fewer disk reads per lookup.
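A quick back-of-the-envelope check, using assumed row counts rather than measurements from this engine: the height of a B+ tree is roughly log-base-fanout of the row count, and each level usually costs one disk read.

CODE
// 10,000,000 rows, fanout 4   -> about 12 levels
// 10,000,000 rows, fanout 100 -> about 4 levels
static int ApproxHeight(long rowCount, int fanout) =>
    (int)Math.Ceiling(Math.Log(rowCount) / Math.Log(fanout));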

The three operations, at a glance

Every query the database ever runs against an index comes down to one of these three moves.

S

Search — O(log n)

Climb down the tabs to the right page, then binary-search inside the leaf. A million rows = about 20 comparisons.

I

Insert + Split

When a leaf overflows, cut it in half and promote the middle key to the parent. A split makes the tree grow upward, not downward.

R

Range scan

Find the starting leaf, then walk the linked list of stapled leaves sideways until you pass the upper bound.

Build the tree: inserting 10, 20, 30, 40

Let's watch a tree grow from scratch, with order = 3 (so a leaf can hold 2 keys before it has to split). Four inserts, four snapshots.

1
Insert 10

The tree starts as a single leaf. 10 goes in. Nothing else to do.

[ 10 ]
2
Insert 20

Still fits in the same leaf. Keys stay sorted.

[ 10, 20 ]
3
Insert 30 — first split!

The leaf would hold 3 keys (over capacity). Split it: left leaf keeps [10], right leaf gets [20, 30]. We promote 20 to a brand-new internal node that becomes the root.

        [ 20 ]
       /      \
[ 10 ]  --next-->  [ 20, 30 ]
4
Insert 40

Walk the root: 40 >= 20, so go right. The right leaf already has [20, 30] and now needs 40 — another split! Left stays [20], right becomes [30, 40], and 30 gets promoted up to the root. The root now holds two keys.

          [ 20 | 30 ]
         /     |     \
[ 10 ]  -->  [ 20 ]  -->  [ 30, 40 ]

Here's the insert code that orchestrates all that. It's surprisingly short — the recursion lives inside SplitLeafNode, which can bubble splits all the way up to the root.

CODE
public void Insert(object key, object? value)
{
    lock (_lockObject)
    {
        var leaf = FindLeafNode(key);
        InsertIntoLeaf(leaf, key, value);
        
        if (leaf.KeyCount > _order - 1)
        {
            SplitLeafNode(leaf);
        }
    }
}
PLAIN ENGLISH

Lock so no other thread can modify the tree while we're rearranging it.

Walk down from the root to find the one leaf where this key belongs.

Drop the key and value into that leaf, keeping the list sorted.

If the leaf now holds more keys than allowed, split it — and that split may cascade upward.
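To replay the 10-20-30-40 walkthrough in code, something like the following would do it. The constructor and its order parameter are assumptions about how the tree is created; they are not shown in the excerpts above.

CODE
// Hypothetical usage of the BPlusTree shown above.
var tree = new BPlusTree(order: 3);    // at most 2 keys per leaf before a split

tree.Insert(10, "row-10");
tree.Insert(20, "row-20");
tree.Insert(30, "row-30");             // first split: 20 is promoted to a new root
tree.Insert(40, "row-40");             // second split: 30 joins 20 in the root

Console.WriteLine(tree.Search(30));    // prints "row-30" after one descent from the root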

Quick check — apply what you've seen

Four short scenarios. There's no score — the point is to stretch the ideas, not memorize them.

Your query is WHERE user_id BETWEEN 1000 AND 2000. Which B+ tree operation runs, and why does it stay cheap?

After many inserts your tree has become noticeably taller, and searches feel slower. What happened?

Leaves have both Next AND Previous pointers, but the current Range method only uses Next. Why keep Previous?

Insert 5, 15, 25, 35 in order into an empty order-3 tree (max 2 keys per leaf). What does the root look like after the last insert?

Coming up next: now that you know the tree exists, Module 4 shows how the LINQ query planner decides when to use it — and when it's cheaper to just read every row.

04

Indexes & the Query Plan

How the database turns your LINQ into a route — and why the same question can be fast or slow.

The GPS route planner inside your database

Ask your phone for directions to the airport. It doesn't try every street — it peeks at your request, notices "highway only," and picks an efficient route. Your database does the same thing with a query.

You write LINQ like .Where(r => r["Id"] == 42). Before a single row is touched, a planner inspects your request and picks a strategy. That strategy is the query plan, and it decides whether your query runs in milliseconds or minutes.

1

LINQ you wrote

.Where(r => (int)r["Id"] == 42)

"Give me the user with Id 42."

2

Access path chosen

Index point lookup

SelectByKey(42)

3

Cost

O(log n)

One tree descent. Lightning fast, even at 10 million rows.

The same planner, three different routes

=

Equality on Id

.Where(r => (int)r["Id"] == 42)

Index point lookup — jumps straight to the row. O(log n).

>

Range on Id

.Where(r => (int)r["Id"] > 100)

Index range scan — finds the start, then walks the leaves. O(log n + k).

?

Filter on Name

.Where(r => (string)r["Name"] == "Alice")

Full table scan + filter. O(n). No index on Name, so every row must be checked.

!
Declarative vs imperative

LINQ describes what you want; the planner figures out how. This is the single biggest reason databases are productive — you don't hand-code loops. You say "give me users where Id = 42" and the planner quietly picks between a one-jump index lookup and a million-row scan.

The three phases of Execute

When you call .ToList() on a LINQ query, the engine runs the Execute method. Think of it as the GPS running three steps: pick a door, filter what comes through, and sort what's left.

CODE
public TResult Execute<TResult>(Expression expression)
{
    var plan = BuildExecutionPlan(expression);
    IEnumerable<DataRow> rows = ExecuteAccessPath(plan);

    foreach (var predicate in plan.Predicates)
    {
        rows = rows.Where(predicate);
    }

    if (plan.OrderByColumn != null)
    {
        if (plan.IsOrderByDescending)
            rows = rows.OrderByDescending(r => r[plan.OrderByColumn]);
        else
            rows = rows.OrderBy(r => r[plan.OrderByColumn]);
    }

    return (TResult)(object)rows;
}
PLAIN ENGLISH

Read the user's request and build a plan — like reading an address before picking a route.

Phase 1: pick a door. Run the chosen access path (index jump, range walk, or full scan) to get a starting stream of rows.

Phase 2: filter what comes through. For any condition the index couldn't handle, check each row in memory.

Phase 3: sort what's left. If the user asked for an order, shuffle the survivors into the right sequence.

Hand back the resulting stream of rows to the caller — lazy, ready to iterate.

Notice the order: access path first, filters second, sort last. Sorting always happens in memory over whatever survived the filters. That's why ORDER BY never speeds up a query — it can only slow it down.
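Here is the same three-phase split annotated on one concrete query. Illustrative only; it assumes the Query/Where LINQ surface used throughout this module.

CODE
var teens = db.Query("Users")
    .Where(r => (int)r["Id"] >= 1000 && (int)r["Id"] < 2000)  // phase 1: index range walk on the primary key
    .Where(r => (int)r["Age"] >= 13 && (int)r["Age"] <= 19)   // phase 2: in-memory filter (no index on Age)
    .OrderBy(r => (string)r["Name"])                          // phase 3: in-memory sort of the survivors
    .ToList();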

Picking the door: ExecuteAccessPath

This is the heart of the planner. A short predicate-sniffing ladder decides whether your query jumps, walks, or scans.

CODE
private IEnumerable<DataRow> ExecuteAccessPath(QueryExecutionPlan plan)
{
    if (!string.IsNullOrEmpty(_primaryKeyColumn) && plan.IndexRange != null)
    {
        if (plan.IndexRange.ExactKey != null)
        {
            var single = _table.SelectByKey(plan.IndexRange.ExactKey);
            return single != null ? new List<DataRow> { single } : new List<DataRow>();
        }

        if (plan.IndexRange.HasLowerBound || plan.IndexRange.HasUpperBound)
        {
            return _table.SelectByPrimaryKeyRange(plan.IndexRange.LowerBound, plan.IndexRange.UpperBound);
        }
    }

    return _table.SelectAll();
}
PLAIN ENGLISH

Start of the if-ladder. First: do we even have an index on the primary key, and did the visitor spot an index-friendly filter?

Exact key? Best case. One jump into the index and we're done.

Return a list of exactly one row, or an empty list if nothing matched. No scanning, no filtering, no wasted effort.

Range? Second best. The filter is something like Id > 100 or Id BETWEEN 10 AND 20.

Walk the sorted leaves of the index from the lower bound to the upper bound. Still way faster than a full scan.

Neither? Fallback. Read every row in the table and let later phases filter them in memory.

Read it top-down: exact key? one jump. range? leaf-walk. neither? full scan. That's the whole planner — three lines of real decision-making sitting on top of a B+ tree.

Where the plan meets the tree

When the plan picks "exact key," control lands on SelectByKey — the point where the LINQ layer finally calls into the B+ tree you built in Module 3.

CODE
public DataRow? SelectByKey(object key)
{
    _lock.EnterReadLock();
    try
    {
        var serialized = _index.Search(key);
        if (serialized == null)
            return null;
        
        return DeserializeRow((byte[])serialized);
    }
    finally
    {
        _lock.ExitReadLock();
    }
}
PLAIN ENGLISH

Grab a read lock so other readers can come along, but writers must wait their turn.

Ask the B+ tree index to search for this key. One descent from root to leaf.

Nothing found? Return null — the caller turns that into an empty result.

Found! Turn the stored bytes back into a row the caller can use.

Whatever happened, release the read lock before leaving.

Three views of the same query

Watch how what you wrote becomes a plan object, and how that plan becomes a method call on Table. Click each tab:

// What you, the developer, wrote:
var user = db.Query("Users")
    .Where(r => (int)r["Id"] == 42)
    .ToList();

Tab 1: your declarative intent. Nothing executes yet — LINQ is just a description.

This three-step journey — intent → plan → operation — is exactly why SQL and LINQ feel magical. The visitor pattern turns your lambda into a data structure, and that data structure picks the physical operation.

Spot the bug: the query that won't use the index

Here's a query that looks like it should be O(log n). But the developer made one small mistake, and now it full-scans 10 million rows every time. Can you spot it?

Find the line that kills the index:

1 var x = db.Query("Users")
2 .Where(r => ((int)r["Id"]).ToString() == "42")
3 .ToList();

The planner only recognizes patterns shaped like r["Id"] == constant. The moment you wrap the column in a function call — .ToString(), Math.Abs(), Substring() — the visitor's pattern-match fails and the query falls back to a full scan. This is the same footgun as writing WHERE TO_CHAR(id) = '42' in SQL. Keep the column naked on one side of the comparison.
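The fix is to leave the column alone and compare it directly to a constant, exactly the shape the planner recognizes:

CODE
// Index-friendly version of the same query: one descent into the B+ tree.
var x = db.Query("Users")
    .Where(r => (int)r["Id"] == 42)   // r["Id"] == constant -> SelectByKey(42), O(log n)
    .ToList();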

!
There are no secondary indexes in this engine

Only the primary key has an index. Any filter on another column — .Where(r => r["Name"] == "Alice"), .Where(r => r["Email"].Contains("@")) — will always full-scan. Knowing this up front stops you from blaming the AI or chasing phantom causes when a query gets slow. If it's not the primary key, it's O(n).

Check your route-planning instincts

Four scenarios. No memorization — just apply what you've seen.

1. Given .Where(r => (int)r["Age"] > 25).Where(r => (int)r["Id"] == 7) — which Where helps the index?

2. A user complains their query got slower after adding ORDER BY. What happened?

3. You want to add a secondary index on Email. Which class needs the biggest change?

4. Given .Where(r => (int)r["Id"] >= 10 && (int)r["Id"] < 20), describe the plan.

Next up: reads were easy — they just have to be fast. Writes are harder. They have to survive crashes, power cuts, and half-finished transactions. Module 5 opens the durability hood.

05

Transactions, WAL and Crash Recovery

How databases survive a power cut without losing your money.

The airline check-in desk

You hand the agent two tickets and a suitcase. Before anything gets stamped on your boarding passes, the agent writes every item into a thick official logbook, in order: "party of two, bag tagged, seats assigned." Then they print the passes. If the power cuts mid-check-in, the next agent in the morning reads the logbook and finishes only the check-ins marked "complete." Anything without a complete stamp is thrown out.

That is how airlines avoid losing your luggage — and it is exactly how databases avoid losing your money. In this module we meet the logbook: the WAL, and the rules that make a transaction either fully happen or fully vanish.

Imagine a bank transfer. You debit account A, then credit account B. What if the lights go out between those two steps? Without a transaction guarantee, account A is poorer and account B is no richer. Money just evaporates. The rules below exist so that never happens.

A

Atomicity

All or nothing. Never half-done.

C

Consistency

Invariants (like "total balance does not change") hold before and after.

I

Isolation

Concurrent transactions look like they ran one at a time.

D

Durability

Committed means committed, even if the server explodes.

Together these four letters spell ACID — the checklist every serious database must pass.

Log first, apply after

Here is the one rule that powers everything in this module. When a transaction wants to change data, the database does not touch the real data first. It writes the intended change into the WAL on disk, forces that log to be durable, and only then applies the change. Log first, apply after.

Flip that order and you get corruption. Apply first and hope to log later? If the crash lands between those two steps, the change is on disk but the log does not know, so recovery will never replay or undo it. The whole system is built to avoid that gap.

Watch a real transaction trickle through the engine. Four characters: User (the client), Transaction (the bookkeeper), WAL (the logbook on disk), and BPlusTree (where the rows actually live).


!
Log first, apply after is ACID's beating heart.

If the commit record is on disk, recovery will replay the change. If not, it never happened. There is no third state.

Inside a logbook entry

Every row in the logbook is a WALEntry. One entry per operation: "transaction 42 inserted Alice," "transaction 42 committed," and so on. The fields are chosen so recovery has everything it needs — the old value to undo, the new value to redo, and a sequence number so we know the exact order things happened.

CODE
public class WALEntry
{
    private const int MaxCheckpointActiveTransactionCount = 100_000;
    private const int MaxStringByteLength = 64 * 1024;
    private const int MaxValueLength = 1024 * 1024;

    public long TransactionId { get; set; }
    public WALOperationType OperationType { get; set; }
    public string TableName { get; set; } = string.Empty;
    public object? Key { get; set; }
    public byte[]? OldValue { get; set; }
    public byte[]? NewValue { get; set; }
    public long Timestamp { get; set; }
    public long SequenceNumber { get; set; }
    public List<long> CheckpointActiveTransactionIds { get; set; } = new();
    public long CheckpointNextTransactionId { get; set; }
}
PLAIN ENGLISH

One entry in the logbook. Every write the database wants to do shows up here first.

Safety caps so a single entry can never bloat the log and break recovery.

Who did it — which transaction this entry belongs to.

What kind of event: Begin, Insert, Update, Delete, Commit, Rollback, or Checkpoint.

Which table and which row (the key) the change affects.

OldValue + NewValue together are the magic: old lets us undo, new lets us redo.

A clock stamp and a SequenceNumber giving every entry a total order — crucial for replay.

Bookkeeping for checkpoints — the subject of a later screen.

The moment of commit

A commit is not a switch. It is a three-step dance: write the commit marker, flush the WAL to disk, then apply. The line between "might lose this" and "definitely safe" is a single call to Flush().

CODE
public void Commit()
{
    _lock.EnterWriteLock();
    try
    {
        if (_state != TransactionState.Active)
            throw new InvalidOperationException($"Cannot commit transaction in state: {_state}");

        // Log the commit
        _walManager.AppendEntry(new WALEntry
        {
            TransactionId = _transactionId,
            OperationType = WALOperationType.Commit
        });

        // Force flush to ensure durability
        _walManager.Flush();

        // Apply buffered writes only after WAL commit is durable.
        _commitApplyCallback(_entries);

        _state = TransactionState.Committed;
        _transactionManager.CompleteTransaction(_transactionId);
    }
    finally
    {
        _lock.ExitWriteLock();
    }
}
PLAIN ENGLISH

Someone called commit(). Grab the write lock so nobody else meddles while we finalize.

Sanity check: you can only commit a transaction that is still Active — no double-committing.

Step 1 — log. Append a Commit marker to the WAL: "txn 42 is done."

Step 2 — flush. This is the ACID-defining moment. Flush() forces the log to physical disk. If we crash after this line, recovery will replay the transaction.

Step 3 — apply. Only now do we push the buffered inserts and updates into the real B+Tree.

Mark the transaction Committed and tell the manager we are done. Release the lock.

!
Deferred writes.

This engine does not apply changes to the B+Tree until after commit — every Insert, Update, and Delete is buffered inside the transaction. That makes rollback almost free: there is nothing to undo. The tradeoff: other transactions cannot see your in-flight writes.

Say it once more: log first, apply after.
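Deferred writes also explain why the rollback path is so short. A minimal sketch, assuming the same fields as Commit above; the RolledBack state name and the exact steps are assumptions, not the engine's verbatim code.

CODE
public void Rollback()
{
    _lock.EnterWriteLock();
    try
    {
        if (_state != TransactionState.Active)
            throw new InvalidOperationException($"Cannot roll back transaction in state: {_state}");

        // Log the rollback marker so recovery knows to ignore this transaction.
        _walManager.AppendEntry(new WALEntry
        {
            TransactionId = _transactionId,
            OperationType = WALOperationType.Rollback
        });
        _walManager.Flush();

        _entries.Clear();                       // buffered writes are simply forgotten; the B+ tree was never touched
        _state = TransactionState.RolledBack;   // enum value name is an assumption
        _transactionManager.CompleteTransaction(_transactionId);
    }
    finally
    {
        _lock.ExitWriteLock();
    }
}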

Morning after the crash

A crash happens. Half-written transactions, maybe a commit marker that made it to disk, maybe one that didn't. The next time the engine boots, it runs RecoverFromWAL: read the whole logbook, bucket entries by transaction, replay only the buckets with a Commit marker, throw the rest away.

Interactive walkthrough: Disk (WAL file) → Recovery Reader → Transaction Buckets → B+Tree.

Here is the recovery code, stripped of its error handling. Notice how simple the rule is: if there is a Commit entry, replay; otherwise, ignore.

CODE
public void RecoverFromWAL(Action<WALEntry> applyEntry)
{
    _lock.EnterWriteLock();
    try
    {
        var entries = _walManager.ReadEntriesForRecovery();
        var transactions = new Dictionary<long, List<WALEntry>>();
        var committedTransactions = new HashSet<long>();

        foreach (var entry in entries)
        {
            if (entry.OperationType == WALOperationType.Checkpoint) continue;

            if (entry.OperationType == WALOperationType.Commit)
                committedTransactions.Add(entry.TransactionId);
            else if (entry.OperationType == WALOperationType.Rollback)
                continue;
            else if (entry.OperationType != WALOperationType.BeginTransaction)
            {
                if (!transactions.ContainsKey(entry.TransactionId))
                    transactions[entry.TransactionId] = new List<WALEntry>();
                transactions[entry.TransactionId].Add(entry);
            }
        }

        foreach (var txnId in committedTransactions)
        {
            if (transactions.TryGetValue(txnId, out var txnEntries))
            {
                foreach (var entry in txnEntries)
                    applyEntry(entry);
            }
        }
    }
    finally { _lock.ExitWriteLock(); }
}
PLAIN ENGLISH

Take exclusive control of the log — nobody else is allowed to write during recovery.

Read every entry the logbook has to offer.

Prepare two containers: one bucket per transaction, plus a set of transaction ids that successfully committed.

Walk through the log. Skip Checkpoint entries (they are hints, not changes).

If we see a Commit, mark that transaction as "yes, finish this one."

Rollback entries and BeginTransaction entries have no data to replay — skip them.

For every actual change (Insert, Update, Delete), drop it into its transaction bucket.

Now the payoff. Loop over only the committed transaction ids, and replay every entry in those buckets onto the index.

Everything else — half-finished transactions without a commit — is silently thrown away. That is atomicity in action.

The mantra again: log first, apply after. Recovery only trusts what was logged.

Keeping the logbook lean

If the WAL grew forever, recovery would take forever too. So the engine periodically takes a checkpoint: flush all dirty in-memory pages to disk, write a special Checkpoint WAL entry, and truncate the oldest part of the log. The logbook starts fresh from a known-good moment.

1
Flush all dirty pages

Every page that has changes in memory gets written out to disk, via fsync, so the on-disk B+Tree matches what committed transactions say it should.

2
Write a Checkpoint WAL entry

It lists the transactions that were still active at this instant (CheckpointActiveTransactionIds) and the next id the engine will hand out.

3
Truncate old WAL

Everything before the checkpoint is provably already on disk in the B+Tree — safe to forget.

4
Smaller WAL means faster recovery next time

Replay starts at the latest checkpoint instead of the dawn of time. Restart time stays bounded no matter how long the database has been running.

Checkpoints, deferred writes, and the commit dance all rest on the same foundation: log first, apply after.
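Stitched together, the checkpoint sequence is only a handful of calls. A hedged sketch; the helper names (FlushAllDirtyPages, GetActiveTransactionIds, TruncateOldEntries) are illustrative, not the engine's actual API.

CODE
public void Checkpoint()
{
    _storageEngine.FlushAllDirtyPages();       // 1. every dirty page goes to disk

    _walManager.AppendEntry(new WALEntry       // 2. record the save point in the logbook
    {
        OperationType = WALOperationType.Checkpoint,
        CheckpointActiveTransactionIds = _transactionManager.GetActiveTransactionIds(),
        CheckpointNextTransactionId    = _transactionManager.NextTransactionId
    });
    _walManager.Flush();

    _walManager.TruncateOldEntries();          // 3. everything before this point is already safe in the B+ tree
}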

Think it through

Four scenarios. Pick the answer a seasoned database engineer would give. The explanations are where the real learning happens — read them even when you are right.

Scenario 1 — Crash after commit, before apply

Power cuts after the commit record is written to the WAL but before the index is updated. On restart, is the inserted row there?

Scenario 2 — Crash between two inserts, before commit

A transaction inserts row A, then inserts row B, and the power cuts before commit() is called. What does the database look like after restart?

Scenario 3 — A stress test loses committed data

Your colleague says: "After a kill -9 mid-test, we are missing rows that the client got a committed response for." Where do you look first?

Scenario 4 — Why is rollback free?

A junior engineer asks: "For rollback, where is the code that undoes the B+Tree changes?" What do you tell them?

Next module: Concurrency. Multiple transactions at once — how does the engine keep them from stepping on each other?

06

Concurrency: Many Hands, One Ledger

How the database lets dozens of people read and write at the same time without anyone crashing into anyone else.

The Subway Revolving Door

Picture a revolving door at a subway station. Many people can walk through at the same time, as long as they are all going the same direction. But if one person stops in the middle to tie their shoelaces, everyone else must wait until they are done.

That is exactly how a database handles concurrency. Many readers can flow through together. A single writer gets the door to themselves. The trick is making sure no one ever crashes into anyone else.

🎟️
Why this matters

Two users click "Buy" on the last ticket at the exact same millisecond. Who gets it? If the database answers wrong even once, the business loses trust. Concurrency control is the invisible layer that answers it billions of times a day.

The door has three states

A reader-writer lock keeps track of which state the door is in right now.

IDLE
No one is inside

The door sits still. The next person (reader or writer) can walk right in.

READ·N
Any number of readers inside

More readers can still join. Writers must wait outside until every reader has left.

WRITE·1
Exactly one writer inside

The door is locked to everyone else. No readers, no other writers, until the writer leaves.

IDLE ⇄ READERS (N) · IDLE ⇄ WRITER (1)

Readers and a writer can never coexist. Only the IDLE state connects them.

Five Doors, Stacked

A real database does not have one door. It has five, nested like Russian dolls. Each door guards a different part of the system, from the whole database down to a tiny cache page.

When an operation needs to pass through more than one door, it must always pick them up in the same order. That single rule prevents an entire class of bugs called deadlock.

🔐
Level 1 · Database lock

Protects schema-level operations, like creating or dropping a whole table.

🔒
Level 2 · Table lock

Per-table read and write coordination. Each table has its own reader-writer lock.

🔑
Level 3 · B+ Tree lock

Guards tree structural changes like node splits and leaf re-linking.

🗝️
Level 4 · StorageEngine lock

Protects page-level disk input and output as blocks are read and written.

📎
Level 5 · PageCache / ExtentCache locks

Guards the in-memory caches that track which pages are hot and which can be evicted.

The rule is simple to say and life-saving to follow: always acquire locks in order from level 1 down to level 5. Never the other way. That contract is written directly into the source code as a comment.

CODE
/// <summary>
/// Thread Safety:
/// This class uses ReaderWriterLockSlim to ensure thread-safe operations.
/// Multiple threads can read concurrently, but write operations are exclusive.
/// 
/// Lock Ordering (to prevent deadlocks when nested locking occurs):
/// When multiple locks need to be acquired, always acquire them in this order:
/// 1. Database._lock (schema-level operations)
/// 2. Table._lock (table-level operations)
/// 3. BPlusTree._lockObject (index operations)
/// 4. StorageEngine._lock (this class - storage operations)
/// 5. PageCache/ExtentCache locks (cache operations)
/// </summary>
public class StorageEngine : IDisposable
{
    private readonly ReaderWriterLockSlim _lock;
PLAIN ENGLISH

This is a comment block at the top of the file explaining the thread-safety rules for anyone reading the code.

It announces that this class uses ReaderWriterLockSlim, the reader-writer door from the previous screen.

Many threads can read at the same time. Writes get the door all to themselves.

Here comes the real contract. When an operation needs to hold more than one lock, they must be picked up in this exact order.

The five levels, from outermost (whole database) to innermost (cache pages).

Picking up locks in this order is a contract. Two threads that both follow it can never deadlock.

The class itself is just a regular C# class, and its private lock field is a ReaderWriterLockSlim.

⚖️
Lock ordering is a design discipline, not a runtime check

The compiler will not stop you from violating it; only code-review discipline will. Every new method that takes two locks is a potential deadlock until proven otherwise.

Reading vs Writing: One Word Changes Everything

In the code, the only difference between a read operation and a write operation is a single word. But that word completely changes how the door behaves.

Notice how both examples wrap their work in try/finally. The lock must be released even if the work explodes halfway through. Forgetting that single line is how a database freezes forever.

WRITE PATH · Table.Insert
public void Insert(DataRow row, Transaction.Transaction? transaction = null)
{
    _lock.EnterWriteLock();
    try
    {
        var key = GetPrimaryKey(row);
        ValidateRowForWrite(row);

        var keyExists = _index.Search(key) != null;
        var keyPendingInTransaction = transaction?.HasBufferedValueForKey(_schema.TableName, key, GetPrimaryKeyDataType()) ?? false;
        if (keyExists || keyPendingInTransaction)
            throw new InvalidOperationException($"Duplicate primary key value '{key}'.");

        var serialized = SerializeRow(row);

        if (transaction != null)
        {
            transaction.LogInsert(_schema.TableName, key, serialized);
            _nextRowId++;
            return;
        }

        _index.Insert(key, serialized);
        _nextRowId++;
    }
    finally
    {
        _lock.ExitWriteLock();
    }
}
PLAIN ENGLISH

Insert a new row into the table. Optionally part of a transaction.

First, grab the write lock. Only one thread in the entire program can be inside this block at a time for this table.

From here on, everything must be protected. The try block begins the protected region.

Work out the primary key of the incoming row, then check that the row looks valid.

Is this key already in the index? Is it already being added by a transaction that has not committed yet?

If either is true, reject the insert. You cannot have two rows with the same primary key.

Turn the row into bytes so it can be stored.

If this is happening inside a transaction, do not write to the index yet. Just remember it in the transaction buffer for later.

Otherwise, write it straight into the index and bump the row counter.

The finally block runs no matter what — success, exception, anything. This is where we release the write lock so others can enter the door.

READ PATH · Table.SelectByKey
public DataRow? SelectByKey(object key)
{
    _lock.EnterReadLock();
    try
    {
        var serialized = _index.Search(key);
        if (serialized == null)
            return null;
        
        return DeserializeRow((byte[])serialized);
    }
    finally
    {
        _lock.ExitReadLock();
    }
}
PLAIN ENGLISH

Look up a single row by its primary key.

Grab the read lock, not the write lock. Many other readers can be here at the same time.

Begin the protected region. As long as we are inside, the table's structure will not change underneath us.

Ask the index for the bytes that match this key.

If nothing was found, return nothing.

Otherwise, turn the bytes back into a row object and hand it to the caller.

Release the read lock. Once the last reader leaves, a waiting writer can finally enter.

EnterWriteLock means "I need the door all to myself." EnterReadLock means "I will share." Picking the wrong one is either a correctness bug or a performance disaster.

🎚️
Coarse vs fine-grained locking

This engine locks whole tables at a time — simple, but means one writer blocks all readers of that table. Real databases use row-level or page-level locks to allow more parallelism, at the cost of much more complex code. That is the tradeoff granularity is all about.

When the Door Jams: A Deadlock Almost Happens

Here is what happens when two threads forget the ordering rule. Thread A picks up Lock A, Thread B picks up Lock B, then each wants the other. Now nobody can move. This is the deadlock that every concurrent system has to defend against.


Spot the Bug

Here is a method that grabs two of the five locks from our stack. It looks innocent. It even has a try/finally. But somewhere in here, the ordering rule is broken. If another thread is doing the usual 1 → 5 walk, the two can wedge against each other.

Click the line you think is wrong.

Find the line that violates lock ordering:

1 public void Weird()
2 {
3 _cache.Lock.EnterWriteLock(); // level 5
4 try
5 _storage.Lock.EnterWriteLock(); // level 4
6 // ...
7 finally { _cache.Lock.ExitWriteLock(); }

This exact shape of bug is how real production systems hang at 3 a.m. Knowing the lock order of your system is like knowing the emergency exits — rare to need, priceless when you do.
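The repair is mechanical: take the level-4 lock before the level-5 lock, so every thread walks the doors in the same direction. A sketch using the same illustrative field names as the buggy example:

CODE
public void Fixed()
{
    _storage.Lock.EnterWriteLock();            // level 4 first
    try
    {
        _cache.Lock.EnterWriteLock();          // level 5 second, the same order every thread uses
        try
        {
            // ... do the work that needs both locks ...
        }
        finally { _cache.Lock.ExitWriteLock(); }
    }
    finally { _storage.Lock.ExitWriteLock(); }
}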

Check Your Understanding

Four scenarios. No memorization — just use what you now know about doors, readers, writers, and lock ordering.

Reads are fast, but writes sometimes stall the whole table for seconds. What is most likely going on?

The app hangs forever when two threads call Method1 and Method2 at the same time. What is the first thing to check?

Why does BPlusTree use a simple lock(obj) instead of ReaderWriterLockSlim?

You edit a row inside a transaction. Another thread calls SelectAll() before you commit. What does the other thread see?

Next up — Module 7: Observability. Now that you know how it works, how do you see it working at runtime?

07

Observability — Watching the Engine Run

The cockpit instrument panel for a database that cannot be opened mid-flight.

Flying blind is not a strategy

Your database has been running for three days. Was everything fine? Did commits happen? How many? How many rolled back? If you cannot answer those questions in under five seconds, your system is not observable.

A pilot cannot open the hood of a 747 at 35,000 feet. They rely on the cockpit instrument panel — gauges, warning lights, and the voice recorder. A running database is the same. You cannot freeze it mid-transaction to peek inside. You need instruments that report outward while it flies.

✈️
The cockpit analogy

Metrics are the gauges on the dashboard — altitude, fuel, airspeed. Structured logs are the cockpit voice recorder — rich detail about every event, kept for later review. You need both.

One sentence. Two realities.

Here is the same event — a row being inserted — logged two different ways. Both “work.” Only one of them is usable after the fact.

BAD
Console.WriteLine("insert done");

No timestamp.

No table name.

No transaction id.

Cannot be filtered, searched, or shipped anywhere useful.

GOOD
{"timestamp":"2026-04-22T10:12:03Z",
 "level":"Information",
 "event":"row.inserted",
 "table":"Users",
 "txn":42}

Every field is named and machine-readable — it is JSON.

The ISO timestamp sorts correctly across time zones.

The event name is a stable key you can grep for across millions of lines.

Context travels with the event — table, transaction, whatever matters.

⚠️
AI-generated debug lines are not observability

When a code assistant writes Console.WriteLine("did the thing"), that is print-debugging pollution, not instrumentation. Recognizing this difference is one of the biggest quality markers in professional code.

The black box recorder

Every log line in the engine is a DatabaseLogEntry — a small, strongly-typed object that knows how to turn itself into JSON. Think of it as one page torn from the flight recorder.

CODE
public sealed class DatabaseLogEntry
{
    public DateTimeOffset Timestamp { get; init; } = DateTimeOffset.UtcNow;
    public DatabaseLogLevel Level { get; init; }
    public string EventName { get; init; } = string.Empty;
    public string Message { get; init; } = string.Empty;
    public IReadOnlyDictionary<string, object?> Properties { get; init; } = new Dictionary<string, object?>();

    public string ToJson()
    {
        var payload = new Dictionary<string, object?>(StringComparer.Ordinal)
        {
            ["timestamp"] = Timestamp.ToString("O"),
            ["level"] = Level.ToString(),
            ["event"] = EventName,
            ["message"] = Message
        };

        foreach (var pair in Properties)
        {
            payload[pair.Key] = pair.Value;
        }

        return JsonSerializer.Serialize(payload);
    }
}
PLAIN ENGLISH

A log entry is a fixed-shape record — one page of the flight recorder.

It is born with the current UTC timestamp already stamped on it.

A log level says how loud this entry is: Information, Warning, Error.

An event name — like row.inserted — is the stable key you grep for later.

A free-form message for humans, plus a dictionary of extra properties for context (table name, transaction id, row count).

ToJson() flattens everything into one JSON string: the four standard keys first, then every custom property tagged on.

The result is one text line that any log tool on earth can parse.
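For example, producing the row.inserted line from earlier would look roughly like this. A usage sketch; whether the engine builds its entries exactly this way is an assumption.

CODE
var entry = new DatabaseLogEntry
{
    Level = DatabaseLogLevel.Information,
    EventName = "row.inserted",
    Message = "Row inserted into Users",
    Properties = new Dictionary<string, object?> { ["table"] = "Users", ["txn"] = 42 }
};

Console.WriteLine(entry.ToJson());
// {"timestamp":"...","level":"Information","event":"row.inserted","message":"Row inserted into Users","table":"Users","txn":42}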

The dashboard gauges

Logs tell the story. Metrics count the story. Every major event the engine performs bumps a counter by one. Here is the whole metrics gauge cluster in under 40 lines.

CODE
internal sealed class DatabaseMetrics
{
    private long _tablesCreated;
    private long _transactionsStarted;
    private long _inserts;
    private long _updates;
    private long _deletes;
    private long _flushes;
    private long _checkpoints;
    private long _backupsCreated;
    private long _integrityChecks;

    public void IncrementTablesCreated() => Interlocked.Increment(ref _tablesCreated);
    public void IncrementTransactionsStarted() => Interlocked.Increment(ref _transactionsStarted);
    public void IncrementInserts() => Interlocked.Increment(ref _inserts);
    public void IncrementUpdates() => Interlocked.Increment(ref _updates);
    public void IncrementDeletes() => Interlocked.Increment(ref _deletes);
    public void IncrementFlushes() => Interlocked.Increment(ref _flushes);
    public void IncrementCheckpoints() => Interlocked.Increment(ref _checkpoints);
    public void IncrementBackupsCreated() => Interlocked.Increment(ref _backupsCreated);
    public void IncrementIntegrityChecks() => Interlocked.Increment(ref _integrityChecks);

    public DatabaseMetricsSnapshot Snapshot()
    {
        return new DatabaseMetricsSnapshot
        {
            TablesCreated = Interlocked.Read(ref _tablesCreated),
            TransactionsStarted = Interlocked.Read(ref _transactionsStarted),
            Inserts = Interlocked.Read(ref _inserts),
            Updates = Interlocked.Read(ref _updates),
            Deletes = Interlocked.Read(ref _deletes),
            Flushes = Interlocked.Read(ref _flushes),
            Checkpoints = Interlocked.Read(ref _checkpoints),
            BackupsCreated = Interlocked.Read(ref _backupsCreated),
            IntegrityChecks = Interlocked.Read(ref _integrityChecks)
        };
    }
}
PLAIN ENGLISH

Nine private numbers — one per gauge on the dashboard.

Each Increment method bumps one gauge by exactly one tick.

Interlocked.Increment is the magic: it is atomic, so two threads can increment at the same nanosecond and nothing gets lost.

Snapshot() takes a consistent reading of every gauge at once — one still photograph of the whole dashboard.

A snapshot is read-only and safe to pass around — callers cannot accidentally corrupt the live counters.

Interlocked.Increment is lock-free

It uses a CPU instruction (compare-and-swap) that is atomic at the hardware level. That is why you can count millions of events per second without contention — the gauges never slow the engine down.
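The difference is easy to demonstrate in a standalone snippet (not engine code): a plain ++ on a shared long is a read-modify-write in three steps and can drop updates under contention, while Interlocked.Increment cannot.

CODE
long unsafeCount = 0, safeCount = 0;

Parallel.For(0, 1_000_000, _ =>
{
    unsafeCount++;                            // not atomic: two threads can read the same old value
    Interlocked.Increment(ref safeCount);     // atomic at the hardware level
});

Console.WriteLine(unsafeCount);   // usually less than 1,000,000 because increments were lost
Console.WriteLine(safeCount);     // always exactly 1,000,000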

The gauge cluster

Nine counters. Each one answers a different question about what the engine has been doing.

🗂️

TablesCreated

Total tables ever defined in this database — a rough measure of schema churn.

🎬

TransactionsStarted

Every time BeginTransaction fires, this ticks. Divide by uptime to see load.

Inserts

Rows added across all tables — the purest measure of write volume.

✏️

Updates

Rows modified in place. High update rates can hint at hotspot rows.

🗑️

Deletes

Rows removed. Watch this to catch accidental mass deletions early.

💧

Flushes

Dirty pages pushed to disk. A surging rate means the engine is under write pressure.

🚩

Checkpoints

How many checkpoints have completed. Each one flushes dirty pages and trims the WAL, so a steadily rising count means recovery will stay fast.

💾

BackupsCreated

How many full backups have been written. A flatline here is a disaster waiting to happen.

🩺

IntegrityChecks

How many times the engine audited itself for corruption. Zero means nobody is watching.

The event vocabulary

Alongside the counters, the engine emits named events — the stable keys you filter on in a log aggregator. These seven cover the whole life of a database.

database.opened The engine attached to a database file — the cockpit lights just came on.
table.created A new table was defined, with its column count in the properties.
transaction.started A new unit of work began — the matching commit or rollback will follow.
row.inserted Data modification — a new row added; carries the table name and txn id.
row.updated Data modification — an existing row rewritten.
database.checkpoint A checkpoint completed — the engine just reached a durable save point.
database.integrity_check A health check ran — properties include how many issues were found.

Three lines and you can see inside

Here is the consumer side. Nothing fancy — register a logger, flip a flag, read a snapshot.

CODE
using var db = new Database(
    "mydata.mde",
    options: new DatabaseOptions
    {
        Logger = new JsonConsoleDatabaseLogger(),
        EnableMetrics = true
    });

var metrics = db.GetMetricsSnapshot();
Console.WriteLine($"Inserts: {metrics.Inserts}, Checkpoints: {metrics.Checkpoints}");
PLAIN ENGLISH

Open the database file — same as always.

Pass in options: hand it a logger that writes JSON to the console.

Flip EnableMetrics = true and the nine counters start ticking.

Later, anywhere in your code, call GetMetricsSnapshot() for a frozen reading.

Print, scrape, or ship to a dashboard. Three lines and the engine is no longer a black box.

One insert, traced through the instrument panel

Follow a single insert from user code out through the two telemetry pipelines — counter and log — and back out as a readable snapshot.

Interactive trace: User Code → Database → Metrics Counter + Logger.
💡
Metrics tell you *what*; logs tell you *why*.

A dashboard lighting up “commits dropped 80%” is an alert. The matching log entries show which transactions failed and how — that is the root cause. Every mature observability setup runs both rails in parallel.

Does it click?

Four scenarios. Answer from the cockpit, not from memory.

You suspect a query is making storage blow up. Which single metric would you watch live?

Your boss asks “were any transactions rolled back last hour?” Can this engine answer that cleanly?

Why not just use Console.WriteLine everywhere for logs?

Two threads both call IncrementInserts at the same instant. Can the counter lose an increment?

You finished all seven modules.

Seven dense modules. The layer stack, pages and extents, the B+ tree index, query planning, transactions and the WAL, crash recovery, concurrency, and now observability. That is the whole spine of a real database engine — read at the level a working engineer reads it.

The goal was never for you to become a database engineer. The goal was to become fluent enough to steer AI and debug production. You now own the vocabulary — pages, WAL, commits, locks, counters, structured logs — that makes those conversations specific instead of vague. That is a huge deal.

Where to go next

📖

Read the real databases

SQLite’s source is famously readable — start there. Postgres’s WAL code is legendary. You have the concepts now; the code will feel familiar, not foreign.

🛠️

Instrument your own code

Pick any project you own. Add one counter and one structured log line per boundary — API to DB, DB to disk. That single habit separates amateur code from professional code.

🤖

Question every AI suggestion

You can now push back with specifics. “Which layer does this belong in?” “How does this affect the WAL?” “What is the lock order?” Those are the questions a senior engineer asks — and now they are yours.

Thank you for flying the whole route. Safe landings.