Commit Database¶
The Commit Database is the persistence layer: a transactional, immutable mutation DAG that keeps full history and reduces concurrent streams deterministically.
Important
Before reading the API, identify your mode of use. The Modes of Use diagnostic decides whether the sections below are reference material or load-bearing for you — single-stream readers can ignore reduction-related sections entirely.
Structure¶
A CommitDatabase holds two content-addressed spaces:
The DAG of commits: the versioned history of mutations. Each commit is identified by its content hash (CommitId). This is what the rest of this page documents.
The pool of blobs: immutable binary payloads (textures, meshes, raw buffers, …) identified by their hash and referenced from inside commits. See Binary Data (Blobs) for the blob API.
Both spaces are append-only and content-addressed independently. They are replicated together by Database synchronisation.
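To make "append-only and content-addressed" concrete, here is a minimal sketch of the idea in plain Python. It is not the CommitDatabase implementation: the `ToyStore` class, the hex-string ids, and the choice of SHA-256 are assumptions made for illustration, since this page does not say which hash function is used.

```python
import hashlib


class ToyStore:
    """Illustrative append-only, content-addressed store (hypothetical, not the real API)."""

    def __init__(self):
        self._items = {}  # id (hex digest) -> immutable payload

    def put(self, payload: bytes) -> str:
        # The id is derived from the content alone (SHA-256 is an assumption).
        item_id = hashlib.sha256(payload).hexdigest()
        # Append-only: an existing entry is never overwritten or removed.
        self._items.setdefault(item_id, payload)
        return item_id

    def get(self, item_id: str) -> bytes:
        return self._items[item_id]

    def __len__(self) -> int:
        return len(self._items)


blobs = ToyStore()
first = blobs.put(b"mesh-bytes")
second = blobs.put(b"mesh-bytes")    # same content, same id, stored once
other = blobs.put(b"texture-bytes")  # different content, different id
```

Because the id depends only on the content, writing the same payload twice is idempotent, which is what lets two such spaces be replicated by copying the entries the other side is missing.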
Opening a CommitDatabase¶
>>> db = CommitDatabase.open("model.cdb")
To create a new database with embedded definitions, use:
python3 tools/dsm_util.py create_commit_database model.dsm model.cdb
Reading State¶
A freshly created database has no commits — first_commit_id() and
last_commit_id() return None:
>>> db.first_commit_id() is None
True
>>> db.last_commit_id() is None
True
>>> db.head_commit_ids()
set()
The initial_state() method always works and returns the empty
state. The example uses constants like TUTO_A_USER_LOGIN exposed
by the database’s embedded definitions — see
Embedded Definitions below; a one-line
db.definitions().inject() makes them available in the calling
namespace.
>>> initial = db.initial_state()
>>> len(initial.attachment_getting().keys(TUTO_A_USER_LOGIN))
0
AttachmentGetting Interface¶
Read attachments via attachment_getting():
>>> getting = state.attachment_getting()
>>> doc = getting.get(attachment, key)
>>> doc
Optional({...})
>>> keys = getting.keys(attachment)
Mutations¶
Create a mutable state and apply changes:
>>> mutable_state = CommitMutableState(db.state(db.last_commit_id()))
>>> mutating = mutable_state.attachment_mutating()
>>> mutating.set(attachment, key, document)
>>> mutating.update(attachment, key, path, new_value)
Committing¶
commit_mutations() returns the new commit id — capture it explicitly to
chain further mutations or read the resulting state:
>>> commit_id = db.commit_mutations("Commit message", mutable_state)
Complete Example¶
Add an Alice document and read it back:
>>> key = TUTO_A_USER_LOGIN.create_key()
>>> login = TUTO_A_USER_LOGIN.create_document()
>>> login.nickname = "alice"
>>> login.password = "secret"
>>> mutable = CommitMutableState(db.initial_state())
>>> mutable.attachment_mutating().set(TUTO_A_USER_LOGIN, key, login)
>>> commit_id = db.commit_mutations("Add Alice", mutable)
>>> state = db.state(commit_id)
>>> state.attachment_getting().get(TUTO_A_USER_LOGIN, key)
Optional({nickname='alice', password='secret'})
Path-Based Mutators¶
Instead of replacing entire documents with set(), path-based mutators use
Paths to target specific locations. This enables path-based merging when multiple
users edit concurrently.
| Mutator | Target | Operation |
|---|---|---|
| update | Field | Replace value at path |
| union_in_set | Set | Add elements |
| | Set | Remove elements |
| | Map | Add key-value pairs |
| | Map | Remove keys |
| update_in_map | Map | Update existing key |
| | XArray | Insert at position |
| | XArray | Update at position |
| | XArray | Remove at position |
Field Update¶
>>> mutating.update(TUTO_A_USER_LOGIN, key, TUTO_P_LOGIN_NICKNAME, "alice_updated")
Why Paths Matter¶
When two users edit different fields simultaneously:
User A: update(attachment, key, path_to_name, "Alice")
User B: update(attachment, key, path_to_email, "bob@example.com")
After convergence: Both updates apply (disjoint paths)
With set(), one user’s changes would overwrite the other’s.
Paths matter here because name and email are owned by distinct writers:
each writer means exactly the field they touch, so the union (Alice's name
beside Bob's email) is the collective intent, owned end to end. The same
mutators can instead invent a value nobody authored when a path is only a
fragment of a whole-value intent that diff happened to split; see
Re-entering the graph.
See Cooperative Discipline for the principle (scope ownership) and its limits.
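The difference between the two styles can be sketched with plain dictionaries. This is a toy model under stated assumptions, not the engine: `apply_set` and `apply_update` are hypothetical helpers, documents are flat dicts, and a "path" is just a field name.

```python
import copy


def apply_set(doc_store, key, document):
    # Whole-document replacement: everything previously under key is lost.
    doc_store[key] = copy.deepcopy(document)


def apply_update(doc_store, key, path, value):
    # Path-targeted write: only the field named by path changes.
    doc_store[key][path] = value


base = {"k1": {"name": "", "email": ""}}

# Path-based: each writer touches only the field it owns.
merged = copy.deepcopy(base)
apply_update(merged, "k1", "name", "Alice")             # user A
apply_update(merged, "k1", "email", "bob@example.com")  # user B

# set()-based: each writer submits a full document built from the base.
doc_a = dict(base["k1"], name="Alice")
doc_b = dict(base["k1"], email="bob@example.com")
replaced = copy.deepcopy(base)
apply_set(replaced, "k1", doc_a)
apply_set(replaced, "k1", doc_b)  # applied last, so A's name is gone
```

In the path-based run both edits survive; in the whole-document run, B's document carries the stale empty name and silently erases A's edit.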
Commit History¶
Inspect commit metadata:
>>> header = db.commit_header(commit_id)
>>> header.label()
'Add Alice'
>>> header.parent_commit_id() == ValueCommitId()
True
The first commit’s parent is the zero ValueCommitId (no ancestor).
Navigate history by passing the explicit ids you captured:
>>> state1 = db.state(first_commit_id)
>>> state2 = db.state(latest_commit_id)
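Walking back through history can be sketched using only the two header calls shown above, `commit_header()` and `parent_commit_id()`. The helper name `first_parent_chain`, the `zero_id` parameter, and the `FakeDb` / `FakeHeader` stand-ins are hypothetical; the stand-ins exist only so the sketch runs without a database. It follows a single parent per commit and makes no assumption about how merge commits record their second ancestor.

```python
def first_parent_chain(db, commit_id, zero_id):
    """Yield commit labels from commit_id back to the first commit."""
    while commit_id != zero_id:
        header = db.commit_header(commit_id)
        yield header.label()
        commit_id = header.parent_commit_id()


# Stand-ins so the sketch runs on its own; these are not part of the binding.
class FakeHeader:
    def __init__(self, label, parent):
        self._label, self._parent = label, parent

    def label(self):
        return self._label

    def parent_commit_id(self):
        return self._parent


class FakeDb:
    def __init__(self, headers):
        self._headers = headers

    def commit_header(self, commit_id):
        return self._headers[commit_id]


ZERO = "0"  # plays the role of the zero ValueCommitId()
fake = FakeDb({
    "c1": FakeHeader("Add Alice", ZERO),
    "c2": FakeHeader("Rename Alice", "c1"),
})
labels = list(first_parent_chain(fake, "c2", ZERO))
```

Against a real database, one would presumably pass `ValueCommitId()` as `zero_id` and an explicitly captured commit id as the starting point.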
Embedded Definitions¶
CommitDatabase stores its definitions:
>>> defs = db.definitions()
>>> sorted(str(t) for t in defs.types())
['Tuto::Account', 'Tuto::Identity', 'Tuto::Login', 'Tuto::Status', 'Tuto::Texture', 'Tuto::Thumbnail', 'Tuto::User']
Calling defs.inject() makes TUTO_A_USER_LOGIN, TUTO_S_LOGIN, etc.
available as constants in the calling namespace.
How Reduction Picks a Winner¶
When concurrent streams are reduced, the engine has to choose a single outcome for every overlapping path. The choice is deterministic given a fixed merge sequence — same inputs, same merges, same result on every client — but its mechanics are structural, not author- or time-meaningful.
The merge primitive. commitMerge(parent, target) creates a
merge commit. When the resulting state is reconstructed, target’s
mutations are applied after parent’s — so on every overlapping
path, the value from target survives. This is the only such rule
the engine itself fixes, and it makes the operation non-commutative:
commitMerge(A, B) ≠ commitMerge(B, A).
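The non-commutativity follows directly from "target is applied after parent". A minimal model, assuming each side is reduced to an ordered list of (path, value) writes; `reconstruct` is a hypothetical helper, not an engine function:

```python
def reconstruct(parent_mutations, target_mutations):
    """Apply parent's writes, then target's; the later write wins per path."""
    state = {}
    for path, value in parent_mutations:
        state[path] = value
    for path, value in target_mutations:
        state[path] = value
    return state


a = [("nickname", "alice"), ("status", "online")]
b = [("nickname", "alicia")]

merge_ab = reconstruct(a, b)  # models commitMerge(A, B): B is target
merge_ba = reconstruct(b, a)  # models commitMerge(B, A): A is target
```

The disjoint path (`status`) is identical in both results; only the overlapping path (`nickname`) depends on which side was passed as target.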
Reducing multiple heads is a strategy, not a guarantee. The
built-in reduceHeads seeds the running result with the most recent
head (lastCommitId(), by authoring timestamp) and folds the remaining
heads into it in ascending CommitId order, calling commitMerge once
per head with the running result as parent and the head as target. Applications are free to use a
different order — or to skip reduceHeads entirely and issue their
own commitMerge sequence. The final state depends on who calls
commitMerge in what order, not on a property of the engine.
Within this default, the outcome on an overlapping path is still fully
determined (a function of the CommitId hashes, reproducible on every
client), but it cannot be predicted without computing them. One
consequence is easy to miss: because the most recent head is only the seed
and each later commitMerge lets target overwrite the running result, the
most recent head is not the one preserved on an overlapping path. The head
folded in last, the one with the highest CommitId among the remaining
heads, is.
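The fold described above can be modelled in a few lines. This is a sketch of the described strategy, not the reduceHeads implementation: `ToyHead`, `toy_reduce_heads`, the short string ids, and the integer timestamps are all hypothetical, and each head is collapsed to a flat dict of writes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHead:
    commit_id: str   # stands in for a CommitId hash
    timestamp: int   # authoring time
    writes: dict = field(default_factory=dict)  # path -> value


def toy_reduce_heads(heads):
    # Seed the running result with the most recent head.
    seed = max(heads, key=lambda h: h.timestamp)
    state = dict(seed.writes)
    # Fold the remaining heads in ascending commit id order.
    rest = sorted((h for h in heads if h is not seed), key=lambda h: h.commit_id)
    for head in rest:
        state.update(head.writes)  # target overwrites the running result
    return state


heads = [
    ToyHead("0a", 300, {"title": "newest"}),
    ToyHead("7f", 100, {"title": "oldest"}),
    ToyHead("3c", 200, {"title": "middle", "size": 2}),
]
result = toy_reduce_heads(heads)
```

Here the head with id `7f` is the oldest by timestamp, yet its `title` survives because it is folded in last; the newest head, used as the seed, loses the contested path.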
Because the outcome is set by the merge sequence and not by the engine, every
client that reduces heads on a shared database must use the same strategy. Two
processes folding the same heads in different orders — one in ascending
CommitId, another in, say, hash-table iteration order — produce different
states on contested paths, and the shared history stops converging. Fix one
reduction order for all writers of a shared store; the built-in reduceHeads()
is the obvious choice, and it is transactional, where a hand-rolled fold can
leave a half-merged DAG if it is interrupted mid-merge.
On an overlapping path, the surviving value is structural, not intentional. Whichever strategy is used, the value that survives is a function of how merges were sequenced — not of authorship, recency, or semantic priority. Two authors editing the same field have no way to predict which value will survive reduction, even within a fixed strategy.
The implication for the application is treated in the Dual-Layer Contract: do not rely on a specific arbitration outcome; re-validate at read time.
Performance characteristics¶
Reconstructing a document’s state from its commit history is
per-document, not per-database. Cost depends on the opcodes that
touched it and on Ops — the number of path-targeted operations
on this document since its last set.
| Opcode | Reconstruction | Notes |
|---|---|---|
| set | O(1) | replaces the whole document |
| update | O(Ops) | replays field-level updates |
| | O(Ops) | |
| | O(Ops) | |
| | O(Ops) | |
| | O(Ops · log M) | M = map size |
| | O(Ops · log M) | |
| | O(Ops · S) | S = set size |
| | O(Ops · S) | |
| | O(Ops²) | UUID positioning, quadratic worst case |
Design pitfall: O(N²) on accumulated Sets¶
A document whose state grows by repeated union_in_set across many
commits incurs O(Ops²) reconstruction: every read replays every
prior union, and because the set size S itself grows with Ops, the
O(Ops · S) cost listed for Sets becomes quadratic. For long-running
topologies that grow by accumulation, prefer either a tree structure
with set() replacing the children list each commit, or a periodic
flatten (see Storage growth).
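The quadratic growth can be checked with a small cost model. The model is an assumption, not a measurement: it charges each replayed union a cost proportional to the set size at that point (the O(Ops · S) row) and assumes one new element per commit; both function names are hypothetical.

```python
def replay_cost_accumulating(n_commits):
    """Element touches needed to rebuild a set grown by one union per commit."""
    size = 0
    touches = 0
    for _ in range(n_commits):
        size += 1        # each union adds one new element
        touches += size  # assumed cost of the op: proportional to set size
    return touches


def replay_cost_with_set(n_commits):
    """With set() each commit, only the last whole-document write is read."""
    return 1 if n_commits else 0


cost_100 = replay_cost_accumulating(100)
cost_200 = replay_cost_accumulating(200)
```

Under this model the accumulated cost is n(n + 1) / 2, so doubling the number of commits roughly quadruples the work, while the set() variant stays constant.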
Validated scale¶
Commit has been benchmarked at:
~6 600 documents per database — about 3 MB of structural data, alongside ~3 GB of associated blobs on CAD workloads (structure is typically < 0.15 % of total disk footprint);
up to 8 concurrent processes sharing a single SQLite database with applicative jitter between commits.
State reconstruction is linear in document count: a full warm-up via
CommitState.cache_preload() runs at ~1.5–2 µs per document across
this range — 0.4 ms at 230 documents, 14 ms at 6 600.
Behaviour beyond those envelopes is not characterised.
Storage growth¶
The mutation DAG is append-only — once written, every commit is immutable. The database grows monotonically; the runtime carries no incremental garbage collection, no partial purge that trims old commits while keeping recent history, and no archival mechanism.
The only sanctioned way to shrink a database is Flatten — a user-space pattern, not a runtime operation: read the current head state from the source database, write it as the initial commit of a fresh target database, then switch readers and writers to the new database. The source database is untouched; the target is a new append-only history that happens to start where the old one ended.
The dsviper binding ships this pattern as a ready-made converter,
CommitDatabaseFlattener: it flattens a chosen commit into a fresh
single-commit target, keeping only the blobs that commit still references
and dropping superseded history blobs. The source is left untouched — the
converter automates the pattern, it does not trim history in place. See
Database Transfer for the full transfer
toolkit.
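The Flatten pattern itself is small enough to sketch on a toy history. `ToyCommitDb` and `flatten` below are hypothetical stand-ins: they store whole-state snapshots instead of mutations, ignore blobs entirely, and do not reflect the CommitDatabaseFlattener API, which is not shown on this page.

```python
class ToyCommitDb:
    """Illustrative append-only history of whole-state snapshots."""

    def __init__(self):
        self.commits = []  # list of (label, state) pairs, never trimmed

    def commit(self, label, state):
        self.commits.append((label, dict(state)))
        return len(self.commits) - 1  # toy commit id

    def state(self, commit_id):
        return dict(self.commits[commit_id][1])


def flatten(source, commit_id):
    """Copy one commit's state into a fresh single-commit database."""
    target = ToyCommitDb()
    target.commit("Flattened", source.state(commit_id))
    return target


source = ToyCommitDb()
source.commit("Add Alice", {"alice": 1})
head = source.commit("Add Bob", {"alice": 1, "bob": 2})
target = flatten(source, head)
```

The source keeps both of its commits; the target starts a new history with one commit holding the same state, and the application then switches readers and writers over to it.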
Warning
CommitDatabase.delete_commit() and CommitDatabase.reset_commits()
(plus the CLI wrapper commit_admin reset) are not features —
they are tricks for live-demo scenarios.
delete_commit(commit_id) only makes sense on a head: deleting any other commit would orphan its descendants. Live-demo use: rewinding the DAG by one step.
reset_commits() / commit_admin reset removes every commit except the initial one. Live-demo use: replaying a scenario from a known baseline between runs.
Both operations break the append-only invariant that every other reader relies on. Never use them as storage-management tools.
Sustained-growth scenarios that need to keep recent history while trimming older commits are not addressed by the current runtime.
Safe Usage¶
A checklist for the operational gotchas. None of this is enforced by the engine — it’s on the application.
Identify your mode first. The Modes of Use carry different burdens. Only multi-stream with strong invariants makes the Dual-Layer Contract load-bearing; the other modes are safe to use without it.
Capture commit_id explicitly. commit_mutations() returns the new id; there is no implicit current commit to auto-advance. Chain further mutations and reads from the captured value. That id is itself state: if you keep it outside the database (in a scene file, a config, any side-channel the store never sees) it can drift out of sync, because one is copied, restored, or reopened without the other. A stale or absent baseline makes the next read either raise or, worse, diff against the wrong commit and emit authored mutations no one made. Store the cursor so it cannot outlive or desync from the database it points into.
Prefer path-based mutators over set() for fields edited concurrently. set() replaces the whole document, so disjoint edits collide. update, union_in_set, update_in_map, etc. converge cleanly on disjoint paths; see Why Paths Matter. But match the path to the semantic unit: letting diff split a bound value into sub-paths is its own failure mode; see Re-entering the graph.
Re-validate the state when you read it back, not when you build the mutations. Under best-effort reduction, mutations may have been silently dropped and combined states may violate cross-field invariants. See The Dual-Layer Contract for the discipline and where it becomes load-bearing.
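Read-time re-validation can be as simple as a function that checks cross-field invariants on the document actually read. The sketch below is illustrative only: the `status` and `email` fields, the invariant, and both helper names are hypothetical, and documents are modelled as plain dicts.

```python
def validate_login(doc):
    """Return a list of invariant violations; empty means the document is usable."""
    problems = []
    if not doc.get("nickname"):
        problems.append("nickname is empty")
    # Hypothetical cross-field invariant: a verified login must keep an email.
    if doc.get("status") == "verified" and not doc.get("email"):
        problems.append("verified login without email")
    return problems


def read_checked(state, key):
    """Read a document and refuse to hand it out if an invariant is broken."""
    doc = state[key]
    problems = validate_login(doc)
    if problems:
        raise ValueError(f"{key}: " + "; ".join(problems))
    return doc


base = {"nickname": "alice", "email": "a@example.com", "status": "pending"}
a_view = dict(base, status="verified")              # writer A: valid on its own
b_view = dict(base, email="")                       # writer B: valid on its own
combined = dict(base, status="verified", email="")  # both paths applied
```

Each writer's view passes validation when the mutations are built, but the combined state does not, which is why the check belongs on the read side.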