Database synchronisation¶
CommitSynchronizer replicates a CommitDatabase between two
sites. A CommitDatabase is made of two content-addressed
spaces: the DAG of commits, and the pool of blobs —
immutable binary payloads identified by their hash and referenced
from inside commits (see
Binary Data (Blobs)). Both are append-only, and each is
content-addressed independently of the other. Sync replicates
both with two set differences:
one on commit ids, one on blob hashes. Each side copies what the
other has and it lacks. The resulting DAG and blob pool on each
side are the union of both — divergent heads included, unchanged.
Sync does not reduce the resulting heads. That is a separate operation (see Sync vs reduce below).
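The two set differences can be sketched with plain Python sets. This is a minimal in-memory model, not the dsviper API: Site, sync_sets and the ids below are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Toy stand-in for one side of a sync: two append-only id sets."""
    commit_ids: set[str] = field(default_factory=set)
    blob_hashes: set[str] = field(default_factory=set)


def sync_sets(a: Site, b: Site) -> None:
    """Copy to each side what the other has and it lacks."""
    # Compute all four differences before mutating either side.
    commits_missing_on_a = b.commit_ids - a.commit_ids
    commits_missing_on_b = a.commit_ids - b.commit_ids
    blobs_missing_on_a = b.blob_hashes - a.blob_hashes
    blobs_missing_on_b = a.blob_hashes - b.blob_hashes

    # Append-only: ids are only ever added, never removed or rewritten.
    a.commit_ids |= commits_missing_on_a
    b.commit_ids |= commits_missing_on_b
    a.blob_hashes |= blobs_missing_on_a
    b.blob_hashes |= blobs_missing_on_b


site_a = Site({"c1", "c2"}, {"b1"})
site_b = Site({"c1", "c3"}, {"b2"})
sync_sets(site_a, site_b)
```

After the call both sites hold the union of commit ids and of blob hashes. Nothing in the model decides between c2 and c3, which mirrors the point that sync does not reduce heads.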
Two deployment patterns¶
Transparent proxy¶
CommitDatabase.connect() and CommitDatabase.connect_local() open
a remote proxy over a commit_database_server. The application
holds no local copy; every read and write traverses the network.
This is direct remote access — not synchronisation. There is no
second base to keep in step.
See commit_database_server.py for
the server side and CommitDatabase.connect() for the client side.
Replicated¶
Each site holds its own local CommitDatabase, and a
CommitSynchronizer periodically (or on demand) exchanges commits
with another site — typically a central server, but the engine
makes no such assumption.
This pattern enables offline work, local-first reads, and bandwidth amortisation. It also gives each site its own write head, which can make the application a multi-stream consumer of the engine — depending on how the diverging heads are reduced; see Implications.
Mechanism¶
CommitSynchronizer operates on two CommitDatabasing instances
named source and target — they are roles, not a hierarchy.
The same machinery synchronises two local databases, two remote
ones, or any combination.
The operation in one pass:
1. Detect change. Read dataVersion() on both sides. If neither has changed since the last sync, return immediately.
2. Extend definitions if needed. Compare definitionsHexDigest on both sides. If they differ, extendDefinitions adds the sender's missing types to the receiver. extendDefinitions is strictly additive by construction (DSM types are sealed by definition), so independent schema evolution never blocks sync.
3. Copy missing commits and the blobs they reference. Walk the missing commits (source.commitIds() \ target.commitIds() for Fetch, target.commitIds() \ source.commitIds() for Push) in topological order. For each Mutations commit, decode its opcodes (against the synced definitions) to collect any blob references they carry, copy those the receiver lacks (set difference on blob hashes), then create the commit itself. This guarantees the invariant that a commit on the receiver never references a blob the receiver does not have: the blob_id constraint stated in the blob API.
Blobs are packed into batches of size_of_packed_blobs (default 25 MB) to amortise network round-trips. Without packing, a commit referencing many small blobs would cost one round-trip per blob.
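The copy step can be sketched against a toy in-memory model. Commit, Database, pack_blobs and copy_missing are hypothetical names, not the dsviper API. The sketch assumes each commit exposes its blob references directly (the real engine decodes opcodes to find them), batches per commit, and takes 25 MB to mean 25 * 1024 * 1024 bytes.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Commit:
    commit_id: str
    parents: tuple[str, ...] = ()
    blob_refs: tuple[str, ...] = ()  # blob hashes referenced by this commit


@dataclass
class Database:
    commits: dict[str, Commit] = field(default_factory=dict)
    blobs: dict[str, bytes] = field(default_factory=dict)


def pack_blobs(blobs, max_batch_bytes):
    """Group (hash, payload) pairs into batches of roughly max_batch_bytes.

    A single payload larger than the limit still gets its own batch.
    """
    batches, current, size = [], [], 0
    for blob_hash, payload in blobs:
        if current and size + len(payload) > max_batch_bytes:
            batches.append(current)
            current, size = [], 0
        current.append((blob_hash, payload))
        size += len(payload)
    if current:
        batches.append(current)
    return batches


def copy_missing(sender, receiver, max_batch_bytes=25 * 1024 * 1024):
    """Copy commits the receiver lacks: parents first, blobs before commits."""
    missing = sender.commits.keys() - receiver.commits.keys()
    # Dependency graph restricted to the missing commits (node -> parents).
    graph = {
        cid: [p for p in sender.commits[cid].parents if p in missing]
        for cid in missing
    }
    copied = []
    for cid in TopologicalSorter(graph).static_order():
        commit = sender.commits[cid]
        needed = [
            (h, sender.blobs[h])
            for h in commit.blob_refs
            if h not in receiver.blobs
        ]
        for batch in pack_blobs(needed, max_batch_bytes):
            receiver.blobs.update(batch)  # one round-trip per batch, not per blob
        receiver.commits[cid] = commit  # created only once its blobs are present
        copied.append(cid)
    return copied


source = Database()
source.blobs = {"h1": b"x" * 10, "h2": b"y" * 10}
source.commits = {
    "c1": Commit("c1"),
    "c2": Commit("c2", parents=("c1",), blob_refs=("h1", "h2")),
    "c3": Commit("c3", parents=("c2",)),
}
target = Database(commits={"c1": source.commits["c1"]})
copied = copy_missing(source, target)
```

Writing the blobs before the commit is what keeps the invariant from step 3: at no point does the receiver hold a commit whose blob references it cannot resolve.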
What sync does not do:
It never rewrites a commit. Append-only is preserved end-to-end.
It never picks a winner between divergent heads. Multiple heads resulting from independent writes survive the sync intact, on both sides.
It never reduces or transforms a commit’s mutations — the opcode payload is stored verbatim. It decodes a Mutations commit’s opcodes only to collect the blob references it must copy first (step 3).
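That divergent heads survive the union can be checked on a toy DAG. The sketch below represents a DAG as a mapping from commit id to parent ids. The heads helper and the ids are hypothetical and not part of dsviper.

```python
def heads(parents_by_commit: dict[str, tuple[str, ...]]) -> set[str]:
    """Return the commits that no other commit names as a parent."""
    all_parents = {
        parent
        for parents in parents_by_commit.values()
        for parent in parents
    }
    return set(parents_by_commit) - all_parents


# Two sites that each committed once on top of a shared root.
site_a = {"root": (), "a1": ("root",)}
site_b = {"root": (), "b1": ("root",)}

# Commits are immutable and content-addressed, so the same id always
# denotes the same commit and a plain union is safe.
after_sync = {**site_a, **site_b}
surviving_heads = heads(after_sync)
```

Here surviving_heads contains both a1 and b1: the union adds commits but creates no merge commit, so neither head is superseded.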
Three modes¶
The two roles are symmetric by construction; a mode picks one of
the two possible directions across that pair (or both, for Sync).
| Mode | Direction | Use case |
|---|---|---|
| Fetch | source → target | Pull updates from a server into a local replica. |
| Push | target → source | Send local commits up to a server. |
| Sync | both (Fetch then Push) | Bidirectional, the common case. |
Modes are passed as strings to the Python constructor and exposed
as the class constants MODE_FETCH, MODE_PUSH, MODE_SYNC.
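How a mode selects copy directions can be sketched as a small lookup. This is a standalone model, not the dsviper implementation: the constants are redefined locally, the string values "Fetch" and "Push" are assumed by analogy with "Sync", and copy_directions is a hypothetical helper.

```python
# Local stand-ins for the class constants. The values are assumptions.
MODE_FETCH, MODE_PUSH, MODE_SYNC = "Fetch", "Push", "Sync"


def copy_directions(mode: str) -> list[tuple[str, str]]:
    """Return the ordered (sender, receiver) role pairs for a mode."""
    table = {
        MODE_FETCH: [("source", "target")],
        MODE_PUSH: [("target", "source")],
        # Sync is Fetch followed by Push.
        MODE_SYNC: [("source", "target"), ("target", "source")],
    }
    if mode not in table:
        raise ValueError(f"unknown mode: {mode!r}")
    return table[mode]


sync_plan = copy_directions(MODE_SYNC)
```

Each pair names which role sends and which receives during one copy pass, so a bidirectional sync is two one-directional passes over the same pair of databases.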
Sync vs reduce¶
These are two distinct operations. Confusing them is the most common source of surprise.
| Operation | What it does | Result |
|---|---|---|
| Sync | Copies missing commits across sites. Append-only. | Both sides see the union of commits. Multiple heads may now coexist on each side. |
| Reduce | Calls | One head (or fewer than before). |
CommitSynchronizer::sync() performs only the first. The second is
either:
automatic, if the application uses a CommitStore that calls reduceHeads() after sync, or
explicit, via commit_admin reduce_heads against the database, or
bespoke, via direct commitMerge calls in application code.
The split is intentional: a replicated topology may want to preserve multiple heads for inspection before reducing, or apply a domain-specific reduction order. Forcing reduction inside sync would foreclose that choice.
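The difference between the two operations can be seen on the shape of the DAG alone. The following toy model is not reduceHeads() or commitMerge: it only appends merge nodes with two parents and ignores the mutation content a real merge has to combine. The names heads and reduce_heads_toy, and the sorted order standing in for a domain-specific reduction order, are assumptions.

```python
import hashlib


def heads(dag: dict[str, tuple[str, ...]]) -> set[str]:
    """Return the commits that no other commit names as a parent."""
    all_parents = {p for parents in dag.values() for p in parents}
    return set(dag) - all_parents


def reduce_heads_toy(dag: dict[str, tuple[str, ...]]) -> dict[str, tuple[str, ...]]:
    """Append merge nodes until at most one head remains.

    Existing entries are never modified: reduction, like sync, only appends.
    """
    result = dict(dag)
    current = sorted(heads(result))  # stand-in for a chosen reduction order
    while len(current) > 1:
        left, right = current[0], current[1]
        digest = hashlib.sha256(f"{left}+{right}".encode()).hexdigest()[:8]
        result[f"m-{digest}"] = (left, right)  # merge node with two parents
        current = sorted(heads(result))
    return result


# State after a sync: two heads coexist.
after_sync = {"root": (), "a1": ("root",), "b1": ("root",)}
after_reduce = reduce_heads_toy(after_sync)
```

Sync produced after_sync with two heads and stopped there. The reduction is a separate step that appends one more node, which is why an application can inspect the heads in between or choose its own order.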
API¶
Python (dsviper.CommitSynchronizer)¶
from dsviper import CommitSynchronizer, CommitDatabase
local = CommitDatabase.open("local.cdb")
remote = CommitDatabase.connect("server.local", "54321")
synchronizer = CommitSynchronizer(local, remote, mode="Sync")
info = synchronizer.sync()
The returned CommitSynchronizerInfo reports how many commits and
blobs flowed in each direction and whether DSM definitions were
extended. See the
CommitSynchronizer API reference for
the full surface.
Command line¶
python3 tools/commit_admin.py --host server.local sync local.cdb
python3 tools/commit_admin.py --host server.local sync local.cdb --loop --update-interval 2
See commit_admin.py for all options, including continuous mode.
Implications¶
On Modes of Use¶
A replicated topology is multi-head by construction: each site has its own write head, and divergent heads meet at sync time, not at write time. Whether that makes you multi-stream is a question of how those heads are reduced, not of topology — the same line Modes of Use draws for multi-head exploration. A single author who reviews each head reduction stays single-stream; a second author committing in parallel, or head reduction left unreviewed, does not.
Once you are past that line, a single-stream model — one that assumes a single author with a linear history — has no safe path here. You are then at least in Multi-stream with local invariants, and possibly in Multi-stream with strong invariants depending on what your invariants look like. The diagnostic of Modes of Use applies before you reach for sync.
On the Dual-Layer Contract¶
Every sync extends the local DAG with commits authored elsewhere.
Once the local state is reconstructed from a head that descends
from a commitMerge, reading that state is an import: the same
import boundary the contract describes for intra-base
mechanical reduction. Synchronisation does not weaken the
contract; it expands the surface where the contract applies.
On Cooperative Discipline¶
Cooperative Discipline remains the modelling exit. If the contributions of each site are routed through structurally disjoint paths — by attachment partitioning, commutative containers, or scope decomposition — convergence post-sync is semantically trivial. The discipline does not change across the sync boundary; the boundary is just another place where it pays off.
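Why disjoint paths make convergence trivial can be illustrated with a toy state. The sketch models state as a dictionary keyed by path and a mutation as a last-writer-wins assignment. That is an assumption for illustration, not the engine's semantics, and apply_all and the paths are hypothetical.

```python
def apply_all(state: dict[str, str], writes: list[tuple[str, str]]) -> dict[str, str]:
    """Apply (path, value) writes in order and return the new state."""
    result = dict(state)
    for path, value in writes:
        result[path] = value
    return result


# Each site writes only under its own prefix: structurally disjoint paths.
site_a_writes = [("siteA/title", "Draft"), ("siteA/status", "open")]
site_b_writes = [("siteB/title", "Review"), ("siteB/status", "done")]

a_then_b = apply_all(apply_all({}, site_a_writes), site_b_writes)
b_then_a = apply_all(apply_all({}, site_b_writes), site_a_writes)

# A write that lands on a path another site also writes breaks the property.
overlapping = [("siteA/title", "Other")]
a_then_overlap = apply_all(apply_all({}, site_a_writes), overlapping)
overlap_then_a = apply_all(apply_all({}, overlapping), site_a_writes)
```

With disjoint paths the two application orders give the same state, so any reduction order converges. With an overlapping path the result depends on the order, which is exactly the case the discipline is meant to design away.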
Tools¶
| Tool | Role |
|---|---|
| CommitSynchronizer | The runtime class. Python API. |
| commit_admin.py sync | CLI wrapper for one-shot or continuous sync against a remote server. |
| commit_admin.py reduce_heads | Reduce multiple heads after sync. Separate operation. |
| commit_database_server.py | Network-exposed CommitDatabase. The remote endpoint of replicated sync. |
| cdbe.py | Example application wiring sync into a Qt UI (connect dialog, threaded synchroniser, live log). |
See also¶
Modes of Use — the diagnostic that determines whether sync is even an option for your application.
The Dual-Layer Contract — what becomes load-bearing once sync is in play.
Cooperative Discipline — how to make convergence post-sync trivial by design.
Database Server — the network surface and CLI tools.
cdbe.py — a worked example wiring sync into a real application.