# Database synchronisation `CommitSynchronizer` replicates a `CommitDatabase` between two sites. A `CommitDatabase` is made of **two content-addressed spaces**: the **DAG of commits**, and the **pool of blobs** — immutable binary payloads identified by their hash and referenced from inside commits (see [Binary Data (Blobs)](../dsviper/blobs.md)). Both are append-only and content-addressed independently. Sync replicates both, by **two set differences**: one on commit ids, one on blob hashes. Each side copies what the other has and it lacks. The resulting DAG and blob pool on each side are the union of both — divergent heads included, unchanged. Sync **does not reduce** the resulting heads. That is a separate operation (see [Sync vs reduce](#sync-vs-reduce) below). --- ## Two deployment patterns ### Transparent proxy `CommitDatabase.connect()` and `CommitDatabase.connect_local()` open a **remote proxy** over a `commit_database_server`. The application holds no local copy; every read and write traverses the network. This is direct remote access — not synchronisation. There is no second base to keep in step. See [`commit_database_server.py`](../dsviper-tools/server.md) for the server side and `CommitDatabase.connect()` for the client side. ### Replicated Each site holds its **own local `CommitDatabase`**, and a `CommitSynchronizer` periodically (or on demand) exchanges commits with another site — typically a central server, but the engine makes no such assumption. This pattern enables offline work, local-first reads, and bandwidth amortisation. It also gives each site its **own write head**, which can make the application a **multi-stream** consumer of the engine — depending on how the diverging heads are reduced; see [Implications](#implications). --- ## Mechanism `CommitSynchronizer` operates on two `CommitDatabasing` instances named **source** and **target** — they are roles, not a hierarchy. The same machinery synchronises two local databases, two remote ones, or any combination. The operation in one pass: 1. **Detect change.** Read `dataVersion()` on both sides. If neither has changed since the last sync, return immediately. 2. **Extend definitions if needed.** Compare `definitionsHexDigest` on both sides — if they differ, `extendDefinitions` adds the sender's missing types to the receiver. `extendDefinitions` is strictly additive by construction (DSM types are sealed by definition), so independent schema evolution never blocks sync. 3. **Copy missing commits and the blobs they reference.** Walk the missing commits — `source.commitIds() \ target.commitIds()` for Fetch, `target.commitIds() \ source.commitIds()` for Push — in topological order. For each Mutations commit, decode its opcodes (against the synced definitions) to collect any blob references they carry, copy those the target lacks (set difference on blob hashes), then create the commit itself. This guarantees the invariant that **a commit on the target never references a blob the target does not have** — the [`blob_id` constraint](../dsviper/blobs.md#blob_id-reference) stated in the blob API. Blobs are packed into batches of `size_of_packed_blobs` (default 25 MB) to amortise network round-trips — without packing, a commit referencing many small blobs would cost one round-trip per blob. What sync **does not** do: - It never rewrites a commit. Append-only is preserved end-to-end. - It never picks a winner between divergent heads. Multiple heads resulting from independent writes survive the sync intact, on both sides. - It never reduces or transforms a commit's mutations — the opcode payload is stored verbatim. It decodes a Mutations commit's opcodes only to collect the blob references it must copy first (step 3). --- ## Three modes The two roles are symmetric by construction; a **mode** picks one of the two possible directions across that pair (or both, for `Sync`). | Mode | Direction | Use case | |---------|------------------------|--------------------------------------------------| | `Fetch` | source → target | Pull updates from a server into a local replica. | | `Push` | target → source | Send local commits up to a server. | | `Sync` | both (Fetch then Push) | Bidirectional, the common case. | Modes are passed as strings to the Python constructor and exposed as the class constants `MODE_FETCH`, `MODE_PUSH`, `MODE_SYNC`. --- ## Sync vs reduce These are **two distinct operations**. Confusing them is the most common source of surprise. | Operation | What it does | Result | |--------------|------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------| | **Sync** | Copies missing commits across sites. Append-only. | Both sides see the union of commits. **Multiple heads may now coexist on each side.** | | **Reduce** | Calls `commitMerge` on multiple heads to produce a merge commit. Append-only too — the merge is a new commit, not a rewrite. | One head (or fewer than before). | `CommitSynchronizer::sync()` performs only the first. The second is either: - automatic, if the application uses a `CommitStore` that calls `reduceHeads()` after sync, or - explicit, via `commit_admin reduce_heads` against the database, or - bespoke, via direct `commitMerge` calls in application code. The split is intentional: a replicated topology may want to preserve multiple heads for inspection before reducing, or apply a domain-specific reduction order. Forcing reduction inside sync would foreclose that choice. --- ## API ### Python (`dsviper.CommitSynchronizer`) ```python from dsviper import CommitSynchronizer, CommitDatabase local = CommitDatabase.open("local.cdb") remote = CommitDatabase.connect("server.local", "54321") synchronizer = CommitSynchronizer(local, remote, mode="Sync") info = synchronizer.sync() ``` The returned `CommitSynchronizerInfo` reports how many commits and blobs flowed in each direction and whether DSM definitions were extended. See the [CommitSynchronizer API reference](../dsviper/api/commit.rst) for the full surface. ### Command line ```bash python3 tools/commit_admin.py --host server.local sync local.cdb python3 tools/commit_admin.py --host server.local sync local.cdb --loop --update-interval 2 ``` See {ref}`commit_admin.py ` for all options, including continuous mode. --- ## Implications ### On Modes of Use A replicated topology is **multi-head by construction**: each site has its own write head, and divergent heads meet at sync time, not at write time. Whether that makes you *multi-stream* is a question of how those heads are reduced, not of topology — the same line [Modes of Use](commit_modes.md#multi-head-exploration) draws for multi-head exploration. A single author who reviews each head reduction stays single-stream; a **second author committing in parallel**, or head reduction left unreviewed, does not. Once you are past that line, a single-stream model — one that assumes a single author with a linear history — has no safe path here. You are then at least in [Multi-stream with local invariants](commit_modes.md#multi-stream-with-local-invariants), and possibly in [Multi-stream with strong invariants](commit_modes.md#multi-stream-with-strong-invariants) depending on what your invariants look like. The diagnostic of [Modes of Use](commit_modes.md) applies *before* you reach for sync. ### On the Dual-Layer Contract Every sync extends the local DAG with commits authored elsewhere. Once the local state is reconstructed from a head that descends from a `commitMerge`, **reading that state is an [import](commit_contract.md#reading-the-state-is-an-import-not-a-load)** — the same import boundary the contract describes for intra-base mechanical reduction. Synchronisation does not weaken the contract; it expands the surface where the contract applies. ### On Cooperative Discipline [Cooperative Discipline](commit_cooperation.md) remains the modelling exit. If the contributions of each site are routed through structurally disjoint paths — by attachment partitioning, commutative containers, or scope decomposition — convergence post-sync is semantically trivial. The discipline does not change across the sync boundary; the boundary is just another place where it pays off. --- ## Tools | Tool | Role | |--------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------| | [`dsviper.CommitSynchronizer`](../dsviper/api/commit.rst) | The runtime class. Python API. | | [`commit_admin sync`](../dsviper-tools/server.md#sync-local-with-remote) | CLI wrapper for one-shot or continuous sync against a remote server. | | [`commit_admin reduce_heads`](../dsviper-tools/server.md#reduce-heads) | Reduce multiple heads after sync. Separate operation. | | [`commit_database_server.py`](../dsviper-tools/server.md) | Network-exposed CommitDatabase. The remote endpoint of replicated sync. | | [`cdbe.py`](../commit-apps/cdbe.md) | Example application wiring sync into a Qt UI (connect dialog, threaded synchroniser, live log). | --- ## See also - [Modes of Use](commit_modes.md) — the diagnostic that determines whether sync is even an option for your application. - [The Dual-Layer Contract](commit_contract.md) — what becomes load-bearing once sync is in play. - [Cooperative Discipline](commit_cooperation.md) — how to make convergence post-sync trivial by design. - [Database Server](../dsviper-tools/server.md) — the network surface and CLI tools. - [cdbe.py](../commit-apps/cdbe.md) — a worked example wiring sync into a real application.