The Dual-Layer Contract¶

This page is load-bearing only for Multi-stream with strong invariants; reach it from the Modes of Use diagnostic, not before. In every other mode it is reference.

The Commit Database produces structurally sound but semantically untrusted output under reduction. Re-validating at read time closes the gap for local invariants. For strong invariants it cannot — the dropped intent is not recoverable downstream, so the gap must be closed upstream: re-architect toward local invariants (see Cooperative Discipline) or supervise reduction with an application layer.

A change of discipline¶

The contract exists because two layers, governed by opposite disciplines, meet at the reduction boundary.

dsviper / Viper C++ applies fail-fast at the type and structural level: malformed values, undefined paths against a known schema, references to absent attachments all raise exceptions immediately. Nothing is silently coerced.
The Commit Engine, layered on top, applies best-effort at the reduction level: when concurrent streams meet, mutations whose targets have disappeared after divergence are silently dropped so reduction proceeds without human arbitration.

The two are not contradictory — they govern different operations at different moments. But the discipline changes across the boundary, and that change is precisely what this contract formalises: the application takes back, at read time, the strictness the engine relaxes at reduction time.

Why the divide exists: complicated vs complex¶

The boundary is not arbitrary — the engine and the application solve two different kinds of problem:

Reduction is complicated. Many moving parts — LWW arbitration, merge ordering, structural drop rules — but mechanical: the same inputs and the same merge sequence always yield the same outcome. Solvable in code once, with no knowledge of your domain.
Semantic validation is complex. Emergent and application-specific, with no closed-form solution: uniqueness, referential integrity, cross-field invariants all depend on what your data means. No engine solves it generically without becoming your application.

So the engine stops at reduction — pushing semantics inside would force it either to refuse states (breaking unsupervised reduction) or to encode every application’s rules (breaking generality).

The Contract¶

Layer	Guarantees
Commit	Deterministic reduction, DAG consistency, immutability
Application	Re-validates engine output before acting on it

When concurrent streams are reduced, mutations are applied using a best-effort algorithm:

Mutations targeting non-existent documents or unresolved paths are silently ignored.
Business rule violations are not detected by the Commit Database.
LWW arbitration may keep a value that no single submitted intent would have produced.

This is by design. The engine is intentionally agnostic to your domain rules and never refuses to reduce — it picks a deterministic outcome and moves on.

Reading the State Is an Import, Not a Load¶

This is the load-bearing point of the contract.

“Untrusted” usually means data that crossed a boundary you don’t control — network input, a deserialized file, an external API. The natural reflex is to validate at that boundary and then trust the data internally. We call that a load: a one-shot transfer of confidence from an external source into a trusted in-memory representation.

Best-effort reduction breaks that reflex. The state returned by state(commitId) is itself a boundary, even though the data never left the process. The state is reconstructed by replaying the mutations each author committed, in linearisation order — and a mutation’s grain is the verb that produced it, not the whole value the author had in mind. A set() replaces the whole document and supersedes every mutation before it; the path-based mutators (update and the container operations) instead accrete, each on its own path. The result is assembled from these mutations, not from the intents behind them.

For the path-based mutators that carry most concurrent work, that assembly resolves each shared structure in one of two regimes:

Shared path — resolution with loss. When two heads write the same path, the last in linearisation order wins and the rest are discarded. No value is invented: a value you really submitted wins, verbatim; the others are gone, and nothing signals it.
Disjoint paths — recombination without loss. When two heads write different paths of the same structure, both survive: no field value is dropped, and none is fabricated. The recombined structure has no single author — with several writers it cannot — but whether that is owned or invented depends on what the paths meant.

A writer owns a scope when their whole intent is exactly that scope — all of it, and nothing outside it. That distinction is the whole of it. If each writer owned the path they touched, the union is the collective’s intent, owned end to end: trustworthy, though no one person wrote the whole. If instead diff split one author’s whole-value intent across those paths, the union recombines fragments no one meant to stand alone — a structure that answers to no author. The two are indistinguishable field by field: with nothing lost, every field verbatim, and no invariant broken, per-field re-validation cannot tell the owned union from the invented one, because each field, alone, is valid. What ownership certifies is not the fields but the combination.

So name the escalation. Valid or invalid is the wrong first question: it presumes a true state to measure against. Untrusted is the working frame — re-validate on read. And where the paths were fragments, the state is invented: no prior true state exists to recover. Three depths of one gap, not synonyms.

Structurally the data is sound — types check, paths resolve, the DAG is consistent. Semantically, the right question is ownership, not unicity: with several writers the state never reflects a single intent, so demanding that is the wrong bar. The state is trustworthy when every surviving value is one author’s whole intent over a scope they alone wrote — and an invention when any surviving value is a fragment that recombination left standing on its own.

Treat reading state(commitId) as an import, not a load. An import acknowledges that the source is structurally well-formed but semantically foreign: a strategy must be picked for what to do with parts that violate your domain rules. A load offers no such choice — it assumes the data is already yours. After best-effort reduction, it is not.

Three Error Families¶

Because of this, validation in a dsviper-based system happens at three distinct places, not one:

Family	Origin	Detected by
Untrusted (external)	I/O, deserialization, type mismatch, malformed path	Engine, fail-fast → `dsviper.Error`
Untrusted (post-reduction)	State returned by the Commit Database after reduction	Application, at read time
Invalid (semantic)	Business-rule violation on otherwise sound data	Application, at read time

The first family is what dsviper.Error covers. The other two are entirely your responsibility — and they share a remediation: re-validate when you read the state, not when you build the mutations.

What Commit Provides¶

Guarantee	Description
DAG Consistency	Commits form a valid directed acyclic graph
Immutability	Once committed, data cannot be modified
Deterministic Reduction	`commitMerge` is a pure function of its ordered inputs — same `(parent, target)` produces the same commit. Which order ends up applied across multi-head topologies is an application choice, not an engine property.
Content-Addressable	`CommitId = SHA-1(content)`, tamper-evident

Note

Why reduction — not merge, not convergence. A merge in version-control tools is a reconciliation step that may surface conflicts and ask a human to arbitrate; the engine does no such thing. Convergence, in the CRDT sense, names an order-independent outcome — the same set of updates yields the same state regardless of the order they are applied; the engine does not provide that either, because commitMerge is non-commutative (commitMerge(A, B) ≠ commitMerge(B, A)). What the engine actually does is reduce divergent heads to one by deterministic structural rules: reproducible given a fixed order, order-dependent, with no semantic arbitration. We reserve convergence for the commutative subset — disjoint paths, accretive containers (see Cooperative Discipline) — where reduction genuinely is order-independent. Everywhere else, it is reduction.

How Reduction Picks a Winner¶

The mechanics — commitMerge and the structural rules that pick a value on each overlapping path — are described in Commit Database — How Reduction Picks a Winner. The summary the contract leans on: the outcome is deterministic given a fixed merge sequence, but its mechanics are structural, not author- or time-meaningful. Two authors editing the same field have no way to predict which value will survive reduction.

This is what the contract is telling you: do not rely on a specific arbitration outcome. Re-validate at read time.

What Commit Does NOT Provide¶

Not Provided	Description
Intent Preservation	Deterministic arbitration (LWW), not your intent
Semantic Validation	No business rule checking
Mutation Notification	No alert when mutations are silently ignored
Conflict Detection	No notion of conflict — just deterministic reduction

Import Outcomes¶

The four entries below apply to standard consumers meeting untrusted state — code that did not commit to the defensive-by-design route from day one. They are commonly framed as “strategies”; under strong invariants they are better described as modes of loss: none re-enters the DAG, none recovers intent.

The trigger for import is the merge itself, not concurrency or automation. A strictly linear history is a load: every mutation you submitted is exactly the one that contributed to the state. The moment the history contains a commitMerge — unsupervised reducer or human desktop merge — best-effort drops have been applied silently, and reading becomes an import.

That state answers to no single human intent. Where streams overlap, reduction picks by structural rule and can preserve a combination no contributor would have written — its author is the mechanism, not a person. So none of the outcomes below recovers a true state; there is none to recover. Read them as structural consequences, not degrees of developer fault — the failure is a mechanical reduction applied where intent had to survive, not the code left coping with the result.

Outcome	What the application does	Resulting state
Ignore	Consume the state as-is	untrusted — stays in the DAG, unvalidated
Extract a subset	Validate, drop what fails	partial — disconnected from the DAG
Correct	Validate, repair what fails	phantom — disconnected from the DAG
Reject	Refuse the state	none — nothing produced

Read this as a hierarchy of failure modes, not a menu. Under strong invariants:

Ignore — deferral, not a strategy: it treats the import as a load. The unvalidated violation surfaces later, in code that assumed the invariants held.
Extract a subset — invention by subtraction: to make the fragment validate, it deletes the records that carry the violation, manufacturing consistency by erasing the evidence of its own inconsistency. It carries less than the reduced state, not more.
Correct — invention by addition, the mirror of Extract: instead of deleting the violating record it overwrites it with a fabricated value that has no CommitId. Once a write builds on it, the fabrication enters the DAG as if authored.
Reject — the only honest answer under strong invariants: it concedes mechanical reduction was the wrong primitive, and pushes resolution outside the import — human intervention, rollback, or a coordination protocol.

Commit has no notion of conflict — there is nothing for the application to push back into. The engine has already produced the reduced state; whatever the application does with the result lives outside the DAG.

If you reach for one of these outcomes regularly in code that guards strong invariants, that is the diagnostic, not the solution. The next move is upstream: re-design toward Multi-stream with local invariants via scope decomposition (see Cooperative Discipline), or build an application-level supervisor. Under local invariants these outcomes remain available as defensive fallbacks — no longer load-bearing.

Re-entering the graph¶

Reading a state is half the cycle; writing it back is the other half, and the verb you write it back with decides what the Commit Database is under concurrency.

Writing back with update / diff. Path-based write-back keeps the per-path behaviour: concurrent authors who touch different paths have their edits recombined rather than overwritten. Whether that recombination is a cure or an invention turns on ownership — does each writer own the scope they touch? When the path is the unit of intent, every contribution is a whole owned intent and the union is trustworthy (the cooperative case). When diff splits one author’s whole-value intent into sub-scopes instead, those fragments recombine into a value no one owns alone — and the same minimality carries it across the cycle: at input it records a path-scoped operation from a whole-value edit, attributing a scope the author need not have meant; at reduction it lets the fragments recombine into a structure no author wrote; and when a corrected read is written back, it diffs against that reconstructed state, so a value derived from it enters the DAG as authored. Path-based write-back is the cure under ownership and the cost without it.

The loop compounds. A read-modify-write application closes the loop: it loads a state, edits it, and writes the edit back as the next commit — whose state the next load reconstructs. When the load is lossy and the write-back diffs against it, each cycle’s output is the next cycle’s input. Whatever does not round-trip is then not lost once but eroded a little more every cycle — a drift that is silent (no error), signed (a real authored commit), and cumulative on the append-only history, and that no per-cycle check catches because each cycle looks locally consistent. A loop you never close — a read, or a one-way projection — cannot drift; a loop you do close inherits the round-trip fidelity of its weakest stage.

Writing back with set. A whole-document write replaces the document and does not recombine. Concurrent writers to the same document are not folded together — one whole document survives and the others are dropped. The recombination is gone, and with it the appearance that several authors collaborated: the Commit Database is then a key-value store with history.

Neither verb is collaboration. set gives an honest versioned store where a contested document keeps one whole version — which one is a head-reduction strategy choice, not an intent; update / diff gives recombination — trustworthy when each scope is owned, an invention when it is a fragment. The architecture decision is which of the two you mean to build — and to make it on purpose, not by reaching for whichever verb is nearest.

Choosing the grain. Owning a scope is something you do, and the lever is write granularity. diff delegates that to structural difference — it splits wherever values differ, recursively down to the leaf — so it can fragment fields that form one semantic unit. Only you know which fields are bound: a translation vector, a (min, max) pair, the components of a quaternion. Write each such unit as a single update on the unit’s path rather than letting diff pick the grain. The cost is deliberate: two concurrent edits inside one unit then collide on a shared path — one whole edit lost — instead of recombining into a value neither author wrote. For bound fields that is the trade you want: an overt, detectable loss beats a silent invention. diff is safe only where the leaves already are the atoms; otherwise prefer several targeted updates — that is how a path becomes a scope someone owns.

Implications for `dsviper` Code¶

The contract shapes how you should write code on top of dsviper.Commit*:

Treat the post-reduction state as an import boundary. It is the symmetric equivalent of deserialized network input: structurally sound, semantically unverified.
Do not assume that every operation you build into a CommitMutableState will land. After concurrent streams are reduced, some operations may have been silently dropped because their targets disappeared.
Pick an import outcome explicitly, and name it as such. Do not let it emerge from where you happen to validate. If you find yourself reaching for Correct or Ignore in code that handles strong invariants, that is the signal to move the conversation upstream — see Cooperative Discipline.
Validate at the application boundary, which means the Commit Database output. If your domain has invariants (uniqueness, referential integrity, cross-field consistency), enforce them when consuming the state.

Summary¶

“I guarantee structural integrity. I hand you back data that is structurally sound but semantically untrusted. You re-validate it on read.”

— The Dual-Layer Contract

Whether that re-validation closes the loop is not decided here — it is decided by which mode you are in (Modes of Use). Under naturally local invariants there is nothing for reduction to break, so this contract is reference. Under defensive-by-design the application is built to tolerate the drop, and re-validation is part of that design. Under strong invariants re-validation can detect the violation but cannot rebuild the dropped intent — the gap closes only by re-architecting upstream, never by a better outcome downstream.