Waiver Ledger (Exceptions Are Data)
Context
In a governed harness, certain gates are mandatory: tests, linting, security scans, approvals. In reality, gates sometimes cannot run:
- The environment is missing a tool.
- A test suite is flaky.
- A dependency registry is down.
- A required approval is unavailable.
Without a formal mechanism, teams handle this informally (“ship anyway”), and the harness becomes less trustworthy over time.
Problem
How do you allow exceptions without turning governance into a pile of tribal knowledge?
You need a way to:
- Make exceptions explicit and reviewable.
- Capture why the gate was skipped and what risk remains.
- Turn waivers into measurable data for later improvement.
Forces
- Delivery pressure: teams will route around gates if the system makes exceptions impossible.
- Auditability: “we skipped tests” must be visible and attributable.
- Expiration: exceptions tend to become permanent unless forced to expire.
- Incentives: if waivers are easy, they become the default.
- Evidence-first: even when a gate is waived, you should capture alternative evidence.
Solution
Create a waiver ledger: a structured, append-only record of exceptions.
The harness treats “cannot satisfy a required gate” as one of two outcomes:
- Blocked: stop with reproduction steps.
- Waived: proceed only if a waiver record exists with required metadata and approvals.
Waivers are not a prompt instruction. They are a governed artifact that the harness can validate.
A waiver record should include:
- What gate was waived
- Why it was not runnable
- What alternative evidence was produced
- Risk assessment and affected surfaces
- Who approved it
- Expiration (time or condition)
- Links to trace IDs / diffs / incident tickets
Implementation sketch
Ledger format
Use a machine-readable format (JSON/YAML) and treat it as an audit artifact.
Minimal waiver schema (conceptual):
{
"waiver_id": "W-2026-02-23-001",
"created_at": "2026-02-23T18:22:10Z",
"scope": {
"repo": "ai-first-software-engineering-book",
"diff_fingerprint": "sha256:...",
"paths": ["book/patterns/..."]
},
"gate": {
"name": "mkdocs_build",
"required_by_policy": true
},
"reason": {
"category": "environment",
"summary": "mkdocs build runner unavailable in CI",
"details": "CI runner image missing mkdocs; scheduled fix in platform backlog"
},
"alternative_evidence": [
{"type": "local_run", "command": "uv run mkdocs build", "exit_code": 0}
],
"risk": {
"level": "low",
"notes": "Docs-only change; no runtime code"
},
"approvals": [
{"role": "maintainer", "by": "alice", "at": "2026-02-23T18:30:00Z"}
],
"expires": {
"at": "2026-03-01T00:00:00Z",
"condition": "CI image includes mkdocs"
}
}
Harness behavior
- When a required gate fails for non-actionable reasons, the harness requests a waiver rather than declaring success.
- The router validates that a waiver is present, unexpired, and covers the diff fingerprint.
- The trace references the waiver ID and includes the alternative evidence bundle.
- Policy can cap waiver usage per tier or per protected surface.
Using waivers as data
A waiver ledger is useful only if it is queried:
- Track counts by gate, reason category, and team.
- Identify chronic sources of blockage (flaky tests, missing tooling).
- Enforce auto-expiration and “waiver debt” work items.
Concrete examples
Example 1: Flaky integration tests during a low-risk change
Task: “Update docs and run build.”
- Required gate:
mkdocs build. - Gate fails in CI due to a transient filesystem error.
Outcome:
- The harness runs the gate locally and captures logs.
- A maintainer approves a short-lived waiver that covers this diff fingerprint.
- The run completes with evidence + waiver ID.
This keeps shipping possible while retaining auditability.
Example 2: Missing security scan tool blocks a protected-path edit
Task: “Edit a workflow to fix a CI regression.”
- Required gates: security scan + workflow validation.
- Security scan tool is unavailable.
Outcome:
- The harness refuses to proceed without a waiver.
- The waiver requires explicit security approval and an alternative evidence step (for example, run scan in a different environment or manual review checklist).
- The waiver expires quickly and is linked to a backlog item to restore the tool.
This prevents “temporary” exceptions from silently becoming permanent.
Failure modes
- Waiver sprawl: waivers become the normal path.
- Mitigation: approvals, expiration, and per-gate limits; treat repeated waivers as incidents.
- No alternative evidence: waivers are used as blank checks.
- Mitigation: require at least one alternative evidence item and record it in the trace.
- Unbounded scope: a waiver applies to unrelated diffs.
- Mitigation: bind waivers to a diff fingerprint + path scope.
- No expiry: exceptions persist forever.
- Mitigation: enforce expiration at the harness level; reject expired waivers.
- Ledger ignored: data exists but is not acted on.
- Mitigation: dashboards or scheduled reviews; treat chronic waivers as governance debt.
When not to use
- Teams unwilling to enforce approvals and expiration (the ledger will become noise).
- Situations where exceptions are unacceptable by policy (for example, regulated releases requiring specific attestations).
- Very small projects where a simple “blocked” outcome is sufficient and waivers would add bureaucracy.