When a dashboard breaks, the root cause is often upstream: a column renamed, a type changed, a new null introduced, or filter logic updated. Data contracts are a lightweight way to prevent these surprises by making expectations explicit and enforceable.
What is a data contract?
A data contract is a versioned agreement between a dataset producer and its consumers. It describes what the dataset contains, how it changes, and what guarantees are provided (freshness, completeness, constraints).
The minimum useful contract
- Dataset identity: name, domain, owner, purpose
- Schema: fields, types, nullability
- Keys: unique identifiers and grain
- Constraints: allowed values, ranges, referential integrity
- SLOs: freshness window, expected volume bounds
- Change policy: what counts as breaking, versioning rules
```json
{
  "dataset": "mart.orders",
  "version": "1.2.0",
  "owner": "data-platform",
  "grain": "order_id",
  "schema": [
    { "name": "order_id", "type": "string", "nullable": false },
    { "name": "customer_id", "type": "string", "nullable": false },
    { "name": "order_total", "type": "decimal(12,2)", "nullable": false }
  ],
  "slos": { "freshness_minutes": 30 }
}
```
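A contract spec is only useful if it is itself complete, so a small lint step can check the contract file before anything else runs. The following is a minimal Python sketch, not a standard: the function name `contract_problems` and the choice of which keys count as required are assumptions based on the example contract above.

```python
from __future__ import annotations

import json

# Keys taken from the example contract; treating all of them as required
# is an assumption of this sketch.
REQUIRED_KEYS = ("dataset", "version", "owner", "grain", "schema", "slos")
REQUIRED_FIELD_KEYS = ("name", "type", "nullable")


def contract_problems(contract: dict) -> list[str]:
    """Return human-readable problems; an empty list means the spec looks complete."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in contract]
    fields = contract.get("schema", [])
    for index, field in enumerate(fields):
        for key in REQUIRED_FIELD_KEYS:
            if key not in field:
                problems.append(f"schema[{index}] missing '{key}'")
    # The grain must refer to a column that the schema actually declares.
    names = [field.get("name") for field in fields]
    if "grain" in contract and contract["grain"] not in names:
        problems.append(f"grain column '{contract['grain']}' is not in schema")
    return problems


CONTRACT_JSON = """
{
  "dataset": "mart.orders",
  "version": "1.2.0",
  "owner": "data-platform",
  "grain": "order_id",
  "schema": [
    {"name": "order_id", "type": "string", "nullable": false},
    {"name": "customer_id", "type": "string", "nullable": false},
    {"name": "order_total", "type": "decimal(12,2)", "nullable": false}
  ],
  "slos": {"freshness_minutes": 30}
}
"""

contract = json.loads(CONTRACT_JSON)
print(contract_problems(contract))  # prints [] for the example contract
```

Returning a list of problems rather than raising on the first one lets a CI job report every gap in a single run.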
How to enforce contracts
Enforcement means automated checks in CI/CD and in orchestration runs. The contract is not a PDF; it is a living spec that blocks unsafe changes.
- Validate schema drift on every deploy (diff schema snapshots).
- Run constraint tests (unique/not-null/accepted values).
- Monitor SLOs (freshness + volume).
- Require version bump and consumer notification for breaking changes.
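The first three checks in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not a reference implementation: the function names (`schema_drift`, `constraint_violations`, `is_fresh`) are hypothetical, the observed schema and rows are assumed to arrive as lists of dicts, and a real pipeline would usually run the equivalent checks inside the warehouse or a testing framework.

```python
from datetime import datetime, timedelta, timezone

CONTRACT = {
    "dataset": "mart.orders",
    "grain": "order_id",
    "schema": [
        {"name": "order_id", "type": "string", "nullable": False},
        {"name": "customer_id", "type": "string", "nullable": False},
        {"name": "order_total", "type": "decimal(12,2)", "nullable": False},
    ],
    "slos": {"freshness_minutes": 30},
}


def schema_drift(expected_fields, observed_fields):
    """Diff the contract schema against an observed schema snapshot."""
    expected = {f["name"]: f for f in expected_fields}
    observed = {f["name"]: f for f in observed_fields}
    issues = []
    for name in sorted(expected.keys() - observed.keys()):
        issues.append(f"missing column: {name}")
    for name in sorted(observed.keys() - expected.keys()):
        issues.append(f"unexpected column: {name}")
    for name in sorted(expected.keys() & observed.keys()):
        if expected[name]["type"] != observed[name]["type"]:
            issues.append(
                f"type changed: {name} {expected[name]['type']} -> {observed[name]['type']}"
            )
        if not expected[name]["nullable"] and observed[name]["nullable"]:
            issues.append(f"became nullable: {name}")
    return issues


def constraint_violations(rows, contract):
    """Check not-null columns and uniqueness of the grain column on sample rows."""
    issues = []
    for field in contract["schema"]:
        if not field["nullable"]:
            nulls = sum(1 for row in rows if row.get(field["name"]) is None)
            if nulls:
                issues.append(f"{nulls} null value(s) in {field['name']}")
    keys = [row.get(contract["grain"]) for row in rows]
    if len(keys) != len(set(keys)):
        issues.append(f"duplicate values in grain column {contract['grain']}")
    return issues


def is_fresh(last_loaded_at, freshness_minutes, now=None):
    """True if the last load falls inside the contract's freshness window."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= timedelta(minutes=freshness_minutes)
```

In a CI job or an orchestration task, a non-empty issue list or a failed freshness check would typically fail the run, which is what turns the contract from documentation into a gate.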
Versioning: what’s a breaking change?
- Breaking: renaming/removing a column, changing type, changing grain.
- Non-breaking: adding a nullable column, adding a new partition, relaxing a constraint (with care).
- Ambiguous: changing business logic while keeping the schema the same. Treat this as breaking for critical metrics.
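The schema-level rules above are mechanical enough to automate. The sketch below classifies a change between two schema versions and suggests a version bump. Several parts are assumptions rather than rules from this article: the names `classify_change` and `bump_version`, the mapping of breaking changes to a major bump and non-breaking changes to a minor bump, and the `"review"` result for a newly added non-nullable column, which the list does not cover. Business-logic changes cannot be detected from the schema at all and still need human review.

```python
def classify_change(old_fields, new_fields, old_grain, new_grain):
    """Return 'breaking', 'non-breaking', 'review', or 'none' for a schema change."""
    old = {f["name"]: f for f in old_fields}
    new = {f["name"]: f for f in new_fields}
    if old_grain != new_grain:
        return "breaking"  # changing grain
    for name, field in old.items():
        if name not in new:
            return "breaking"  # column removed or renamed
        if new[name]["type"] != field["type"]:
            return "breaking"  # type changed
    added = [new[name] for name in new.keys() - old.keys()]
    if any(not field["nullable"] for field in added):
        return "review"  # assumption: not covered by the rules above
    if added:
        return "non-breaking"  # only nullable columns were added
    return "none"


def bump_version(version, change):
    """Suggest the next version string; the semver-style mapping is an assumption."""
    major, minor, _patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "non-breaking":
        return f"{major}.{minor + 1}.0"
    return version  # 'review' and 'none' leave the version for a human to decide


old_schema = [
    {"name": "order_id", "type": "string", "nullable": False},
    {"name": "order_total", "type": "decimal(12,2)", "nullable": False},
]
renamed = [
    {"name": "order_id", "type": "string", "nullable": False},
    {"name": "total", "type": "decimal(12,2)", "nullable": False},
]
change = classify_change(old_schema, renamed, "order_id", "order_id")
print(change, bump_version("1.2.0", change))  # prints: breaking 2.0.0
```

A CI check could compare the suggested version with the one declared in the contract file and fail the build when a breaking change ships without the corresponding bump.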
“Contracts don’t eliminate change. They make change safe.”
Where contracts live
Start simple: store contract specs next to transformations in git, and publish a rendered view (docs portal or catalog). As maturity grows, integrate with a data catalog and automate lineage-based impact analysis.
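As one possible starting point for the git-based layout, a short loader can discover every contract file in the repository so that checks and the rendered view work from the same source. The `*.contract.json` naming convention and the `load_contracts` function are hypothetical choices for this sketch; any consistent convention would do.

```python
import json
from pathlib import Path


def load_contracts(repo_root):
    """Map dataset name -> (path, contract) for every *.contract.json under repo_root."""
    contracts = {}
    for path in sorted(Path(repo_root).rglob("*.contract.json")):
        contract = json.loads(path.read_text(encoding="utf-8"))
        # Assumes each contract carries a "dataset" key, as in the example spec.
        contracts[contract["dataset"]] = (path, contract)
    return contracts
```

With a layout such as `models/orders/orders.sql` next to `models/orders/orders.contract.json`, a change to the transformation and a change to its contract land in the same pull request and are reviewed together.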
If you want to adopt data contracts, the key is to begin with your most business-critical datasets and make enforcement part of the standard delivery workflow.
