Data platforms often drift into ‘always-on’ mode: warehouses running 24/7, inefficient queries, duplicated storage, and pipelines that recompute everything. The fix isn’t a one-time cleanup; it’s building cost visibility and feedback loops that keep spend in check as the platform grows.
Start by attributing cost
- Tag jobs by pipeline/dataset.
- Track warehouse spend by query/job.
- Measure storage growth by table and retention policy.
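As a rough illustration of attribution, the sketch below rolls up per-query cost by pipeline and dataset tag. The `QueryRecord` shape and the `credits` unit are assumptions made for this example; in practice the records would come from whatever query history or billing export your warehouse provides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    """Hypothetical row from a query-history export."""
    pipeline: str   # tag attached to the job that issued the query
    dataset: str    # dataset the job writes to
    credits: float  # cost units consumed by this query (assumed unit)


def spend_by_tag(records):
    """Sum cost per (pipeline, dataset) tag, most expensive first."""
    totals = defaultdict(float)
    for record in records:
        totals[(record.pipeline, record.dataset)] += record.credits
    # Sort descending so the biggest spenders appear at the top.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Untagged jobs are worth surfacing too: giving them a placeholder tag such as "unknown" makes the gap in attribution visible rather than hiding it.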
Common cost offenders
- Full refresh transformations when incremental is possible
- Exploding joins (non-unique keys or mismatched grain) that multiply row counts
- Unbounded backfills running at peak hours
- Queries that cannot prune partitions and end up scanning whole tables
- Storing multiple copies of the same derived dataset
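Exploding joins are one offender that can be caught with a cheap check before the expensive query runs. The following is a minimal sketch, assuming an inner join on a single key and that the key columns can be sampled or counted up front; `join_fanout` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def join_fanout(left_keys, right_keys):
    """Estimate inner-join output size on one key.

    Returns (output_rows, fanout_ratio), where fanout_ratio is
    output rows divided by left-side rows. A ratio well above 1.0
    suggests duplicate keys or a grain mismatch on the right side.
    """
    left = Counter(left_keys)
    right = Counter(right_keys)
    # Each left row with key k matches every right row with key k.
    output_rows = sum(count * right[key] for key, count in left.items())
    left_rows = sum(left.values())
    ratio = output_rows / left_rows if left_rows else 0.0
    return output_rows, ratio
```

A pipeline could fail fast, or at least log a warning, when the ratio exceeds a threshold chosen for that join.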
Controls that don’t hurt reliability
- Budgets + alerts per environment (dev/stage/prod).
- Concurrency limits to avoid warehouse queue pressure.
- Schedule heavy backfills off-peak.
- Retention policies for intermediate tables.
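A per-environment budget check can be as simple as the sketch below. The environment names follow the list above, while the function name, the 80% warning threshold, and the spend figures are assumptions for illustration; most warehouses and cloud providers also offer native budget alerts that would replace this logic.

```python
def budget_alerts(spend, budgets, warn_ratio=0.8):
    """Compare month-to-date spend with each environment's budget.

    Returns a dict mapping environment name to "over" or "warning".
    Environments comfortably under budget are omitted.
    """
    alerts = {}
    for env, budget in budgets.items():
        used = spend.get(env, 0.0)
        if used >= budget:
            alerts[env] = "over"
        elif used >= warn_ratio * budget:
            alerts[env] = "warning"
    return alerts


# Example with made-up numbers: dev has blown its budget, stage is close.
budgets = {"dev": 100.0, "stage": 200.0, "prod": 1000.0}
spend = {"dev": 120.0, "stage": 180.0, "prod": 500.0}
alerts = budget_alerts(spend, budgets)
```

Alerting rather than hard-stopping is the reliability-friendly choice for prod; a hard cap is usually safer to apply only in dev.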
Incremental by default
The biggest cost win is to stop recomputing history. Partition outputs, process only new or changed data, and validate with targeted reconciliation checks, such as comparing row counts for the reprocessed partitions against the source.
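The incremental pattern can be sketched in a few lines. This assumes each source partition exposes some version marker (for example a last-modified timestamp or a checksum) and that the pipeline records which version it last processed; the function names and the partition keys are hypothetical.

```python
def pending_partitions(source_versions, processed_versions):
    """Return partitions that are new or have changed since the last run.

    Both arguments map partition key -> version marker. A partition is
    pending when it was never processed or its version differs.
    """
    return sorted(
        partition
        for partition, version in source_versions.items()
        if processed_versions.get(partition) != version
    )


def reconcile(source_counts, target_counts, partitions):
    """Return the partitions whose row counts differ between source and target.

    Only the given partitions are checked, which keeps validation cost
    proportional to the work just done rather than to full history.
    """
    return [
        partition
        for partition in partitions
        if source_counts.get(partition) != target_counts.get(partition)
    ]
```

Row counts are the cheapest reconciliation signal; sums of a key numeric column can be compared the same way when counts alone are not convincing.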
“FinOps is not a cost-cutting project. It’s an operating model: measure → decide → enforce → learn.”
