Databricks Waste Detection & Optimization Insights
Waste detection finds spend that likely produced little value or can be reduced safely. Optimization insights identify configuration changes that may reduce cost, improve reliability, or reduce operational risk.
Waste types
| Type | What it means |
|---|---|
| Idle cluster | Cluster ran with little or no useful activity. |
| Idle warehouse | SQL warehouse clusters were running without queries. |
| Long-running cluster | Cluster ran for an unusually large portion of the observation window. |
| Failing queries | Failed SQL queries still consumed DBUs. |
| Long-running queries | Query runtime is high enough to deserve review. |
| Runaway job | A job run lasted far beyond its baseline. |
| Retry storm | Repeated failures or retries consumed avoidable compute. |
| Overprovisioned cluster | Utilization is low for the configured workers. |
| Long auto-termination | Idle timeout is much longer than recommended. |
| Unused table | Table incurs storage cost but shows no recent access signal. |
| Fat driver | Driver sizing appears excessive for observed usage. |
| Zombie model | Serving endpoint appears unused but still costs money. |
| Weekend waste | Activity that does not look like production ran on weekends. |
| Poor pruning / Scanzilla | Queries scan much more data than expected. |
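
As an illustration of how a baseline-relative detector such as "Runaway job" could work, here is a minimal sketch. The function name, the use of the median as the baseline, the `factor` multiplier, and the `min_history` cutoff are all assumptions made for this example; the actual detector's thresholds are not documented here.

```python
from statistics import median


def is_runaway(run_seconds, history_seconds, factor=3.0, min_history=5):
    """Flag a run whose duration far exceeds the median of earlier runs.

    All parameters are hypothetical: `factor` and `min_history` are
    illustrative defaults, not documented product thresholds.
    """
    if len(history_seconds) < min_history:
        return False  # too little history to form a baseline
    baseline = median(history_seconds)
    return baseline > 0 and run_seconds > factor * baseline


history = [600, 620, 580, 610, 640]  # earlier run durations in seconds
print(is_runaway(650, history))   # close to the baseline
print(is_runaway(4000, history))  # several times the baseline
```

Using the median rather than the mean keeps one earlier outlier from inflating the baseline.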
Optimization types
Cluster and workload hygiene detectors include spot instance candidates, single-node candidates, excess workers, outdated runtimes, missing job timeouts, Photon candidates, interactive cluster misuse, pool candidates, instance-type mismatch, first-on-demand guardrails, pipeline dev mode, pipeline preview channel, and multi-cluster jobs.
Storage optimization detectors include poor read/write ratio, duplicate datasets, excessive retention, non-partitioned tables, and unoptimized storage.
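
To make two of the hygiene detectors concrete, the sketch below checks a simplified job description for a missing timeout and for on-demand-only compute. The flat dictionary shape and the function name are hypothetical. The keys `timeout_seconds` and `availability` are borrowed from Databricks job and cluster settings, but this is not the product's detection logic and the real payloads are nested differently.

```python
def job_hygiene_findings(job):
    """Return hygiene findings for a simplified, flat job description.

    The input shape is an assumption for illustration only.
    """
    findings = []
    # A missing or zero timeout is treated here as "no timeout configured".
    if not job.get("timeout_seconds"):
        findings.append("missing job timeout")
    # Compute pinned to on-demand capacity may be a spot instance candidate.
    if job.get("availability") == "ON_DEMAND":
        findings.append("spot instance candidate")
    return findings


print(job_hygiene_findings({"timeout_seconds": 0, "availability": "ON_DEMAND"}))
print(job_hygiene_findings({"timeout_seconds": 3600, "availability": "SPOT_WITH_FALLBACK"}))
```

Whether a flagged job is actually safe to move to spot capacity depends on its tolerance for interruption, which a check like this cannot see.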
How savings are estimated
Savings estimates are conservative approximations derived from recent observed cost and the recommended change. Examples:
- Idle runtime reduction: savings are estimated from idle minutes or hours and the recent cost rate.
- Worker reduction: savings are estimated from the difference between current and recommended worker counts.
- Single-node conversion: savings are estimated from the worker cost that may be removed.
- Storage opportunities use observed storage or predictive-optimization cost where available.
Treat estimates as prioritization guidance, not an invoice guarantee.
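
The arithmetic behind the first three examples can be sketched as follows. The function names, the per-hour cost inputs, and the simple multiplication are assumptions for illustration; the product may weight or discount these figures differently to stay conservative.

```python
def idle_savings(idle_hours, cost_per_hour):
    """Idle runtime reduction: idle time multiplied by the recent cost rate."""
    return idle_hours * cost_per_hour


def worker_reduction_savings(current_workers, recommended_workers,
                             cost_per_worker_hour, hours):
    """Worker reduction: cost of the workers that would be removed."""
    removed = max(current_workers - recommended_workers, 0)
    return removed * cost_per_worker_hour * hours


def single_node_savings(current_workers, cost_per_worker_hour, hours):
    """Single-node conversion: all worker cost may be removed."""
    return worker_reduction_savings(current_workers, 0,
                                    cost_per_worker_hour, hours)


print(idle_savings(12, 2.5))                      # 12 idle hours at 2.5 per hour
print(worker_reduction_savings(8, 5, 0.5, 100))   # 3 fewer workers for 100 hours
print(single_node_savings(4, 0.5, 100))           # all 4 workers removed
```

The `max(..., 0)` guard keeps a recommendation that is larger than the current worker count from producing a negative saving.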
Activity filtering
Detectors require enough recent observations to avoid false positives. For example, cluster hygiene uses node timeline utilization and runtime windows; query hygiene uses query history; warehouse hygiene checks warehouse events and query activity.
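
A minimum-observation gate of this kind might look like the sketch below. The sample counts, utilization threshold, and idle share are hypothetical values chosen for the example, and the input is assumed to be a list of utilization samples between 0 and 1.

```python
def idle_verdict(utilization_samples, min_samples=24,
                 idle_threshold=0.05, idle_share=0.9):
    """Return True/False for idle, or None when there is too little data.

    All thresholds are illustrative assumptions, not product defaults.
    """
    if len(utilization_samples) < min_samples:
        return None  # abstain rather than risk a false positive
    idle = sum(1 for u in utilization_samples if u < idle_threshold)
    return idle / len(utilization_samples) >= idle_share


print(idle_verdict([0.01] * 6))    # too few samples, so no verdict
print(idle_verdict([0.01] * 24))   # consistently near zero utilization
print(idle_verdict([0.50] * 24))   # clearly active
```

Returning `None` instead of `False` keeps "not enough data" distinct from "observed and not idle", so a caller can report the two cases separately.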
From insight to action
Some waste insights generate action plans. Review the evidence, safety tier, and proposed change before approving execution. If a finding is expected or intentionally accepted, dismiss it so future reviews focus on new problems.