Databricks Waste Detection & Optimization Insights

Waste detection finds spend that likely produced little value or can be reduced safely. Optimization insights identify configuration changes that may reduce cost, improve reliability, or reduce operational risk.

Type                      What it means
Idle cluster              Cluster ran with little or no useful activity.
Idle warehouse            SQL warehouse clusters were running without queries.
Long-running cluster      Cluster ran for an unusually large portion of the observation window.
Failing queries           Failed SQL still consumed DBUs.
Long-running queries      Query runtime is high enough to deserve review.
Runaway job               A job run lasted far beyond its baseline.
Retry storm               Repeated failures or retries consumed avoidable compute.
Overprovisioned cluster   Utilization is low for the configured workers.
Long auto-termination     Idle timeout is much longer than recommended.
Unused table              Storage cost with no recent access signal.
Fat driver                Driver sizing appears excessive for observed usage.
Zombie model              Serving endpoint appears unused but still costs money.
Weekend waste             Non-production-like activity ran during weekends.
Poor pruning / Scanzilla  Queries scan much more data than expected.
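
To make the "far beyond its baseline" idea behind the runaway job type concrete, here is a minimal sketch. It assumes the baseline is the median duration of earlier runs of the same job and that a run counts as runaway once it exceeds a fixed multiple of that median. The function name, the `factor` multiplier, and the `min_history` requirement are hypothetical illustrations, not the product's actual rule.

```python
from statistics import median


def is_runaway(run_minutes: float,
               baseline_minutes: list[float],
               factor: float = 3.0,      # hypothetical multiplier
               min_history: int = 5) -> bool:  # hypothetical minimum history
    """Flag a run whose duration far exceeds the median of prior runs."""
    if len(baseline_minutes) < min_history:
        # Too few prior runs to define a baseline; do not flag.
        return False
    return run_minutes > factor * median(baseline_minutes)


history = [10, 12, 11, 9, 10]        # durations of earlier runs, in minutes
print(is_runaway(100, history))      # well beyond 3x the median of 10
print(is_runaway(20, history))       # within the tolerated range
```

A median is used here rather than a mean so that a single earlier outlier run does not inflate the baseline.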

Cluster and workload hygiene detectors include spot instance candidates, single-node candidates, excess workers, outdated runtimes, missing job timeouts, Photon candidates, interactive cluster misuse, pool candidates, instance-type mismatch, first-on-demand guardrails, pipeline dev mode, pipeline preview channel, and multi-cluster jobs.
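
Several of these hygiene checks amount to inspecting configuration values. The sketch below illustrates two of them, long auto-termination and missing job timeouts, against plain dictionaries. It assumes the cluster settings carry an `autotermination_minutes` field and the job settings a `timeout_seconds` field, with 0 meaning "disabled" in both cases; the `max_idle_minutes` threshold and the finding labels are hypothetical, and treating disabled auto-termination as the extreme case of a long timeout is also an assumption.

```python
def hygiene_findings(cluster: dict, job: dict,
                     max_idle_minutes: int = 60) -> list[str]:
    """Return hygiene findings for one cluster/job configuration pair."""
    findings = []

    # Assumption: 0 (or a missing value) means auto-termination is off,
    # which is handled as the extreme case of a long idle timeout.
    idle = cluster.get("autotermination_minutes", 0)
    if idle == 0 or idle > max_idle_minutes:
        findings.append("long auto-termination")

    # Assumption: 0 (or a missing value) means the job has no timeout.
    if job.get("timeout_seconds", 0) == 0:
        findings.append("missing job timeout")

    return findings


print(hygiene_findings({"autotermination_minutes": 120}, {"timeout_seconds": 0}))
print(hygiene_findings({"autotermination_minutes": 30}, {"timeout_seconds": 3600}))
```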

Storage optimization detectors include poor read/write ratio, duplicate datasets, excessive retention, non-partitioned tables, and unoptimized storage.

Savings estimates are conservative approximations derived from recent observed cost and the recommended change. Examples:

  • Idle runtime reduction: savings are estimated from idle minutes or hours and the recent cost rate.
  • Worker reduction: savings are estimated from the difference between current and recommended worker counts.
  • Single-node conversion: savings are estimated from the worker cost that may be removed.
  • Storage opportunities: estimates use observed storage or predictive-optimization cost where available.

Treat estimates as prioritization guidance, not an invoice guarantee.
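
The arithmetic behind the first three examples can be sketched as follows. This is an illustration under simple assumptions (a flat hourly cost rate, and a per-worker hourly cost that scales linearly with worker count), not the product's actual estimator; all function and parameter names are hypothetical.

```python
def idle_savings(idle_hours: float, cost_per_hour: float) -> float:
    """Idle runtime reduction: idle time multiplied by the recent cost rate."""
    return idle_hours * cost_per_hour


def worker_reduction_savings(current_workers: int,
                             recommended_workers: int,
                             cost_per_worker_hour: float,
                             runtime_hours: float) -> float:
    """Worker reduction: cost of the workers that would be removed."""
    removed = max(current_workers - recommended_workers, 0)
    return removed * cost_per_worker_hour * runtime_hours


def single_node_savings(current_workers: int,
                        cost_per_worker_hour: float,
                        runtime_hours: float) -> float:
    """Single-node conversion: all worker cost may be removed."""
    return worker_reduction_savings(current_workers, 0,
                                    cost_per_worker_hour, runtime_hours)


# 10 idle hours at a rate of 2.5 per hour
print(idle_savings(10, 2.5))
# Going from 8 to 5 workers at 0.5 per worker-hour over 100 runtime hours
print(worker_reduction_savings(8, 5, 0.5, 100))
```

Clamping the removed worker count at zero keeps the estimate from going negative when a recommendation is not actually a reduction.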

Detectors require enough recent observations to avoid false positives. For example, cluster hygiene uses node timeline utilization and runtime windows; query hygiene uses query history; warehouse hygiene checks warehouse events and query activity.
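
One way to picture the "enough recent observations" requirement is a gate that a detector must pass before it evaluates anything. The sketch below counts samples inside a lookback window; the 14-day window and the minimum of 20 samples are hypothetical values chosen for illustration, and the real thresholds may differ per detector.

```python
from datetime import datetime, timedelta


def has_enough_observations(timestamps: list[datetime],
                            now: datetime,
                            window: timedelta = timedelta(days=14),  # hypothetical
                            min_samples: int = 20) -> bool:          # hypothetical
    """Return True when enough samples fall inside the lookback window."""
    recent = [t for t in timestamps if now - window <= t <= now]
    return len(recent) >= min_samples


now = datetime(2024, 1, 15)
hourly = [now - timedelta(hours=i) for i in range(30)]   # 30 recent samples
stale = [now - timedelta(days=30)] * 30                  # 30 samples, all too old

print(has_enough_observations(hourly, now))
print(has_enough_observations(stale, now))
```

A detector that skips evaluation when this gate fails reports nothing rather than a finding built on sparse data.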

Some waste insights generate action plans. Review the evidence, safety tier, and proposed change before approving execution. If a finding is expected or intentionally accepted, dismiss it so future reviews focus on new problems.