Databricks Cost Insights & Optimization Actions
Insights is LakeSentry’s queue of detected cost problems and optimization opportunities. It combines anomaly detection, workload hygiene checks, and waste detectors into a single review flow.
Active
Open findings that need review. Filters include category, severity, status, insight type, storage/resource view, resource type, owner, and workspace.
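As a rough illustration of how these filters combine, the sketch below narrows a list of findings by any subset of fields. The `Finding` class, its field names, and `filter_findings` are hypothetical and are not part of LakeSentry's API. They only mirror a few of the filters listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    # Hypothetical fields mirroring a few of the documented filters.
    title: str
    category: str
    severity: str
    status: str
    workspace: str
    owner: Optional[str] = None


def filter_findings(findings, **criteria):
    """Return findings whose fields equal every non-None criterion."""
    wanted = {k: v for k, v in criteria.items() if v is not None}
    return [
        f for f in findings
        if all(getattr(f, key) == value for key, value in wanted.items())
    ]


# Made-up sample data for illustration only.
findings = [
    Finding("Idle cluster", "waste", "high", "active", "prod"),
    Finding("Cost spike", "anomaly", "medium", "active", "prod"),
    Finding("Outdated runtime", "optimization", "low", "active", "dev"),
]

high_waste = filter_findings(findings, category="waste", severity="high")
```

Filters are combined with AND semantics here, which is an assumption about how the Active view behaves.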
History
Resolved, dismissed, or stale findings. History is useful for proving that waste was addressed and for reviewing recurring patterns.
Insight categories
| Category | Examples |
|---|---|
| Waste | Idle clusters, idle warehouses, long-running clusters, failing or long-running queries, runaway jobs, retry storms, unused tables, zombie models, weekend waste, poor pruning, very large scans |
| Anomaly | Cost spikes, duration anomalies, failure-rate spikes, warehouse spend spikes, serving spikes, budget risk, declining attribution quality |
| Optimization | Long auto-termination, warehouse auto-stop, spot candidates, single-node candidates, overprovisioned workers, outdated runtimes, missing timeouts, Photon candidates, pool candidates, pipeline dev/preview mode, storage layout issues |
Insight detail
Each insight has a detail page with:
- Summary: title, severity, category, potential savings, and confidence.
- Evidence: type-specific fields such as z-score, CPU utilization, idle minutes, wasted cost, or recommended worker count.
- Subject: the related work unit, cluster, warehouse, table, or principal when available.
- Timeline: status changes, generated action plans, approvals, dismissals, and execution outcomes.
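To make the z-score evidence field concrete, here is a minimal sketch of the standard calculation: how many standard deviations the latest value sits from the mean of recent history. The source does not describe how LakeSentry computes its z-scores, so the function name, the use of a sample standard deviation, and the sample data are all assumptions.

```python
from statistics import mean, stdev


def z_score(history, latest):
    """Standard z-score of `latest` against the values in `history`."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation (an assumption)
    if sigma == 0:
        # A flat history carries no spread, so report no deviation.
        return 0.0
    return (latest - mu) / sigma


# Made-up daily cost values for one work unit.
daily_cost = [100.0, 104.0, 98.0, 101.0, 97.0, 103.0, 99.0]
z = z_score(daily_cost, 130.0)
```

A large positive value such as this one is the kind of evidence that would support a cost-spike anomaly. The threshold at which a score counts as anomalous is not stated in the source.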
Staleness and lifecycle
Insights can be active, resolved, dismissed, or stale. Background maintenance automatically resolves stale anomalies after about 7 days and stale waste after about 3 days when the condition no longer appears. Some resource-specific insights also resolve when the subject disappears or becomes inactive.
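The auto-resolution rule can be sketched as a small status function. This is an illustration under stated assumptions, not LakeSentry's implementation: the windows are the approximate values from the text, `last_seen` is assumed to be the last time the condition was detected, and categories without a documented window are left unchanged.

```python
from datetime import datetime, timedelta, timezone

# Approximate windows taken from the text; exact values are assumptions.
STALE_AFTER = {
    "anomaly": timedelta(days=7),
    "waste": timedelta(days=3),
}


def next_status(category, status, last_seen, now):
    """Return the insight status after one maintenance pass."""
    if status != "active":
        # Resolved, dismissed, and stale insights are not touched here.
        return status
    window = STALE_AFTER.get(category)
    if window is not None and now - last_seen >= window:
        # The condition has not reappeared within the window.
        return "resolved"
    return status


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
waste_status = next_status("waste", "active", now - timedelta(days=4), now)
anomaly_status = next_status("anomaly", "active", now - timedelta(days=4), now)
```

With the same four-day gap, the waste insight resolves while the anomaly stays active, because the anomaly window is longer.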
Actions and approvals
LakeSentry generates action plans for eligible insights. Current plan types include terminating idle clusters, canceling runaway runs, pausing schedules, scaling or resizing clusters, converting to single-node, and upgrading Databricks Runtime.
Action plans are approval-first. Review the evidence, expected savings, and blast radius before approving any change.
The action executor currently performs Databricks mutations for terminating idle clusters and canceling runaway runs. Other plan types are persisted with evidence and proposed changes for operator review until direct execution support is added.
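The approval-first flow described above can be sketched as a simple routing function. The plan-type identifiers, the `ActionPlan` class, and the returned labels are hypothetical. The sketch only decides where a plan goes and makes no Databricks calls.

```python
from dataclasses import dataclass

# Plan types the text says are executed directly today.
# The identifier strings themselves are hypothetical.
EXECUTABLE = {"terminate_idle_cluster", "cancel_runaway_run"}


@dataclass
class ActionPlan:
    plan_type: str
    approved: bool = False


def route(plan):
    """Return where a plan goes next; performs no mutations itself."""
    if not plan.approved:
        # Nothing runs before an operator approves the plan.
        return "awaiting_approval"
    if plan.plan_type in EXECUTABLE:
        return "execute"
    # Approved, but direct execution is not supported for this type yet.
    return "operator_review"


pending = route(ActionPlan("terminate_idle_cluster"))
runnable = route(ActionPlan("cancel_runaway_run", approved=True))
manual = route(ActionPlan("upgrade_runtime", approved=True))
```

Checking approval before checking executability keeps the gate in one place, so adding a new executable plan type would not bypass review.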