Databricks Cost Insights & Optimization Actions

Insights is LakeSentry’s queue of detected cost problems and optimization opportunities. It combines anomaly detection, workload hygiene checks, and waste detectors into a single review flow.

The review queue lists open findings that need review. You can filter by category, severity, status, insight type, storage/resource view, resource type, owner, and workspace.

History lists resolved, dismissed, or stale findings. It is useful for proving that waste was addressed and for reviewing recurring patterns.

Category | Examples
Waste | Idle clusters, idle warehouses, long-running clusters, failing or long-running queries, runaway jobs, retry storms, unused tables, zombie models, weekend waste, poor pruning, very large scans
Anomaly | Cost spikes, duration anomalies, failure-rate spikes, warehouse spend spikes, serving spikes, budget risk, declining attribution quality
Optimization | Long auto-termination, warehouse auto-stop, spot candidates, single-node candidates, overprovisioned workers, outdated runtimes, missing timeouts, Photon candidates, pool candidates, pipeline dev/preview mode, storage layout issues

Each insight has a detail page with:

  • Summary: title, severity, category, potential savings, and confidence.
  • Evidence: type-specific fields such as z-score, CPU utilization, idle minutes, wasted cost, or recommended worker count.
  • Subject: the related work unit, cluster, warehouse, table, or principal when available.
  • Timeline: status changes, generated action plans, approvals, dismissals, and execution outcomes.
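To make the z-score evidence field concrete, here is a minimal sketch of how such a score could be computed for a cost spike. It illustrates the general statistic only; the function name, the choice of a sample standard deviation, and the example figures are assumptions, not LakeSentry's actual detector.

```python
import math
from statistics import mean, stdev


def cost_z_score(history: list[float], latest: float) -> float:
    """Return how many sample standard deviations `latest` is from the mean of `history`.

    Hypothetical helper for illustration; not part of LakeSentry.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline has no scale: any deviation is unbounded.
        return 0.0 if latest == mu else math.copysign(math.inf, latest - mu)
    return (latest - mu) / sigma


# Illustrative daily costs (made-up numbers) followed by a spike.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0]
spike_score = cost_z_score(baseline, 110.0)
```

A large positive score means the latest value sits far above the recent baseline; the threshold at which that becomes an insight is a product decision this sketch does not cover.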

Insights can be active, resolved, dismissed, or stale. Background maintenance automatically resolves stale insights once the condition no longer appears: anomalies after about 7 days and waste after about 3 days. Some resource-specific insights also resolve when the subject disappears or becomes inactive.
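The auto-resolve rule can be sketched as a small status function. This is an illustration under stated assumptions: the `Insight` fields, the `last_seen` timestamp, and the exact 7-day and 3-day windows are hypothetical stand-ins for the approximate figures given above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Approximate windows taken from the text; exact values are assumptions.
RESOLVE_AFTER = {
    "anomaly": timedelta(days=7),
    "waste": timedelta(days=3),
}


@dataclass
class Insight:
    category: str        # "anomaly", "waste", or "optimization"
    status: str          # "active", "resolved", "dismissed", or "stale"
    last_seen: datetime  # last time the condition was observed (hypothetical field)


def next_status(insight: Insight, now: datetime) -> str:
    """Return the status after one maintenance pass."""
    window = RESOLVE_AFTER.get(insight.category)
    # Only stale insights in a category with a window are auto-resolved.
    if insight.status != "stale" or window is None:
        return insight.status
    if now - insight.last_seen >= window:
        return "resolved"
    return insight.status


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
stale_waste = Insight("waste", "stale", now - timedelta(days=4))
stale_anomaly = Insight("anomaly", "stale", now - timedelta(days=4))
```

In this sketch the waste finding resolves because it has been quiet for longer than 3 days, while the anomaly with the same age remains stale until its 7-day window passes.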

LakeSentry generates action plans for eligible insights. Current plan types include terminating idle clusters, canceling runaway runs, pausing schedules, scaling or resizing clusters, converting to single-node, and upgrading Databricks Runtime.

Action plans are approval-first. Review the evidence, expected savings, and blast radius before approving any change.

The action executor currently performs Databricks mutations for terminating idle clusters and canceling runaway runs. Other plan types are persisted with evidence and proposed changes for operator review until direct execution support is added.
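The approval-first flow and the split between executed and review-only plan types can be sketched as a small dispatcher. All names here (`PlanType`, `ActionPlan`, `handle_plan`, the returned status strings) are hypothetical; the sketch only mirrors the behavior described above and is not LakeSentry's executor.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class PlanType(Enum):
    TERMINATE_IDLE_CLUSTER = "terminate_idle_cluster"
    CANCEL_RUNAWAY_RUN = "cancel_runaway_run"
    PAUSE_SCHEDULE = "pause_schedule"
    RESIZE_CLUSTER = "resize_cluster"
    CONVERT_TO_SINGLE_NODE = "convert_to_single_node"
    UPGRADE_RUNTIME = "upgrade_runtime"


# Per the text, only these two plan types currently mutate Databricks.
DIRECTLY_EXECUTABLE = {
    PlanType.TERMINATE_IDLE_CLUSTER,
    PlanType.CANCEL_RUNAWAY_RUN,
}


@dataclass
class ActionPlan:
    plan_type: PlanType
    target_id: str          # cluster ID, run ID, and so on (hypothetical field)
    approved: bool = False  # plans are approval-first


def handle_plan(plan: ActionPlan,
                executors: dict[PlanType, Callable[[str], None]]) -> str:
    """Execute an approved plan if its type supports it; otherwise leave it for review."""
    if not plan.approved:
        return "awaiting_approval"
    if plan.plan_type not in DIRECTLY_EXECUTABLE:
        return "recorded_for_operator_review"
    executors[plan.plan_type](plan.target_id)
    return "executed"


# Stand-in executors that record calls instead of touching a workspace.
calls: list[tuple[str, str]] = []
fake_executors = {
    PlanType.TERMINATE_IDLE_CLUSTER: lambda cid: calls.append(("terminate", cid)),
    PlanType.CANCEL_RUNAWAY_RUN: lambda rid: calls.append(("cancel", rid)),
}
```

In a real deployment the executors would wrap Databricks API calls; with the Databricks SDK for Python these could be wrappers around `WorkspaceClient.clusters.delete` and `WorkspaceClient.jobs.cancel_run`, though which client LakeSentry uses is an assumption here.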