Databricks Cost Anomaly Detection
LakeSentry uses statistical and rule-based detectors to surface changes that deserve investigation. Anomalies are not automatic proof of waste; they are a starting point for cost forensics.
Detected anomaly types
- Cost spike: spend is unusually high compared with baseline.
- Duration anomaly: workload runtime is unusually high.
- Failure-rate spike: failed runs or queries increased materially.
- Warehouse spend spike: SQL warehouse spend moved outside baseline.
- Serving spike: model serving request/cost patterns changed unexpectedly.
- Budget risk: projected spend is on track to exceed a budget.
- Attribution declining: the share of attributed spend dropped and more cost is now unattributed.
Detection methods
LakeSentry combines statistical and threshold detectors. For time-series detectors it uses a Z-score:

`z = (current_value - baseline_mean) / baseline_standard_deviation`

Current backend thresholds include:
| Detector | Main trigger |
|---|---|
| Cost spike | Recent spend over the last 48 hours has z-score ≥ 2.0 or is at least 2.5× baseline, with baseline average at least $10 and delta at least $50. |
| Duration anomaly | Runtime z-score ≥ 2.5 with at least 5 baseline runs. |
| Failure-rate spike | Failure rate increased by at least 15 percentage points. |
| Warehouse spend spike | 30-day warehouse velocity increase, minimum $1,000 delta. |
| Serving spike | Last 7 days compared with the previous 7 days, minimum $500 delta. |
| Budget risk | Projected spend exceeds the budget by at least 10%. |
| Attribution declining | Unattributed share increased by at least 5 percentage points week over week. |
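
As an illustration of how the cost spike row might combine its conditions, here is a minimal sketch. It is not LakeSentry's backend code. The function name `is_cost_spike`, the constant names, and the reading of the rule as "(z-score or multiplier) and both dollar floors" are assumptions, as is defining the delta as current spend minus the baseline average. The source does not say whether the multiplier path still applies when the baseline has zero standard deviation; this sketch only skips the z-score path in that case.

```python
# Thresholds copied from the cost spike row of the table.
Z_THRESHOLD = 2.0            # z-score trigger
MULTIPLIER_THRESHOLD = 2.5   # "at least 2.5x baseline"
MIN_BASELINE_MEAN = 10.0     # dollars
MIN_DELTA = 50.0             # dollars


def is_cost_spike(current: float, baseline_mean: float, baseline_std: float) -> bool:
    """Hypothetical helper: apply the cost spike trigger to one spend window.

    `current` is spend for the recent window; `baseline_mean` and
    `baseline_std` describe comparable earlier windows (an assumption).
    """
    delta = current - baseline_mean

    # Dollar floors: ignore tiny baselines and small absolute changes.
    if baseline_mean < MIN_BASELINE_MEAN or delta < MIN_DELTA:
        return False

    # The z-score is undefined for a constant baseline, so skip that path.
    z_hit = baseline_std > 0 and (delta / baseline_std) >= Z_THRESHOLD
    multiplier_hit = current >= MULTIPLIER_THRESHOLD * baseline_mean
    return z_hit or multiplier_hit
```

For example, under this reading a jump from a $100 baseline to $140 is not flagged even with a z-score of 4, because the $40 delta is below the $50 floor.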
Severity is detector-specific. Cost spikes become high/critical when z-score or multiplier crosses higher thresholds. Budget risk becomes high above 25% projected overage and critical above 50%.
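
The budget risk thresholds can be sketched as a small mapping from projected overage to severity. This is an illustrative reading, not the product's implementation: the function name `budget_risk_severity` is hypothetical, and the label `"flagged"` for overages between 10% and 25% is a placeholder, because the source does not name the severity for that band.

```python
from typing import Optional


def budget_risk_severity(projected_spend: float, budget: float) -> Optional[str]:
    """Hypothetical helper: map projected overage to a severity label."""
    if budget <= 0:
        raise ValueError("budget must be positive")

    # Overage as a fraction of the budget, e.g. 0.30 means 30% over.
    overage = (projected_spend - budget) / budget

    if overage < 0.10:
        return None          # below the 10% trigger: no insight
    if overage > 0.50:
        return "critical"    # above 50% projected overage
    if overage > 0.25:
        return "high"        # above 25% projected overage
    return "flagged"         # placeholder label; severity not stated in the source
```

With a $100 budget, a $130 projection maps to `"high"` and a $160 projection maps to `"critical"` under these assumptions.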
Baselines and safeguards
- At least a small number of data points is required before a Z-score is meaningful.
- Constant data has zero standard deviation; LakeSentry does not flag those points as anomalous.
- Cost and runtime anomalies are interpreted with supporting evidence such as resource, owner, and time range.
- Budget risk uses projection against the budget period, not only Z-score movement.
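
The first two safeguards can be sketched as guards around the Z-score formula. This is a minimal example under stated assumptions: the name `safe_z_score` is hypothetical, the default minimum of 5 points is borrowed from the duration detector's "at least 5 baseline runs" and may differ for other detectors, and the use of the population standard deviation (`statistics.pstdev`) is a choice made for the sketch, not a documented detail.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence

MIN_POINTS = 5  # assumption: borrowed from the duration detector's baseline requirement


def safe_z_score(
    current_value: float,
    baseline: Sequence[float],
    min_points: int = MIN_POINTS,
) -> Optional[float]:
    """Hypothetical helper: return a Z-score, or None when it is not meaningful."""
    # Safeguard 1: too few baseline points to estimate mean and spread.
    if len(baseline) < min_points:
        return None

    baseline_mean = mean(baseline)
    baseline_std = pstdev(baseline)

    # Safeguard 2: constant data has zero standard deviation, so do not flag.
    if baseline_std == 0:
        return None

    return (current_value - baseline_mean) / baseline_std
```

A caller would treat `None` as "no anomaly signal" and compare any returned value against the detector's threshold, such as 2.5 for duration anomalies.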
Investigating an anomaly
- Open the insight detail page and review evidence.
- Use Cost Changes to see the period-over-period driver.
- Drill into Cost Explorer by workspace, team, principal, compute type, or table.
- Open the related workload, cluster, SQL warehouse, or serving page when linked.
- Resolve or dismiss the insight after confirming the cause.