Databricks Cost Anomaly Detection

LakeSentry uses statistical and rule-based detectors to surface changes that deserve investigation. Anomalies are not automatic proof of waste; they are a starting point for cost forensics.

  • Cost spike: spend is unusually high compared with baseline.
  • Duration anomaly: workload runtime is unusually high.
  • Failure-rate spike: failed runs or queries increased materially.
  • Warehouse spend spike: SQL warehouse spend moved outside baseline.
  • Serving spike: model serving request/cost patterns changed unexpectedly.
  • Budget risk: projected spend is on track to exceed a budget.
  • Attribution declining: the share of attributed spend dropped and more cost is now unattributed.

For time-series detectors, LakeSentry computes a Z-score that measures how far the current value sits from its baseline:

z = (current_value - baseline_mean) / baseline_standard_deviation
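As a worked illustration of the formula, here is a minimal Python sketch. The function name `z_score` and the sample spend figures are hypothetical, and the use of the population standard deviation is an assumption; the text does not say which variant LakeSentry applies.

```python
from statistics import mean, pstdev


def z_score(current_value: float, baseline: list[float]) -> float:
    """Return how many standard deviations current_value is from the baseline mean."""
    baseline_mean = mean(baseline)
    # Assumption: population standard deviation. A sample standard
    # deviation (statistics.stdev) would give a slightly smaller z.
    baseline_standard_deviation = pstdev(baseline)
    return (current_value - baseline_mean) / baseline_standard_deviation


# Hypothetical daily spend in dollars for five baseline days.
baseline_daily_spend = [100.0, 110.0, 90.0, 105.0, 95.0]
print(round(z_score(130.0, baseline_daily_spend), 2))
```

This sketch does not guard against an empty or constant baseline; a constant baseline would cause a division by zero here.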

Current backend thresholds include:

Detector | Main trigger
Cost spike | Recent spend over the last 48 hours has z-score ≥ 2.0 or is at least 2.5× baseline, with baseline average at least $10 and delta at least $50.
Duration anomaly | Runtime z-score ≥ 2.5 with at least 5 baseline runs.
Failure-rate spike | Failure rate increased by at least 15 percentage points.
Warehouse spend spike | 30-day warehouse velocity increase, minimum $1,000 delta.
Serving spike | Last 7 days compared with the previous 7 days, minimum $500 delta.
Budget risk | Projected spend exceeds the budget by at least 10%.
Attribution declining | Unattributed share increased by at least 5 percentage points week over week.
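The cost spike and failure-rate triggers can be sketched as simple predicates. This is an illustration under stated assumptions, not LakeSentry's implementation: the function and constant names are hypothetical, and the way the conditions combine (a z-score or multiplier trigger, gated by both dollar floors) is one reading of the cost spike trigger.

```python
# Thresholds taken from the trigger descriptions; the names are hypothetical.
Z_THRESHOLD = 2.0
MULTIPLIER_THRESHOLD = 2.5
MIN_BASELINE_AVG_USD = 10.0
MIN_DELTA_USD = 50.0
MIN_FAILURE_RATE_INCREASE_POINTS = 15.0


def is_cost_spike(recent_spend: float, baseline_avg: float, z: float) -> bool:
    """Assumed reading: (z >= 2.0 OR spend >= 2.5x baseline) AND both dollar floors hold."""
    delta = recent_spend - baseline_avg
    # The dollar floors keep tiny workloads from raising noisy alerts.
    if baseline_avg < MIN_BASELINE_AVG_USD or delta < MIN_DELTA_USD:
        return False
    return z >= Z_THRESHOLD or recent_spend >= MULTIPLIER_THRESHOLD * baseline_avg


def is_failure_rate_spike(baseline_rate_pct: float, current_rate_pct: float) -> bool:
    """Rates are percentages (0-100); the increase is measured in percentage points."""
    return current_rate_pct - baseline_rate_pct >= MIN_FAILURE_RATE_INCREASE_POINTS


print(is_cost_spike(recent_spend=300.0, baseline_avg=100.0, z=1.0))
print(is_failure_rate_spike(baseline_rate_pct=5.0, current_rate_pct=12.0))
```

Note the percentage-point comparison: a failure rate moving from 5% to 12% more than doubles in relative terms, but it is only a 7-point increase and would not meet a 15-point threshold.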

Severity is detector-specific. Cost spikes become high/critical when z-score or multiplier crosses higher thresholds. Budget risk becomes high above 25% projected overage and critical above 50%.
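The budget risk thresholds can be illustrated with a small mapping function. This is a sketch under assumptions: the function name is hypothetical, and the label `"below_high"` is a placeholder because the text does not name the severity levels below high.

```python
from typing import Optional


def budget_risk_severity(projected_spend: float, budget: float) -> Optional[str]:
    """Map projected overage to a severity label, or None when no insight fires."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Overage as a percentage of the budget.
    overage_pct = (projected_spend - budget) * 100.0 / budget
    if overage_pct < 10.0:
        return None  # below the 10% trigger
    if overage_pct > 50.0:
        return "critical"
    if overage_pct > 25.0:
        return "high"
    # Placeholder label: the lower severity names are not specified.
    return "below_high"


for projected in (1050.0, 1200.0, 1300.0, 1600.0):
    print(projected, budget_risk_severity(projected, budget=1000.0))
```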

  • A Z-score is only meaningful with enough baseline data points; the duration detector, for example, requires at least 5 baseline runs.
  • Constant data has zero standard deviation; LakeSentry does not flag those points as anomalous.
  • Cost and runtime anomalies are interpreted with supporting evidence such as resource, owner, and time range.
  • Budget risk uses projection against the budget period, not only Z-score movement.
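The first two edge cases can be handled by guarding the Z-score calculation. The sketch below is illustrative only: `safe_z_score` is a hypothetical helper, the default of 5 minimum points is borrowed from the duration detector and is an assumption for any other detector, and the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def safe_z_score(
    current_value: float,
    baseline: Sequence[float],
    min_points: int = 5,  # assumption: borrowed from the duration detector
) -> Optional[float]:
    """Return a Z-score, or None when the baseline cannot support one."""
    if len(baseline) < min_points:
        return None  # too few observations for a meaningful score
    std = pstdev(baseline)
    if std == 0:
        return None  # constant baseline: do not flag as anomalous
    return (current_value - mean(baseline)) / std


print(safe_z_score(10.0, [1.0, 2.0]))      # too few points
print(safe_z_score(50.0, [20.0] * 6))      # constant baseline
```

Returning None rather than zero or infinity lets the caller distinguish "not enough information" from "not anomalous".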

To investigate an insight:

  1. Open the insight detail page and review evidence.
  2. Use Cost Changes to see the period-over-period driver.
  3. Drill into Cost Explorer by workspace, team, principal, compute type, or table.
  4. Open the related workload, cluster, SQL warehouse, or serving page when linked.
  5. Resolve or dismiss the insight after confirming the cause.