What is LakeSentry for Databricks Cost Optimization

LakeSentry is a Databricks cost investigation and workload optimization platform. It helps platform teams answer “why did our bill spike?” and reduce waste without risking production stability.

LakeSentry connects to your Databricks account, ingests data from system tables, and transforms it into an understandable cost model. From there, it surfaces insights about waste and anomalies, and can execute supported optimization actions after approval.

Capability | What it means
Cost visibility | See where money goes by workspace, team, job, warehouse, or SKU
Attribution | Connect costs to owners using rules, tags, and identity mapping
Investigation | Drill down from an anomaly to root cause in a few clicks
Actions | Review optimization plans and approve supported remediations such as terminating idle clusters or canceling runaway runs

Finance asks about a $50K increase. Your team manually queries system tables, cross-references billing exports, and pieces together the story.

With LakeSentry, you get time-range comparison, cost breakdown by dimension, and anomaly detection — answering the question in minutes instead of hours.
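To make the idea of a time-range comparison concrete, here is a minimal sketch of comparing cost per dimension across two periods. It is not LakeSentry's implementation: the record layout, the `cost_by_dimension` and `compare_periods` helpers, and the sample figures are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical usage records: (usage_date, workspace, cost_usd).
# This is illustrative data, not LakeSentry's schema.
records = [
    (date(2024, 5, 3), "analytics", 1200.0),
    (date(2024, 5, 9), "ml-platform", 800.0),
    (date(2024, 6, 4), "analytics", 1250.0),
    (date(2024, 6, 11), "ml-platform", 2300.0),
]

def cost_by_dimension(rows, start, end):
    """Sum cost per dimension value for dates in [start, end)."""
    totals = defaultdict(float)
    for day, dim, cost in rows:
        if start <= day < end:
            totals[dim] += cost
    return dict(totals)

def compare_periods(rows, baseline, current):
    """Return (dimension, delta) pairs, largest increase first."""
    before = cost_by_dimension(rows, *baseline)
    after = cost_by_dimension(rows, *current)
    dims = set(before) | set(after)
    deltas = {d: after.get(d, 0.0) - before.get(d, 0.0) for d in dims}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

may = (date(2024, 5, 1), date(2024, 6, 1))
june = (date(2024, 6, 1), date(2024, 7, 1))
deltas = compare_periods(records, may, june)
```

Sorting by delta puts the dimension that grew the most at the top, which is usually the first place to look when explaining an increase.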

Shared clusters, cross-team jobs, no clear ownership. Chargeback reports are guesswork.

LakeSentry provides attribution rules with confidence tiers (exact, strong, estimated, unattributed). It’s transparent about what it can and can’t attribute, so your chargeback numbers hold up under scrutiny.
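As an illustration of how confidence tiers can work, the sketch below tries the most reliable evidence first and falls back to “unattributed” when nothing matches. Only the four tier names come from this page. The rule order, the `UsageRecord` fields, and the mapping tables are assumptions made for the example, not LakeSentry's actual rules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UsageRecord:
    # All fields are hypothetical, for illustration only.
    cost_usd: float
    team_tag: Optional[str] = None      # explicit owner tag on the resource
    run_as_user: Optional[str] = None   # identity that ran the workload
    cluster_name: Optional[str] = None

# Hypothetical identity mapping and naming-convention rules.
USER_TO_TEAM = {"alice@example.com": "data-eng"}
NAME_PREFIX_TO_TEAM = {"ml-": "ml-platform"}

def attribute(record):
    """Return (owner, tier), trying the most reliable evidence first."""
    if record.team_tag:
        return record.team_tag, "exact"
    if record.run_as_user in USER_TO_TEAM:
        return USER_TO_TEAM[record.run_as_user], "strong"
    for prefix, team in NAME_PREFIX_TO_TEAM.items():
        if record.cluster_name and record.cluster_name.startswith(prefix):
            return team, "estimated"
    # No rule matched: report the gap rather than guess an owner.
    return None, "unattributed"
```

Returning the tier alongside the owner lets a chargeback report show how much of each team's total rests on explicit tags and how much on heuristics.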

“We’re wasting money on idle resources”

Clusters running 24/7 for jobs that run once a day. Warehouses oversized for actual query load.

LakeSentry detects waste and suggests actions with estimated savings. You review what would be saved before approving execution.
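A rough way to reason about estimated savings for an always-on cluster is idle hours multiplied by hourly cost. The sketch below assumes that simple model. The function name, the keep-alive allowance, and the $4/hour rate are hypothetical, and LakeSentry's own estimates may be computed differently.

```python
def estimated_idle_savings(uptime_hours, busy_hours, hourly_cost_usd, keepalive_hours=0.0):
    """Estimate savings if idle time beyond a keep-alive allowance were eliminated."""
    idle_hours = max(uptime_hours - busy_hours - keepalive_hours, 0.0)
    return idle_hours * hourly_cost_usd

# A cluster up 24 hours a day for 30 days, busy 1 hour a day,
# at a hypothetical $4 per hour.
monthly = estimated_idle_savings(
    uptime_hours=24 * 30,
    busy_hours=1 * 30,
    hourly_cost_usd=4.0,
)
```

With these inputs the cluster is idle for 690 of its 720 hours, so the estimate is 690 × $4 = $2,760 for the month.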

“I don’t trust automation with production infrastructure”

Previous automation tools caused outages or unexpected behavior.

LakeSentry runs read-only by default. All optimization actions currently require explicit approval before execution.
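The approval-first pattern can be sketched as a small state machine in which execution is refused unless a reviewer has approved the action. This is a generic illustration of the pattern, not LakeSentry's code. The `Action`, `Status`, `approve`, and `execute` names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    EXECUTED = "executed"

@dataclass
class Action:
    description: str
    status: Status = Status.PROPOSED
    approved_by: str = ""

def approve(action, reviewer):
    """Record who approved the action and mark it ready to run."""
    action.status = Status.APPROVED
    action.approved_by = reviewer

def execute(action, run):
    """Run the action only if a reviewer approved it first."""
    if action.status is not Status.APPROVED:
        raise PermissionError(f"not approved: {action.description}")
    run(action)
    action.status = Status.EXECUTED
```

Because `execute` checks the status itself, a caller cannot skip the review step by accident. An unapproved action raises instead of running.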

You manage Databricks infrastructure and own the bill. You need forensic investigation tools that help you trace cost back to specific jobs, clusters, and users — not executive summary charts.

You handle chargeback and showback reporting. You need attribution you can trust and rules you can configure, not opaque algorithms you can’t explain to stakeholders.

You run training jobs, experiments, and ML pipelines. You need visibility into compute spend per experiment and serving endpoint so you can optimize within your budget.

LakeSentry follows a three-step flow:

  1. Connect — Add a read-only service principal and connect your Databricks account. Takes minutes, not days.
  2. Collect — LakeSentry ingests system tables on a schedule to build a normalized cost ledger.
  3. Act safely — Review insights and approve supported changes before execution.

For the detailed setup process, see the Quick Start Guide.

LakeSentry is built around a few key design decisions:

  • Conservative attribution — LakeSentry shows “unattributed” rather than guessing wrong. Confidence tiers (exact, strong, estimated, unattributed) tell you how much to trust each number.
  • Approval-first operations — All actions require manual approval before execution. Risk tiers make the intended handling explicit.
  • Financial forensics, not real-time ops — Designed for “why did this happen?” rather than “what’s happening right now?” Time-range selectors, drill-down paths, and historical trends.
  • Low noise, high signal — Significance scoring instead of alert storms. Every insight is worth reading.

LakeSentry reads from Databricks system tables: billing, compute, jobs, queries, serving, and access metadata. It never accesses your business data, notebooks, or query results. The one exception in the current version is query text, which it reads to improve insight quality.

Data source | What it provides
system.billing.* | Billable usage and list prices
system.compute.* | Cluster and warehouse configuration and utilization
system.lakeflow.* | Job and pipeline definitions and run history
system.query.history | SQL statements on warehouses and serverless
system.serving.* | Model serving endpoints and usage
system.access.* | Workspace metadata, lineage, and network events
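For a sense of what the billing tables contain, the sketch below holds a query that estimates list cost per workspace and SKU by joining usage to list prices. It is not LakeSentry's ingestion query. The column names follow the Databricks billing system table schema as commonly documented, and the `:start_date` and `:end_date` named parameter markers assume a client that supports them, so verify both against your own workspace. `query_params` is a hypothetical helper.

```python
from datetime import date

# Assumed schema: system.billing.usage and system.billing.list_prices.
# Check column names in your workspace before relying on this.
COST_BY_WORKSPACE_SQL = """
SELECT
  u.workspace_id,
  u.sku_name,
  SUM(u.usage_quantity * lp.pricing.default) AS list_cost
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS lp
  ON u.cloud = lp.cloud
 AND u.sku_name = lp.sku_name
 AND u.usage_start_time >= lp.price_start_time
 AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
WHERE u.usage_date >= :start_date
  AND u.usage_date < :end_date
GROUP BY u.workspace_id, u.sku_name
ORDER BY list_cost DESC
"""

def query_params(start, end):
    """Build bind values for the :start_date and :end_date markers."""
    if start >= end:
        raise ValueError("start must be before end")
    return {"start_date": start.isoformat(), "end_date": end.isoformat()}

params = query_params(date(2024, 6, 1), date(2024, 7, 1))
```

The price join is bounded by the validity window of each list price, so usage is multiplied by the price that was in effect when it occurred. The result is list cost, which can differ from the invoiced amount if your account has negotiated discounts.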

LakeSentry has three plans — Free, Standard, and Pro. Flat pricing, no per-DBU tax, no per-workspace fees. The Free plan includes unlimited workspaces, one user, and three months of history. See LakeSentry pricing for current rates or sign up for free to start monitoring your Databricks spend.