What is LakeSentry for Databricks Cost Optimization
LakeSentry is a Databricks cost investigation and workload optimization platform. It helps platform teams answer “why did our bill spike?” and safely reduce waste — without risking production stability.
What LakeSentry does
LakeSentry connects to your Databricks account, ingests data from system tables, and transforms it into an understandable cost model. From there, it surfaces insights about waste and anomalies, and can execute supported optimization actions after approval.
| Capability | What it means |
|---|---|
| Cost visibility | See where money goes — by workspace, team, job, warehouse, or SKU |
| Attribution | Connect costs to owners using rules, tags, and identity mapping |
| Investigation | Drill down from an anomaly to root cause in a few clicks |
| Actions | Review optimization plans and approve supported remediations such as terminating idle clusters or canceling runaway runs |
Problems LakeSentry solves
“Why did our bill spike?”
Finance asks about a $50K increase. Your team manually queries system tables, cross-references billing exports, and pieces together the story.
With LakeSentry, you get time-range comparison, cost breakdown by dimension, and anomaly detection — answering the question in minutes instead of hours.
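To make the idea of a time-range comparison concrete, here is a minimal sketch of a period-over-period cost breakdown by one dimension. It is an illustration only, not LakeSentry's implementation: the row layout, the `ROWS` sample data, and the function names are all assumptions made up for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical usage rows: (usage_date, workspace, cost_usd). Invented sample data.
ROWS = [
    (date(2024, 5, 3), "analytics", 1200.0),
    (date(2024, 5, 9), "ml-platform", 800.0),
    (date(2024, 6, 4), "analytics", 1250.0),
    (date(2024, 6, 11), "ml-platform", 2400.0),
    (date(2024, 6, 20), "sandbox", 300.0),
]


def cost_by_dimension(rows, start, end):
    """Sum cost per dimension value for dates in the half-open range [start, end)."""
    totals = defaultdict(float)
    for usage_date, dimension, cost in rows:
        if start <= usage_date < end:
            totals[dimension] += cost
    return dict(totals)


def compare_periods(rows, baseline, current):
    """Return (dimension, baseline_cost, current_cost, delta), largest increase first."""
    before = cost_by_dimension(rows, *baseline)
    after = cost_by_dimension(rows, *current)
    deltas = [
        (dim, before.get(dim, 0.0), after.get(dim, 0.0),
         after.get(dim, 0.0) - before.get(dim, 0.0))
        for dim in set(before) | set(after)
    ]
    return sorted(deltas, key=lambda row: row[3], reverse=True)


may = (date(2024, 5, 1), date(2024, 6, 1))
june = (date(2024, 6, 1), date(2024, 7, 1))
report = compare_periods(ROWS, may, june)
for dim, base, cur, delta in report:
    print(f"{dim:12s} {base:>8.2f} -> {cur:>8.2f} ({delta:+.2f})")
```

Sorting by the size of the increase puts the dimension values that explain most of a spike at the top, which is usually the first thing an investigation needs.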
“Who should pay for this?”
Shared clusters, cross-team jobs, no clear ownership. Chargeback reports are guesswork.
LakeSentry provides attribution rules with confidence tiers (exact, strong, estimated, unattributed). It’s transparent about what it can and can’t attribute, so your chargeback numbers hold up under scrutiny.
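The sketch below shows one way rule-based attribution with confidence tiers could work. Only the four tier names come from the text above. The criteria for each tier, the `UsageRecord` fields, and the lookup tables are hypothetical and are not taken from LakeSentry.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UsageRecord:
    """Hypothetical shape of one billed usage record."""
    cost_usd: float
    tags: dict = field(default_factory=dict)
    job_name: Optional[str] = None
    run_as: Optional[str] = None


# Hypothetical rules: job-name prefixes and a user-to-team mapping.
JOB_PREFIX_RULES = {"etl_": "data-eng", "train_": "ml"}
USER_TEAMS = {"alice@example.com": "analytics"}


def attribute(record):
    """Return (owner, tier), trying the most reliable evidence first.

    Assumed tier criteria, for illustration only:
      exact        - an explicit "team" tag on the resource
      strong       - a configured job-name rule matches
      estimated    - owner inferred from the run-as identity
      unattributed - nothing matched, so no owner is guessed
    """
    if "team" in record.tags:
        return record.tags["team"], "exact"
    if record.job_name:
        for prefix, team in JOB_PREFIX_RULES.items():
            if record.job_name.startswith(prefix):
                return team, "strong"
    if record.run_as in USER_TEAMS:
        return USER_TEAMS[record.run_as], "estimated"
    return None, "unattributed"


print(attribute(UsageRecord(40.0, tags={"team": "finance"})))
print(attribute(UsageRecord(25.0, job_name="etl_orders_daily")))
print(attribute(UsageRecord(10.0, run_as="alice@example.com")))
print(attribute(UsageRecord(5.0)))
```

Returning the tier alongside the owner lets a chargeback report show how much of each team's total rests on direct evidence and how much on inference.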
“We’re wasting money on idle resources”
Clusters running 24/7 for jobs that run once a day. Warehouses oversized for actual query load.
LakeSentry detects waste and suggests actions with estimated savings. You review what would be saved before approving execution.
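As a rough illustration of how an estimated-savings figure for an idle cluster might be derived, consider the sketch below. The formula, the `idle_threshold` parameter, and the example numbers are assumptions for this example. The source does not describe how LakeSentry computes its estimates.

```python
def estimated_idle_savings(hours_running, hours_busy, hourly_cost_usd, idle_threshold=0.5):
    """Estimate what idle time cost over a period.

    Reports savings only when the idle share of uptime reaches idle_threshold,
    so mostly-busy clusters are not flagged. All parameters are hypothetical.
    """
    idle_hours = max(hours_running - hours_busy, 0.0)
    idle_fraction = idle_hours / hours_running if hours_running else 0.0
    if idle_fraction < idle_threshold:
        return 0.0
    return idle_hours * hourly_cost_usd


# A cluster that is up 24 hours a day for a job that runs about 1 hour, at $4/hour.
print(estimated_idle_savings(hours_running=24, hours_busy=1, hourly_cost_usd=4.0))
```

A threshold like this keeps the estimate conservative: a cluster that is busy most of the time produces no suggested action, even if it has some idle minutes.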
“I don’t trust automation with production infrastructure”
Previous automation tools caused outages or unexpected behavior.
LakeSentry runs read-only by default. All optimization actions currently require explicit approval before execution.
Who LakeSentry is for
Platform and DataOps engineers
You manage Databricks infrastructure and own the bill. You need forensic investigation tools that help you trace cost back to specific jobs, clusters, and users, not executive summary charts.
FinOps teams
You handle chargeback and showback reporting. You need attribution you can trust and rules you can configure, not opaque algorithms you can’t explain to stakeholders.
Data and ML teams
You run training jobs, experiments, and ML pipelines. You need visibility into compute spend per experiment and serving endpoint so you can optimize within your budget.
How it works
LakeSentry follows a three-step flow:
- Connect — Add a read-only service principal and connect your Databricks account. Takes minutes, not days.
- Collect — LakeSentry ingests system tables on a schedule to build a normalized cost ledger.
- Act safely — Review insights and approve supported changes before execution.
For the detailed setup process, see the Quick Start Guide.
Core principles
LakeSentry is built around a few key design decisions:
- Conservative attribution — LakeSentry shows “unattributed” rather than guessing wrong. Confidence tiers (exact, strong, estimated, unattributed) tell you how much to trust each number.
- Approval-first operations — All actions require manual approval before execution. Risk tiers make the intended handling explicit.
- Financial forensics, not real-time ops — Designed for “why did this happen?” rather than “what’s happening right now?” Time-range selectors, drill-down paths, and historical trends.
- Low noise, high signal — Significance scoring instead of alert storms. Every insight is worth reading.
What LakeSentry connects to
LakeSentry reads from Databricks system tables: billing, compute, jobs, queries, serving, and access metadata. It never accesses your business data, notebooks, or query results. The one exception in the current version is query text, which it reads to improve insight quality.
| Data source | What it provides |
|---|---|
| system.billing.* | Billable usage and list prices |
| system.compute.* | Cluster and warehouse configuration and utilization |
| system.lakeflow.* | Job and pipeline definitions and run history |
| system.query.history | SQL statements on warehouses and serverless |
| system.serving.* | Model serving endpoints and usage |
| system.access.* | Workspace metadata, lineage, and network events |
Try LakeSentry
LakeSentry has three plans: Free, Standard, and Pro. Flat pricing, no per-DBU tax, no per-workspace fees. The Free plan includes unlimited workspaces, one user, and three months of history. See LakeSentry pricing for current rates or sign up for free to start monitoring your Databricks spend.
Next steps
- Quick Start Guide: Get from signup to your first cost investigation
- How LakeSentry Works: Understand the data pipeline architecture
- Cost Attribution: Learn how costs are assigned to teams and owners