Databricks Cost Attribution & Confidence Tiers
LakeSentry’s attribution model answers two questions:
- Who is accountable? Org unit → department → team.
- Why did it happen? Optional project or shared bucket context.
The system favors explainability over forced allocation. Spend that cannot be attributed by a defensible path stays at the workspace level and is reported as unattributed.
Attribution inputs
LakeSentry combines:
- Billing usage and list prices from Databricks system tables.
- Workload metadata for jobs, pipelines, clusters, SQL warehouses, model serving, MLflow, and storage operations.
- User and service-principal identities.
- Databricks tags.
- User/team mappings and resource ownership.
- Admin-defined attribution rules.
Evaluation flow
For each billing record, attribution runs in this order:
1. Session-based attribution for eligible shared compute. Serverless SQL warehouse and all-purpose cluster records use precomputed session allocations when query-history or audit activity is available. Warehouse sessions are split by query duration; all-purpose sessions are split by command count. A gap of more than 2 hours starts a new session.
2. Proportional overhead rules. Platform overhead categories such as networking, database, predictive optimization, and other overhead can be distributed based on team compute spend. If no proportional rule matches, the record stays visible as overhead instead of being hidden.
3. Exact and pattern rules. Non-session, non-overhead records evaluate active rules in priority order. Exact rules match a specific resource type and ID. Pattern rules match names/IDs, tags, principal domains, and an optional resource type, combined with AND logic.
4. Fallback waterfall. Records fall through in order: shared-resource owner, mapped user, mapped resource owner, known user without team, then workspace/unattributed.
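The session step can be illustrated with a minimal sketch. It is not LakeSentry's implementation: `Query`, `split_sessions`, and `allocate_by_duration` are hypothetical names, the input shape is assumed, and the gap is measured between consecutive query start times, which is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the text: a gap greater than 2 hours starts a new session.
SESSION_GAP = timedelta(hours=2)


@dataclass
class Query:
    team: str           # team the query's user maps to (assumed field)
    start: datetime     # query start time
    duration_s: float   # query duration in seconds


def split_sessions(queries):
    """Group queries into sessions; a gap > SESSION_GAP opens a new session."""
    sessions = []
    for q in sorted(queries, key=lambda q: q.start):
        if sessions and q.start - sessions[-1][-1].start <= SESSION_GAP:
            sessions[-1].append(q)
        else:
            sessions.append([q])
    return sessions


def allocate_by_duration(session, cost):
    """Split a session's cost across teams in proportion to query duration."""
    total = sum(q.duration_s for q in session)
    if total == 0:
        return {}  # no usable signal; leave the cost for later steps
    shares = {}
    for q in session:
        shares[q.team] = shares.get(q.team, 0.0) + cost * q.duration_s / total
    return shares
```

For all-purpose clusters, the same shape would apply with command counts as the weight instead of `duration_s`.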
Rule priority is ascending: lower priority numbers run first. Ties are ordered by the backend query, with workspace-specific rules favored over global pattern/proportional rules.
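As a rough sketch of that ordering, the sort below puts lower priority numbers first and, on a tie, places workspace-specific rules ahead of global ones. The `Rule` class and its `workspace_id` field are hypothetical; how the backend query orders any remaining ties is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    priority: int
    # Assumed representation: None marks a global rule.
    workspace_id: Optional[str] = None


def evaluation_order(rules):
    """Sort ascending by priority; on ties, workspace-specific rules go first."""
    # False sorts before True, so workspace-specific rules win a tie.
    return sorted(rules, key=lambda r: (r.priority, r.workspace_id is None))
```

Because Python's sort is stable, rules that tie on both keys keep their input order in this sketch.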
Attribution paths
| Path | Meaning |
|---|---|
| rule | An exact or pattern attribution rule matched. |
| session | Shared usage was split using session/query/audit signals. |
| overhead | A proportional overhead rule distributed spend. |
| shared | The resource is shared infrastructure. |
| user | A mapped user principal owns the spend. |
| resource | The resource owner maps to a team. |
| workspace | No reliable team was found; spend stays unattributed/workspace-level. |
Confidence tiers
Confidence communicates how much trust to place in an allocation:
- Exact: explicit exact rule or unambiguous resource ownership.
- Strong: stable pattern rule, tag rule, or mapped identity with clear ownership.
- Estimated: proportional overhead or session-based split from activity signals.
- Unattributed: LakeSentry could not assign the line to a team.
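One way to read the tiers is as a function of the attribution path. The mapping below is an assumption inferred from the tier descriptions, not LakeSentry's documented logic; `Confidence` and `confidence_for` are hypothetical names. In particular, it treats every mapped owner as Strong and does not model the case where unambiguous resource ownership qualifies as Exact.

```python
from enum import Enum
from typing import Optional


class Confidence(Enum):
    EXACT = "exact"
    STRONG = "strong"
    ESTIMATED = "estimated"
    UNATTRIBUTED = "unattributed"


def confidence_for(path: str, rule_type: Optional[str] = None) -> Confidence:
    """Map an attribution path to a tier (assumed mapping)."""
    if path == "rule":
        # Exact rules are explicit; pattern and tag rules are treated as Strong.
        return Confidence.EXACT if rule_type == "exact" else Confidence.STRONG
    if path in ("session", "overhead"):
        # Activity-based splits and proportional overhead are estimates.
        return Confidence.ESTIMATED
    if path in ("user", "resource", "shared"):
        # Assumption: mapped identities and owners count as Strong.
        return Confidence.STRONG
    return Confidence.UNATTRIBUTED
```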
Rule types
Exact rules
Use exact rules for known high-cost resources, for example a production warehouse or a long-lived shared cluster.
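Conceptually, an exact rule is a lookup on resource type plus ID. The sketch below assumes a simple in-memory table; the resource types, IDs, and team names are made up for illustration.

```python
from typing import Optional

# Hypothetical exact rules keyed by (resource_type, resource_id).
EXACT_RULES = {
    ("sql_warehouse", "wh-prod-001"): "analytics",
    ("cluster", "cl-shared-042"): "platform",
}


def match_exact(resource_type: str, resource_id: str) -> Optional[str]:
    """Return the owning team only when both type and ID match."""
    return EXACT_RULES.get((resource_type, resource_id))
```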
Pattern rules
Use pattern rules when names, tags, or principal domains reliably identify ownership. All configured criteria must match.
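The AND semantics can be sketched as below. `PatternRule`, its fields, and the record dictionary shape are hypothetical, and glob matching via `fnmatch` is an assumption, since the pattern syntax LakeSentry actually uses is not described here.

```python
import fnmatch
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatternRule:
    team: str
    name_glob: Optional[str] = None          # e.g. "ml-*" (assumed glob syntax)
    tags: dict = field(default_factory=dict)
    principal_domain: Optional[str] = None   # e.g. "example.com"
    resource_type: Optional[str] = None


def matches(rule: PatternRule, record: dict) -> bool:
    """Return True only if every configured criterion matches (AND logic)."""
    checks = []
    if rule.name_glob is not None:
        checks.append(fnmatch.fnmatchcase(record.get("name", ""), rule.name_glob))
    if rule.tags:
        rec_tags = record.get("tags", {})
        checks.append(all(rec_tags.get(k) == v for k, v in rule.tags.items()))
    if rule.principal_domain is not None:
        checks.append(record.get("principal", "").endswith("@" + rule.principal_domain))
    if rule.resource_type is not None:
        checks.append(record.get("resource_type") == rule.resource_type)
    # A rule with no criteria matches nothing in this sketch.
    return bool(checks) and all(checks)
```

Unset criteria are skipped rather than treated as failures, which is what makes the resource type optional.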
Proportional rules
Use proportional rules for platform overhead that cannot fairly belong to one team. The current distribution basis is compute spend.
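The arithmetic behind a compute-spend basis is simple: each team receives overhead in proportion to its share of total compute spend. The function below is a minimal sketch with hypothetical names, not LakeSentry's code; the `excluded` parameter models the optional team exclusion.

```python
def distribute_overhead(overhead, compute_spend, excluded=()):
    """Allocate overhead across teams in proportion to their compute spend."""
    eligible = {
        team: spend
        for team, spend in compute_spend.items()
        if team not in excluded and spend > 0
    }
    total = sum(eligible.values())
    if total == 0:
        # Nothing to distribute against; the record would stay visible as overhead.
        return {}
    return {team: overhead * spend / total for team, spend in eligible.items()}
```

For example, with 100 of overhead and compute spend of 300 for team `a` and 100 for team `b`, team `a` receives 75 and team `b` receives 25.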
Attribution modes
- Direct: 100% to one team and optional project. Direct rules may also mark the resource as shared and set a shared bucket.
- Split: percentage allocation across multiple teams; percentages must total 100%.
- Proportional: allocate overhead across teams from their compute-spend share, optionally excluding selected teams.
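The 100% constraint on split mode can be sketched as a validation step before allocation. `apply_split` and the tolerance value are assumptions made for illustration.

```python
def apply_split(cost, percentages):
    """Allocate cost across teams by fixed percentages that must total 100."""
    total = sum(percentages.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"split percentages total {total}, expected 100")
    return {team: cost * pct / 100.0 for team, pct in percentages.items()}
```

Rejecting an invalid split up front avoids silently over-allocating or under-allocating a record.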
Improving attribution quality
- Map high-spend users and service principals first.
- Require Databricks tags for production resources.
- Add exact rules for persistent shared resources.
- Add pattern rules only for conventions you trust.
- Review unattributed spend after every new connector backfill.