Databricks Cost Attribution & Confidence Tiers

LakeSentry’s attribution model answers two questions:

Who is accountable? Org unit → department → team.
Why did it happen? Optional project or shared bucket context.

The system favors explainability over forced allocation: spend that cannot be attributed through a defensible path stays at the workspace level, marked as unattributed.

Attribution inputs

LakeSentry combines:

Billing usage and list prices from Databricks system tables.
Workload metadata for jobs, pipelines, clusters, SQL warehouses, model serving, MLflow, and storage operations.
User and service-principal identities.
Databricks tags.
User/team mappings and resource ownership.
Admin-defined attribution rules.

Evaluation flow

For each billing record, attribution runs in this order:

1. Session-based attribution for eligible shared compute. SQL Serverless warehouse and all-purpose cluster records use precomputed session allocations when query-history or audit activity is available. Warehouse sessions are split by query duration; all-purpose sessions are split by command count. A gap of more than 2 hours starts a new session.
2. Proportional overhead rules. Platform overhead categories such as networking, database, predictive optimization, and other overhead can be distributed in proportion to each team's compute spend. If no proportional rule matches, the record stays visible as overhead instead of being hidden.
3. Exact and pattern rules. Non-session, non-overhead records evaluate active rules in priority order. Exact rules match a specific resource type and ID. Pattern rules match names/IDs, tags, principal domains, and an optional resource type; all configured criteria are combined with AND logic.
4. Fallback waterfall. Records that no rule matches fall through, in order: shared-resource owner, mapped user, mapped resource owner, known user without team, then workspace/unattributed.
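
The session-based step above can be illustrated with a small sketch. It is not LakeSentry's implementation: the event tuple shape, the function names, and the choice to build one session per resource across all users are assumptions made for illustration. Only the 2-hour gap and the two split bases (query duration for warehouses, command count for all-purpose clusters) come from the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# From the text: a gap greater than 2 hours starts a new session.
SESSION_GAP = timedelta(hours=2)


def split_into_sessions(events):
    """Group (timestamp, user, weight) events for one resource into sessions.

    Assumption: the gap is measured between consecutive event timestamps.
    """
    sessions = []
    previous_ts = None
    for event in sorted(events, key=lambda e: e[0]):
        ts = event[0]
        if previous_ts is None or ts - previous_ts > SESSION_GAP:
            sessions.append([])  # open a new session
        sessions[-1].append(event)
        previous_ts = ts
    return sessions


def session_shares(session):
    """Return each user's fraction of the session's total weight.

    Weight is query duration for warehouse sessions, or 1 per command
    for all-purpose sessions (so the split is by command count).
    """
    totals = defaultdict(float)
    for _, user, weight in session:
        totals[user] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {user: weight / grand_total for user, weight in totals.items()}


# Hypothetical warehouse activity: weights are query durations in seconds.
events = [
    (datetime(2024, 1, 1, 9, 0), "alice", 300.0),
    (datetime(2024, 1, 1, 9, 30), "bob", 100.0),
    (datetime(2024, 1, 1, 13, 0), "alice", 50.0),  # 3.5 h later: new session
]
sessions = split_into_sessions(events)
first_session_shares = session_shares(sessions[0])
```

Each user's share would then be multiplied by the cost of the billing records that fall inside the session, and mapped to a team through the user/team mappings.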

Rule priority is ascending: lower priority numbers run first. Ties are ordered by the backend query, with workspace-specific rules favored over global pattern/proportional rules.
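
As a rough illustration of this ordering, the sketch below sorts rules by ascending priority and places workspace-specific rules ahead of global ones when priorities tie. The `Rule` class and its fields are hypothetical, and the real tie order is determined by the backend query, so treat this as an approximation only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int
    # Assumption: None marks a global rule, a value marks a workspace-specific one.
    workspace_id: Optional[str] = None


def evaluation_order(rules: List[Rule]) -> List[Rule]:
    """Lower priority numbers first; on ties, workspace-specific before global."""
    # False sorts before True, so rules with a workspace_id come first on ties.
    return sorted(rules, key=lambda r: (r.priority, r.workspace_id is None))


rules = [
    Rule("global-pattern", priority=10),
    Rule("workspace-pattern", priority=10, workspace_id="ws-1"),
    Rule("exact-warehouse", priority=1),
]
ordered_names = [r.name for r in evaluation_order(rules)]
```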

Attribution paths

Path	Meaning
rule	An exact or pattern attribution rule matched.
session	Shared usage was split using session/query/audit signals.
overhead	A proportional overhead rule distributed spend.
shared	The resource is shared infrastructure.
user	A mapped user principal owns the spend.
resource	The resource owner maps to a team.
workspace	No reliable team was found; spend stays unattributed/workspace-level.
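
The fallback waterfall from the evaluation flow, combined with the path labels in this table, might look like the sketch below. The record fields and the three lookup dictionaries are hypothetical. The text does not say which path label a known user without a team receives; returning `user` with no team is an assumption.

```python
from typing import Dict, Optional, Tuple


def fallback_attribution(
    record: dict,
    shared_owners: Dict[str, str],
    user_teams: Dict[str, Optional[str]],
    resource_owner_teams: Dict[str, str],
) -> Tuple[str, Optional[str]]:
    """Walk the fallback steps in order and return (path, team)."""
    resource_id = record.get("resource_id")
    principal = record.get("principal")

    # 1. Shared-resource owner.
    if resource_id in shared_owners:
        return "shared", shared_owners[resource_id]
    # 2. Mapped user: the principal is known and has a team.
    if user_teams.get(principal):
        return "user", user_teams[principal]
    # 3. Mapped resource owner.
    if resource_id in resource_owner_teams:
        return "resource", resource_owner_teams[resource_id]
    # 4. Known user without team (path label here is an assumption).
    if principal in user_teams:
        return "user", None
    # 5. Nothing reliable was found.
    return "workspace", None


# Hypothetical lookups for illustration.
shared = {"wh-1": "platform"}
users = {"alice@example.com": "data-eng", "bob@example.com": None}
owners = {"job-7": "ml"}
result = fallback_attribution(
    {"resource_id": "job-7", "principal": "bob@example.com"}, shared, users, owners
)
```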

Confidence tiers

Confidence communicates how much trust to place in an allocation:

Exact: explicit exact rule or unambiguous resource ownership.
Strong: stable pattern rule, tag rule, or mapped identity with clear ownership.
Estimated: proportional overhead or session-based split from activity signals.
Unattributed: LakeSentry could not assign the line to a team.
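
One way to read these tiers is as a function of the attribution path and, for rule matches, the kind of rule. The mapping below is an interpretation of the tier descriptions, not a documented LakeSentry API: the function name, the `rule_kind` values, and treating the `resource` path as Exact are assumptions. The text does not state a tier for the `shared` path, so the sketch declines to guess.

```python
from enum import Enum
from typing import Optional


class Confidence(Enum):
    EXACT = "exact"
    STRONG = "strong"
    ESTIMATED = "estimated"
    UNATTRIBUTED = "unattributed"


def confidence_for(
    path: str, rule_kind: Optional[str] = None, team: Optional[str] = None
) -> Confidence:
    """Map an attribution path to a confidence tier (interpretive sketch)."""
    if team is None or path == "workspace":
        return Confidence.UNATTRIBUTED
    if path in ("session", "overhead"):
        # Activity-based splits and proportional overhead are estimates.
        return Confidence.ESTIMATED
    if path == "rule":
        # Exact rules are Exact; pattern and tag rules are Strong.
        return Confidence.EXACT if rule_kind == "exact" else Confidence.STRONG
    if path == "resource":
        # Assumption: a mapped resource owner counts as unambiguous ownership.
        return Confidence.EXACT
    if path == "user":
        return Confidence.STRONG
    # The source text does not assign a tier to other paths such as "shared".
    raise ValueError(f"no tier defined in this sketch for path {path!r}")


tier = confidence_for("rule", rule_kind="pattern", team="data-eng")
```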

Rule types

Exact rules

Use exact rules for known high-cost resources, for example a production warehouse or a long-lived shared cluster.

Pattern rules

Use pattern rules when names, tags, or principal domains reliably identify ownership. All configured criteria must match.
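
The AND semantics can be sketched as follows. The `PatternRule` fields, the record shape, and the use of glob-style name patterns are assumptions for illustration; the text does not specify the pattern syntax LakeSentry uses.

```python
import fnmatch
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PatternRule:
    name_glob: Optional[str] = None
    required_tags: Dict[str, str] = field(default_factory=dict)
    principal_domain: Optional[str] = None
    resource_type: Optional[str] = None


def matches(rule: PatternRule, record: dict) -> bool:
    """Return True only if every configured criterion matches the record."""
    checks = []
    if rule.name_glob is not None:
        checks.append(fnmatch.fnmatchcase(record.get("name", ""), rule.name_glob))
    if rule.required_tags:
        tags = record.get("tags", {})
        checks.append(all(tags.get(k) == v for k, v in rule.required_tags.items()))
    if rule.principal_domain is not None:
        principal = record.get("principal", "").lower()
        checks.append(principal.endswith("@" + rule.principal_domain.lower()))
    if rule.resource_type is not None:
        checks.append(record.get("resource_type") == rule.resource_type)
    # Design choice in this sketch: a rule with no criteria matches nothing.
    return bool(checks) and all(checks)


rule = PatternRule(
    name_glob="prod-etl-*",
    required_tags={"team": "data-eng"},
    principal_domain="example.com",
)
record = {
    "name": "prod-etl-nightly",
    "tags": {"team": "data-eng", "env": "prod"},
    "principal": "svc-etl@example.com",
    "resource_type": "job",
}
is_match = matches(rule, record)
```

Criteria left unset are simply skipped, so a rule that only sets a tag requirement behaves as a pure tag rule.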

Proportional rules

Use proportional rules for platform overhead that cannot fairly belong to one team. The current distribution basis is compute spend.
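
A minimal sketch of distributing one overhead record by compute-spend share is shown below. The function name and inputs are hypothetical; the optional exclusion of selected teams mirrors the Proportional attribution mode. Returning an empty result when there is no eligible spend is an assumption, meant to leave the record visible as overhead.

```python
from typing import Dict, FrozenSet


def allocate_proportionally(
    overhead_cost: float,
    team_compute_spend: Dict[str, float],
    excluded_teams: FrozenSet[str] = frozenset(),
) -> Dict[str, float]:
    """Split overhead_cost across teams by their share of compute spend."""
    eligible = {
        team: spend
        for team, spend in team_compute_spend.items()
        if team not in excluded_teams and spend > 0
    }
    total = sum(eligible.values())
    if total == 0:
        # Nothing to distribute against; the caller keeps the record as overhead.
        return {}
    return {team: overhead_cost * spend / total for team, spend in eligible.items()}


# Hypothetical figures: 100 of networking overhead, two teams' compute spend.
allocation = allocate_proportionally(100.0, {"data-eng": 300.0, "ml": 100.0})
```

Here `data-eng` holds 300 of 400 in compute spend, so it receives 75 of the 100 in overhead and `ml` receives 25.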

Attribution modes

Direct: 100% to one team and optional project. Direct rules may also mark the resource as shared and set a shared bucket.
Split: percentage allocation across multiple teams; percentages must total 100%.
Proportional: allocate overhead across teams from their compute-spend share, optionally excluding selected teams.
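
The Split mode's constraint that percentages total 100% can be checked before any cost is allocated. The sketch below uses `Decimal` to avoid floating-point drift in that check; the function name and input shape are hypothetical.

```python
from decimal import Decimal
from typing import Dict


def apply_split(cost: Decimal, percentages: Dict[str, Decimal]) -> Dict[str, Decimal]:
    """Allocate cost across teams by fixed percentages that must total 100."""
    total = sum(percentages.values(), Decimal("0"))
    if total != Decimal("100"):
        raise ValueError(f"split percentages total {total}, expected 100")
    return {team: cost * pct / Decimal("100") for team, pct in percentages.items()}


# Hypothetical 60/40 split of a 200-unit record.
split = apply_split(
    Decimal("200"), {"data-eng": Decimal("60"), "ml": Decimal("40")}
)
```

A Direct rule is the degenerate case of a single team at 100%.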

Improving attribution quality

Map high-spend users and service principals first.
Require Databricks tags for production resources.
Add exact rules for persistent shared resources.
Add pattern rules only for conventions you trust.
Review unattributed spend after every new connector backfill.