Databricks Cost Attribution & Confidence Tiers

LakeSentry’s attribution model answers two questions:

  1. Who is accountable? Org unit → department → team.
  2. Why did it happen? Optional project or shared bucket context.

The system favors explainability over forced allocation. Spend that cannot be attributed through a defensible path stays at the workspace level and is reported as unattributed.

LakeSentry combines:

  • Billing usage and list prices from Databricks system tables.
  • Workload metadata for jobs, pipelines, clusters, SQL warehouses, model serving, MLflow, and storage operations.
  • User and service-principal identities.
  • Databricks tags.
  • User/team mappings and resource ownership.
  • Admin-defined attribution rules.

For each billing record, attribution runs in this order:

  1. Session-based attribution for eligible shared compute. SQL Serverless warehouse and all-purpose cluster records use precomputed session allocations when query-history or audit activity is available. Warehouse sessions split by query duration; all-purpose sessions split by command count. A gap greater than 2 hours starts a new session.
  2. Proportional overhead rules. Platform overhead categories such as networking, database, predictive optimization, and other overhead can be distributed based on team compute spend. If no proportional rule matches, the record stays visible as overhead instead of being hidden.
  3. Exact and pattern rules. Non-session, non-overhead records evaluate active rules in priority order. Exact rules match a specific resource type and ID. Pattern rules match names/IDs, tags, principal domains, and optional resource type with AND logic.
  4. Fallback waterfall. Remaining records fall through, in order, to: shared-resource owner, mapped user, mapped resource owner, known user without a team, and finally workspace/unattributed.
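
The session-based split in step 1 can be sketched as follows. This is a minimal illustration, not LakeSentry's implementation: the function names, the `(timestamp, user, weight)` event shape, and measuring the gap between consecutive activity timestamps are all assumptions. The weight stands in for query duration on a warehouse, or for 1 per command on an all-purpose cluster.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# From the text: a gap greater than 2 hours starts a new session.
SESSION_GAP = timedelta(hours=2)


def split_into_sessions(events):
    """Group (timestamp, user, weight) events into sessions by time gap.

    Assumption: the gap is measured between consecutive event timestamps.
    """
    sessions = []
    for event in sorted(events, key=lambda e: e[0]):
        if sessions and event[0] - sessions[-1][-1][0] <= SESSION_GAP:
            sessions[-1].append(event)  # still within the current session
        else:
            sessions.append([event])  # gap exceeded: start a new session
    return sessions


def allocate_session_cost(session, cost):
    """Split a session's cost across users in proportion to their total weight."""
    totals = defaultdict(float)
    for _, user, weight in session:
        totals[user] += weight
    total_weight = sum(totals.values())
    if total_weight == 0:
        return {}  # no usable activity signal; caller falls through to later steps
    return {user: cost * w / total_weight for user, w in totals.items()}
```

For a warehouse, pass query durations as weights; for an all-purpose cluster, pass a weight of 1 per command so the split reduces to command count.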

Rule priority is ascending: lower priority numbers run first. Ties are ordered by the backend query, with workspace-specific rules favored over global pattern/proportional rules.
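
As a rough sketch of that ordering, the snippet below sorts rules by ascending priority and places workspace-specific rules ahead of global ones on a tie. The `Rule` class, its field names, and the use of `workspace_id=None` to mark a global rule are hypothetical; any further tie-breaking done by the backend query is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int
    workspace_id: Optional[str] = None  # None marks a global rule (assumed convention)


def evaluation_order(rules):
    """Return rules in evaluation order: lowest priority number first.

    On equal priority, workspace-specific rules sort ahead of global ones,
    because False sorts before True. Python's sort is stable, so any
    remaining ties keep their input order.
    """
    return sorted(rules, key=lambda r: (r.priority, r.workspace_id is None))
```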

Path       Meaning
rule       An exact or pattern attribution rule matched.
session    Shared usage was split using session/query/audit signals.
overhead   A proportional overhead rule distributed spend.
shared     The resource is shared infrastructure.
user       A mapped user principal owns the spend.
resource   The resource owner maps to a team.
workspace  No reliable team was found; spend stays unattributed/workspace-level.

Confidence communicates how much trust to place in an allocation:

  • Exact: explicit exact rule or unambiguous resource ownership.
  • Strong: stable pattern rule, tag rule, or mapped identity with clear ownership.
  • Estimated: proportional overhead or session-based split from activity signals.
  • Unattributed: LakeSentry could not assign the line to a team.
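
One way to use these tiers is to track how much spend sits in each. The sketch below assumes attributed records have been exported as `(cost, tier)` pairs; that shape and the function name are illustrative and not part of LakeSentry's API.

```python
from collections import defaultdict

# Tier names as listed above.
TIERS = ("Exact", "Strong", "Estimated", "Unattributed")


def spend_share_by_tier(records):
    """Return each confidence tier's share of total spend, as a fraction of 1."""
    totals = defaultdict(float)
    for cost, tier in records:
        if tier not in TIERS:
            raise ValueError(f"unknown confidence tier: {tier!r}")
        totals[tier] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {tier: 0.0 for tier in TIERS}
    return {tier: totals[tier] / grand_total for tier in TIERS}
```

A rising Unattributed or Estimated share is a signal to add mappings or rules.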

Use exact rules for known high-cost resources, for example a production warehouse or a long-lived shared cluster.

Use pattern rules when names, tags, or principal domains reliably identify ownership. All configured criteria must match.
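
The AND semantics can be illustrated with a small matcher. Everything here is an assumption made for the example: the `PatternRule` fields, the record keys, glob-style name matching via `fnmatchcase`, and the choice that a rule with no criteria matches nothing. LakeSentry's actual pattern syntax may differ.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class PatternRule:
    name_glob: Optional[str] = None         # e.g. "prod-*" (glob syntax is assumed)
    required_tags: dict = field(default_factory=dict)
    principal_domain: Optional[str] = None  # e.g. "example.com"
    resource_type: Optional[str] = None     # optional, per the text


def matches(rule, record):
    """Return True only if every configured criterion matches (AND logic)."""
    checks = []
    if rule.name_glob is not None:
        checks.append(fnmatchcase(record.get("resource_name", ""), rule.name_glob))
    if rule.required_tags:
        tags = record.get("tags", {})
        checks.append(all(tags.get(k) == v for k, v in rule.required_tags.items()))
    if rule.principal_domain is not None:
        principal = record.get("principal", "").lower()
        checks.append(principal.endswith("@" + rule.principal_domain.lower()))
    if rule.resource_type is not None:
        checks.append(record.get("resource_type") == rule.resource_type)
    # Assumed design choice: a rule with no criteria matches nothing.
    return bool(checks) and all(checks)
```

Unset criteria are skipped rather than treated as failures, so a rule that sets only a tag still matches on that tag alone.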

Use proportional rules for platform overhead that cannot fairly belong to one team. The current distribution basis is compute spend.
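
As a sketch of that basis, each team's share of overhead is overhead × (team compute spend ÷ total compute spend across eligible teams). The function and parameter names below are illustrative, and returning an empty result when there is no eligible compute spend is an assumption.

```python
def distribute_overhead(overhead, compute_spend, excluded=frozenset()):
    """Distribute an overhead amount across teams by compute-spend share.

    compute_spend maps team name to compute spend for the period.
    Teams in `excluded`, or with no positive spend, receive nothing.
    """
    eligible = {
        team: spend
        for team, spend in compute_spend.items()
        if team not in excluded and spend > 0
    }
    total = sum(eligible.values())
    if total == 0:
        return {}  # nothing to distribute against; the record stays as overhead
    return {team: overhead * spend / total for team, spend in eligible.items()}
```

For example, 100 of networking overhead against compute spend of 600, 300, and 100 yields shares of 60, 30, and 10.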

Each rule allocates spend in one of three modes:

  • Direct: 100% to one team and optional project. Direct rules may also mark the resource as shared and set a shared bucket.
  • Split: percentage allocation across multiple teams; percentages must total 100%.
  • Proportional: allocate overhead across teams from their compute-spend share, optionally excluding selected teams.
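
The Split constraint can be checked before any cost is allocated. The sketch below uses `Decimal` so that percentages such as 33.3 sum exactly; the function name and input shape are hypothetical.

```python
from decimal import Decimal


def apply_split(cost, percentages):
    """Allocate cost across teams by percentage.

    percentages maps team name to a Decimal percentage.
    Raises ValueError unless the percentages total exactly 100.
    """
    total = sum(percentages.values(), Decimal("0"))
    if total != Decimal("100"):
        raise ValueError(f"split percentages must total 100, got {total}")
    return {team: cost * pct / Decimal("100") for team, pct in percentages.items()}
```

Rejecting an invalid split up front avoids silently dropping or double-counting part of a billing record.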

Recommended setup order:

  1. Map high-spend users and service principals first.
  2. Require Databricks tags for production resources.
  3. Add exact rules for persistent shared resources.
  4. Add pattern rules only for conventions you trust.
  5. Review unattributed spend after every new connector backfill.