Databricks Model Serving Cost Monitoring
Model Serving covers Databricks serving endpoints and served entities. It requires system.serving data to be available.
Overview
Shows endpoint count, request volume, latency, and cost, plus endpoint trends and requester breakdowns.
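As a rough illustration of how the overview figures relate to per-endpoint data, the sketch below rolls a list of usage rows up into endpoint count, request volume, average latency, and cost. The `EndpointUsage` class, its field names, and the `summarize` helper are hypothetical and chosen for this example; they are not the product's schema or the system.serving table layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointUsage:
    """Hypothetical per-endpoint usage row; field names are illustrative."""
    endpoint: str
    requests: int
    total_latency_ms: float  # sum of latencies across all requests
    cost_usd: float


def summarize(rows: list[EndpointUsage]) -> dict:
    """Roll per-endpoint rows up into the four overview figures."""
    total_requests = sum(r.requests for r in rows)
    total_latency = sum(r.total_latency_ms for r in rows)
    # Average latency is undefined when there were no requests at all.
    avg_latency: Optional[float] = (
        total_latency / total_requests if total_requests else None
    )
    return {
        "endpoint_count": len({r.endpoint for r in rows}),
        "request_volume": total_requests,
        "avg_latency_ms": avg_latency,
        "cost_usd": sum(r.cost_usd for r in rows),
    }


rows = [
    EndpointUsage("chat-prod", requests=10, total_latency_ms=1000.0, cost_usd=2.0),
    EndpointUsage("chat-staging", requests=0, total_latency_ms=0.0, cost_usd=1.5),
]
summary = summarize(rows)
```

Weighting latency by request count, rather than averaging per-endpoint averages, keeps idle endpoints from skewing the figure. That weighting is an assumption of this sketch, not a documented behavior.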
Efficiency
Highlights serving-related waste, such as zombie models or endpoints that have little or no recent inference activity but still incur cost.
Endpoint list
Rows show the endpoint or served entity, workspace, requester/owner signals, request volume, latency, token or usage metrics where available, and cost.
Filters
Use global filters for time range, workspace, organization, tags, and cost mode. Page filters narrow the list by endpoint status and activity.
Waste detection
A Zombie Model insight means a serving endpoint or model appears to be deployed with little or no recent inference activity. Review endpoint ownership and business context before shutting it down.
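The idea behind this insight can be sketched as a simple rule: an endpoint is a candidate when it accrued cost in the selected window but served few or no requests. The example below is a minimal sketch under that assumption. The `EndpointActivity` class, the `find_zombies` helper, and the `max_requests` threshold are hypothetical names and values introduced for illustration; the actual detection criteria may differ.

```python
from dataclasses import dataclass


@dataclass
class EndpointActivity:
    """Hypothetical activity summary for one endpoint over a time window."""
    endpoint: str
    requests_in_window: int
    cost_in_window: float


def find_zombies(
    endpoints: list[EndpointActivity],
    max_requests: int = 0,  # assumed threshold for "little or no" activity
) -> list[EndpointActivity]:
    """Return endpoints that cost money but saw at most max_requests requests."""
    return [
        e
        for e in endpoints
        if e.cost_in_window > 0 and e.requests_in_window <= max_requests
    ]


candidates = [
    EndpointActivity("idle-model", requests_in_window=0, cost_in_window=12.0),
    EndpointActivity("busy-model", requests_in_window=500, cost_in_window=40.0),
    EndpointActivity("scaled-to-zero", requests_in_window=0, cost_in_window=0.0),
]
zombies = [e.endpoint for e in find_zombies(candidates)]
```

A rule like this only produces candidates. An endpoint with no traffic may still be a standby or seasonal deployment, which is why ownership and business context should be checked before acting.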