Databricks Model Serving Cost Monitoring

The Model Serving page covers Databricks serving endpoints and their served entities. It is populated only when system.serving data is available.

The page shows endpoint count, request volume, latency, and cost, along with endpoint trends and requester breakdowns.

The page also highlights serving-related waste, such as zombie models or endpoints that have little or no recent inference activity but continue to incur cost.

Each row shows the endpoint or served entity, workspace, requester/owner signals, request volume, latency, token or usage metrics where available, and cost.
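To make the row contents concrete, the sketch below rolls per-request records up into one summary per endpoint. It is an illustration only: `RequestRecord`, its field names, and `summarize_by_endpoint` are hypothetical and do not reflect the actual system.serving schema or how the page computes its rows.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class RequestRecord:
    """Hypothetical per-request record; field names are assumptions."""
    endpoint: str
    requester: str
    latency_ms: float
    tokens: Optional[int] = None  # None when token counts are not reported


def summarize_by_endpoint(records: Iterable[RequestRecord]) -> Dict[str, dict]:
    """Aggregate request volume, average latency, tokens, and requesters per endpoint."""
    totals = defaultdict(
        lambda: {"requests": 0, "latency_ms": 0.0, "tokens": 0, "requesters": set()}
    )
    for rec in records:
        row = totals[rec.endpoint]
        row["requests"] += 1
        row["latency_ms"] += rec.latency_ms
        row["tokens"] += rec.tokens or 0  # treat missing token counts as zero
        row["requesters"].add(rec.requester)

    return {
        name: {
            "requests": row["requests"],
            "avg_latency_ms": row["latency_ms"] / row["requests"],
            "tokens": row["tokens"],
            "requesters": sorted(row["requesters"]),
        }
        for name, row in totals.items()
    }
```

Cost is left out of the sketch because the source does not say how cost is attributed to individual requests.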

Use global filters for time range, workspace, organization, tags, and cost mode. Page filters narrow endpoint status and activity.

A Zombie Model insight means a serving endpoint or model appears to be deployed with little or no recent inference activity. Review endpoint ownership and business context before shutting it down.
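As a rough illustration of the idea behind this insight, the sketch below flags an endpoint that still costs money but has seen little or no recent traffic. The `EndpointActivity` type and the thresholds (`idle_days`, `max_requests`, `min_cost`) are hypothetical choices for this example. The product's actual detection rules are not documented here and may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class EndpointActivity:
    """Hypothetical activity summary for one endpoint over the selected time range."""
    name: str
    last_request_at: Optional[datetime]  # None if no request was ever observed
    requests_in_window: int
    cost_in_window: float


def is_zombie(
    ep: EndpointActivity,
    now: datetime,
    idle_days: int = 14,      # assumed threshold, not a product default
    max_requests: int = 10,   # assumed threshold, not a product default
    min_cost: float = 1.0,    # assumed threshold, not a product default
) -> bool:
    """Return True when the endpoint incurs cost but shows little or no recent activity."""
    if ep.cost_in_window < min_cost:
        return False  # negligible cost, so there is no waste to flag
    if ep.last_request_at is None:
        return True   # deployed and costing money, but never called
    idle_for = now - ep.last_request_at
    return idle_for >= timedelta(days=idle_days) or ep.requests_in_window <= max_requests


if __name__ == "__main__":
    now = datetime(2024, 6, 30)
    stale = EndpointActivity("stale-endpoint", datetime(2024, 5, 1), 0, 120.0)
    print(stale.name, is_zombie(stale, now))
```

A flag from a check like this is only a prompt for review. Ownership and business context still decide whether the endpoint should be shut down.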