Databricks Model Serving Cost Monitoring

The Model Serving page covers Databricks serving endpoints and their served entities. It is populated only when system.serving data is available.

Tabs

Overview

Shows endpoint count, request volume, latency, and cost, plus endpoint trends and requester breakdowns.
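As a rough illustration of how these overview figures relate to one another, the sketch below rolls a list of request records up into the same totals. The `RequestRecord` shape and the `summarize` function are hypothetical and exist only for this example. They are not the system.serving schema, and attaching a cost to each request is a simplifying assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical record shape for illustration; not the system.serving schema.
    endpoint: str
    requester: str
    latency_ms: float
    cost_usd: float  # Assumption: cost is attributed per request.


def summarize(records):
    """Roll request records up into overview-style totals."""
    by_requester = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0})
    for r in records:
        by_requester[r.requester]["requests"] += 1
        by_requester[r.requester]["cost_usd"] += r.cost_usd

    n = len(records)
    return {
        "endpoint_count": len({r.endpoint for r in records}),
        "request_volume": n,
        # Guard against division by zero when no requests are in range.
        "avg_latency_ms": sum(r.latency_ms for r in records) / n if n else 0.0,
        "total_cost_usd": sum(r.cost_usd for r in records),
        "by_requester": dict(by_requester),
    }
```

Calling `summarize` on the records for a selected time range would return one dictionary holding the headline totals plus a per-requester breakdown.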

Efficiency

Highlights serving-related waste, such as zombie models or endpoints that still incur cost despite little or no recent inference activity.

Endpoint list

Rows show endpoint or served entity, workspace, requester/owner signals, request volume, latency, token or usage metrics where available, and cost.

Filters

Use global filters for time range, workspace, organization, tags, and cost mode. Page filters narrow endpoint status and activity.
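One way to picture how global and page filters combine is as a chain of conditions that a row must all satisfy. The sketch below is a minimal illustration under that assumption. `EndpointRow`, `apply_filters`, and the example status values are hypothetical names, not part of the product.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointRow:
    # Hypothetical row shape for illustration only.
    name: str
    workspace: str
    status: str          # Assumed example values: "ready", "stopped".
    request_count: int
    tags: dict = field(default_factory=dict)


def apply_filters(rows, workspace=None, tags=None, status=None, active_only=False):
    """Return rows matching every filter that is set; unset filters match all."""
    result = []
    for row in rows:
        # Global-style filters: workspace and tags.
        if workspace is not None and row.workspace != workspace:
            continue
        if tags and any(row.tags.get(k) != v for k, v in tags.items()):
            continue
        # Page-style filters: endpoint status and activity.
        if status is not None and row.status != status:
            continue
        if active_only and row.request_count == 0:
            continue
        result.append(row)
    return result
```

For example, `apply_filters(rows, workspace="ws1", active_only=True)` would keep only the endpoints in that workspace that received at least one request.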

Waste detection

A Zombie Model insight means a serving endpoint or model is still deployed, and still incurring cost, but appears to have little or no recent inference activity. Review endpoint ownership and business context before shutting it down.
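The idea behind this insight can be sketched as a simple rule: flag endpoints whose recent request count is at or below a threshold while their recent cost is above zero. The code below is an illustrative sketch only. The `EndpointActivity` fields, the `find_zombie_candidates` function, and the default thresholds are assumptions made for this example and do not describe the product's actual detection logic.

```python
from dataclasses import dataclass


@dataclass
class EndpointActivity:
    # Hypothetical fields; not the product's actual data model.
    name: str
    recent_requests: int
    recent_cost_usd: float


def find_zombie_candidates(endpoints, max_requests=0, min_cost_usd=0.0):
    """Return endpoints with little or no recent traffic but ongoing cost.

    Both thresholds are assumed defaults chosen for illustration.
    Results are sorted by cost, highest first, to help prioritize review.
    """
    candidates = [
        e
        for e in endpoints
        if e.recent_requests <= max_requests and e.recent_cost_usd > min_cost_usd
    ]
    return sorted(candidates, key=lambda e: e.recent_cost_usd, reverse=True)
```

The function returns candidates for review rather than taking any action, which mirrors the guidance above: confirm ownership and business context before shutting anything down.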