Databricks Data Freshness and Pipeline Status
LakeSentry’s data flows through several stages before appearing in dashboards. Understanding these stages and their expected latency helps you distinguish between normal pipeline lag and actual issues.
The data pipeline
Data moves through four stages, each adding latency:
| Stage | What happens | Typical latency |
|---|---|---|
| 1. Databricks system tables | Databricks writes usage events to system tables | 1 minute – 4 hours (varies by table) |
| 2. Extraction | LakeSentry reads system tables through Direct Connection, or an External Connector pushes data | Depends on schedule (default: daily at 8 AM UTC for Direct Connection) |
| 3. Ingestion & validation | LakeSentry validates, deduplicates, and stores raw data | 1–5 minutes |
| 4. Processing & aggregation | Data is transformed into metrics, cost rollups, and insights | 5–20 minutes |
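End-to-end latency, from the moment a Databricks event occurs to when it appears in LakeSentry dashboards, is typically 20 minutes – 5 hours depending on the data type, plus any time the data waits for the next scheduled extraction. On the default daily schedule, that wait alone can approach 24 hours.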
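Here is a rough Python sketch of how the stage ranges combine. It is illustrative only. The minimum and maximum for each stage come from the table above. Extraction is modeled as a wait of up to one schedule interval, which is an assumption that ignores how long the extraction run itself takes. `Stage` and `latency_bounds` are hypothetical names, not part of LakeSentry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    min_minutes: float
    max_minutes: float


# Ranges copied from the pipeline table. Extraction (stage 2) is not listed here:
# it is modeled as a wait of up to one schedule interval (an assumption).
STAGES = (
    Stage("Databricks system tables", 1, 4 * 60),
    Stage("Ingestion & validation", 1, 5),
    Stage("Processing & aggregation", 5, 20),
)


def latency_bounds(schedule_interval_minutes: float) -> tuple[float, float]:
    """Return (best, worst) end-to-end latency in minutes.

    Best case: extraction picks the row up as soon as Databricks writes it.
    Worst case: the row lands just after an extraction and waits a full interval.
    The run time of the extraction itself is ignored.
    """
    best = sum(stage.min_minutes for stage in STAGES)
    worst = sum(stage.max_minutes for stage in STAGES) + schedule_interval_minutes
    return best, worst
```

On an hourly schedule, `latency_bounds(60)` gives a worst case of 325 minutes (about 5.4 hours). On the default daily schedule, `latency_bounds(24 * 60)` gives 1,705 minutes (about 28 hours), which still falls under the 30-hour no-data threshold that turns a connector red. The 7-minute best case is a theoretical floor taken from the table, not a typical value.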
Expected freshness by data type
Different data types have different inherent latency at the Databricks level:
| Data type | Databricks system table latency | LakeSentry display latency |
|---|---|---|
| Billing / cost data | 1–4 hours | 1.5–5 hours from the actual usage |
| Cluster events | Near real-time | 20–40 minutes after the next extraction, including processing |
| Query history | Minutes to 1 hour | 20–90 minutes |
| Job run history | Minutes to 1 hour | 20–90 minutes |
| Warehouse events | Minutes to 1 hour | 20–90 minutes |
| Storage metadata | Hours (updated periodically) | 1–5 hours |
Checking pipeline status
Connector health indicators
Go to Settings > Connector to see the health of each connector:
| Indicator | Meaning | Action needed |
|---|---|---|
| Green (Synced) | Data has been received from this connector | None — operating normally |
| Red (Error) | Connector status is “error” or “failed”, or no data in 30+ hours (triggers an email alert to admins) | Investigate — the connector may be broken or misconfigured. For External Connector deployments, see Collector Issues. |
| Grey (Awaiting data) | Connector is configured but no data has been received yet | Wait for the first extraction to complete, trigger a sync, or check the External Connector job. |
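The indicator rules can be expressed as a small decision function. This is a sketch under assumptions. The `status` strings, the `last_data_at` timestamp, and the function name are hypothetical stand-ins for whatever state LakeSentry tracks internally. Checking explicit errors before the “no data yet” case is a choice made here, not documented behavior.

```python
from datetime import datetime, timedelta
from typing import Optional

# "No data in 30+ hours" from the health-indicator table.
NO_DATA_THRESHOLD = timedelta(hours=30)


def health_indicator(status: str, last_data_at: Optional[datetime], now: datetime) -> str:
    """Return "red", "grey", or "green" following the rules in the table.

    `status` and `last_data_at` are hypothetical inputs standing in for the
    connector state LakeSentry tracks internally.
    """
    if status.lower() in {"error", "failed"}:
        return "red"    # explicit connector failure
    if last_data_at is None:
        return "grey"   # configured, but nothing received yet
    if now - last_data_at >= NO_DATA_THRESHOLD:
        return "red"    # silent for 30+ hours
    return "green"      # data has been received recently
```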
Region connector detail
Click a region connector to see detailed status:
- Last sync — Timestamp of the last successful extraction
- Tables extracted — List of system tables LakeSentry is successfully extracting
- Extraction checkpoints — Per-table watermarks showing how far extraction has progressed
- Ingestion history — Recent ingestion events with row counts and durations
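Extraction checkpoints are most useful when you compare them against each other. A table whose watermark trails the others often points to the “Specific data type is stale” pattern described under Abnormal lag patterns. The sketch below assumes you have copied the watermarks into a dict keyed by table name. The function name and the example table names are illustrative. Choose a `tolerance` larger than the inherent gap between tables. Billing data normally lags 1–4 hours while cluster events are near real-time, so a smaller tolerance will always flag billing as behind.

```python
from datetime import datetime, timedelta


def lagging_tables(checkpoints: dict[str, datetime], tolerance: timedelta) -> list[str]:
    """Return tables whose watermark trails the most advanced table by more than `tolerance`.

    `checkpoints` maps a system table name to its extraction watermark, as read
    off the "Extraction checkpoints" view (the dict shape is an assumption).
    """
    if not checkpoints:
        return []
    newest = max(checkpoints.values())
    return sorted(
        table for table, watermark in checkpoints.items()
        if newest - watermark > tolerance
    )
```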
Data freshness on dashboards
Dashboard pages display a “Data as of” indicator showing the most recent data point. If this timestamp seems too old:
- Check the connector health (above).
- Consider the expected latency for the data type you’re viewing.
- If the staleness exceeds the expected latency, follow the steps in What to do when data is stale.
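One quick way to apply these checks is to compare the “Data as of” timestamp against the upper end of the expected display latency for that data type. This is a hypothetical helper, not a LakeSentry feature. The dictionary keys are made-up labels, and the limits are the upper bounds from the freshness table. `extraction_interval` adds the wait for the next scheduled extraction, because the table's figures appear to assume extraction has already run.

```python
from datetime import datetime, timedelta

# Upper ends of the "LakeSentry display latency" column in the freshness table.
# These appear to assume extraction has already run; the keys are hypothetical labels.
EXPECTED_MAX_LATENCY = {
    "billing": timedelta(hours=5),
    "cluster_events": timedelta(minutes=40),
    "query_history": timedelta(minutes=90),
    "job_run_history": timedelta(minutes=90),
    "warehouse_events": timedelta(minutes=90),
    "storage_metadata": timedelta(hours=5),
}


def is_unexpectedly_stale(
    data_type: str,
    data_as_of: datetime,
    now: datetime,
    extraction_interval: timedelta = timedelta(0),
) -> bool:
    """True if the "Data as of" timestamp is older than the expected latency allows.

    `extraction_interval` adds the wait for the next scheduled extraction,
    e.g. timedelta(days=1) on the default daily schedule.
    """
    allowed = EXPECTED_MAX_LATENCY[data_type] + extraction_interval
    return now - data_as_of > allowed
```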
Understanding lag
Normal lag patterns
Some lag patterns are expected and do not indicate a problem:
- Morning cost updates — Yesterday’s billing data often finalizes overnight. Expect cost dashboards to update with the previous day’s complete data in the early morning (UTC).
- Weekend/holiday gaps — If compute usage drops on weekends, there may be less new data to display. The pipeline is still running, but the deltas are smaller.
- Post-setup lag — After first connecting Databricks, the initial extraction takes longer than incremental runs. The first dashboards may take 30–60 minutes to populate.
Abnormal lag patterns
These patterns suggest an issue that needs investigation:
| Pattern | Likely cause | What to check |
|---|---|---|
| One region is fresh, another is stale | The stale region’s extraction is not running or is failing | Check that connector’s Data Sync panel or External Connector job |
| All regions are stale | Extraction issue or LakeSentry pipeline delay | Check connector sync history; if extraction is running, contact support |
| Specific data type is stale | Permission lost for that system table | Check the Tables extracted list on the region connector |
| Dashboard shows “No data” for recent dates | Checkpoint issue or Databricks table retention | Check extraction checkpoints |
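The region-level rows of this table amount to a simple classification over per-region freshness. The sketch below is illustrative. How you decide that a region is stale is up to you; for example, you could compare its “Data as of” time with the expected latency. `classify_region_lag` and `WHAT_TO_CHECK` are hypothetical names, and their messages paraphrase the table.

```python
# Checks from the abnormal-lag table, keyed by the pattern this (hypothetical) sketch detects.
WHAT_TO_CHECK = {
    "none": "Lag is within the expected range; no action needed",
    "some_regions": "Check the stale connectors' Data Sync panel or External Connector job",
    "all_regions": "Check connector sync history; if extraction is running, contact support",
}


def classify_region_lag(region_is_stale: dict[str, bool]) -> tuple[str, list[str]]:
    """Return (pattern, stale_regions) for a map of region name -> is-stale flag."""
    stale = sorted(region for region, is_stale in region_is_stale.items() if is_stale)
    if not stale:
        return "none", []
    if len(stale) == len(region_is_stale):
        return "all_regions", stale
    return "some_regions", stale
```

Look up `WHAT_TO_CHECK[pattern]` with the returned pattern to get the matching next step.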
What to do when data is stale
Step 1: Check extraction
- In LakeSentry, open Settings > Connector and note the “Last ingestion” time.
- If last ingestion is recent (within the expected schedule), extraction is fine — skip to Step 3.
- If last ingestion is stale and you use Direct Connection, review the Data Sync panel, trigger a manual sync, and check extraction errors. If you use an External Connector, check the Databricks job:
  - Is the job running? Has it run recently?
  - Did the most recent run succeed or fail?
  - See Collector Issues for detailed diagnosis.
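Step 1 comes down to comparing the “Last ingestion” time with the connector's schedule. Here is a minimal sketch. It assumes hypothetical schedule labels and a 30-minute grace period for runs that start or finish late; neither comes from LakeSentry.

```python
from datetime import datetime, timedelta
from typing import Optional

# Nominal intervals for the Data Sync schedules; the keys are hypothetical labels.
SCHEDULE_INTERVALS: dict[str, Optional[timedelta]] = {
    "hourly": timedelta(hours=1),
    "every_4_hours": timedelta(hours=4),
    "daily": timedelta(days=1),
    "paused": None,
}


def step1_outcome(
    schedule: str,
    last_ingestion: datetime,
    now: datetime,
    grace: timedelta = timedelta(minutes=30),
) -> str:
    """Decide whether extraction looks healthy, per Step 1.

    `grace` allows for runs that start or finish a little late; 30 minutes
    is an arbitrary assumption.
    """
    interval = SCHEDULE_INTERVALS[schedule]
    if interval is None:
        return "paused"            # no automatic extraction is expected
    if now - last_ingestion <= interval + grace:
        return "skip_to_step_3"    # extraction is keeping up with its schedule
    return "check_extraction"      # Data Sync panel or External Connector job
```

The sketch reports a paused connector separately because no automatic extraction is expected for it. To get data flowing again, resume the schedule or use Sync Now (Step 4).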
Step 2: Check for Databricks-side delays
Databricks system tables sometimes have their own delays, independent of LakeSentry extraction:
- Check the Databricks System Table Freshness dashboard (if available in your account console).
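- Query the system table directly to see if recent data exists:

  ```sql
  -- Most recent usage period Databricks has written to the billing table
  SELECT MAX(usage_end_time) FROM system.billing.usage;
  ```

If the max timestamp is more than a few hours behind the current time (billing data normally lags 1–4 hours), the delay is at the Databricks level.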
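Comparing the result of that query with the dashboard's “Data as of” timestamp shows which side the delay is on. In this sketch, `databricks_max` is the query result and `lakesentry_data_as_of` is the dashboard timestamp. The 4-hour `databricks_tolerance` is the upper end of the billing system table latency from the freshness table. The 1-hour `sync_tolerance` is an arbitrary assumption, and the function itself is hypothetical.

```python
from datetime import datetime, timedelta


def locate_delay(
    databricks_max: datetime,
    lakesentry_data_as_of: datetime,
    now: datetime,
    databricks_tolerance: timedelta = timedelta(hours=4),
    sync_tolerance: timedelta = timedelta(hours=1),
) -> str:
    """Attribute staleness to Databricks or to LakeSentry (hypothetical helper).

    databricks_max: result of SELECT MAX(usage_end_time) FROM system.billing.usage
    lakesentry_data_as_of: the dashboard's "Data as of" timestamp
    All three timestamps must be in the same time zone (UTC recommended).
    """
    if now - databricks_max > databricks_tolerance:
        return "databricks"   # the system table itself is behind (Step 2)
    if databricks_max - lakesentry_data_as_of > sync_tolerance:
        return "lakesentry"   # newer rows exist but are not shown yet (Steps 1, 3, 4)
    return "none"
```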
Step 3: Check LakeSentry processing
If extraction is succeeding but dashboards still appear stale:
- Processing backlog — After large imports (first run or checkpoint reset), the processing pipeline may take longer than usual. This resolves on its own.
- Pipeline error — Rare, but if processing fails on specific data, it can cause a backlog. The connector detail page shows ingestion errors if any exist.
Step 4: Trigger a manual sync
If the scheduled extraction hasn’t run recently, you can trigger a one-time sync from LakeSentry:
- Go to Settings > Connector in LakeSentry.
- In the Data Sync panel, click Sync Now to start an immediate extraction.
- Wait for the sync to complete (progress is visible in the panel), then check your dashboards.
Optimizing data freshness
Data Sync schedule tuning
The default extraction schedule is Daily at 8 AM UTC. You can adjust the Data Sync cadence in Settings > Connector:
| Schedule | Trade-off |
|---|---|
| Every hour | Most frequent data updates, higher compute cost |
| Every 4 hours | Good balance of freshness and cost |
| Daily at 8 AM UTC (default) | Lower cost, suitable for daily reporting and non-urgent monitoring |
| Paused | No automatic extraction — useful when temporarily disabling a connector |
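With the default schedule, how long new data waits depends on when it lands relative to 8 AM UTC. The sketch below assumes the run fires exactly at 08:00 UTC every day; real runs may start a little later. It uses hypothetical helper names and requires timezone-aware datetimes.

```python
from datetime import datetime, time, timedelta, timezone

# "Daily at 8 AM UTC"; assumes the run fires exactly at this time.
DEFAULT_RUN_TIME = time(8, 0)


def next_daily_run(now: datetime, run_at: time = DEFAULT_RUN_TIME) -> datetime:
    """Return the first daily run at `run_at` UTC strictly after `now` (must be tz-aware)."""
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), run_at, tzinfo=timezone.utc)
    if candidate <= now_utc:
        candidate += timedelta(days=1)   # today's run already happened
    return candidate


def wait_until_next_run(now: datetime, run_at: time = DEFAULT_RUN_TIME) -> timedelta:
    """How long data written at `now` waits before the next scheduled extraction."""
    return next_daily_run(now, run_at) - now
```

For example, usage recorded at 08:05 UTC waits 23 hours 55 minutes for the next run, while usage recorded at 07:30 UTC is picked up half an hour later. Moving to Every 4 hours caps that wait at roughly four hours.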
Multiple regions
Each region has its own connector and schedule. High-priority regions (production workloads) can run more frequently while development regions run less often.
Pipeline metrics
LakeSentry tracks internal pipeline metrics that can help diagnose freshness issues:
| Metric | What it shows |
|---|---|
| Extraction duration | How long extraction took |
| Rows extracted | Number of rows pulled in the last extraction |
| Ingestion duration | How long it took to validate and store raw data |
| Processing duration | How long metric computation and aggregation took |
| End-to-end latency | Time from extraction to data appearing in dashboards |
These metrics are visible on the region connector detail page under the “Performance” tab.
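When you read these metrics, two questions usually matter. First, which stage is slowest? Second, how much of the end-to-end latency do the stage durations leave unexplained? A large remainder suggests queueing or a processing backlog. The sketch below assumes that end-to-end latency spans all three measured stages plus any waiting between them, which the metric definitions do not state outright. `PipelineMetrics` is a hypothetical container, not a LakeSentry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineMetrics:
    """Durations in seconds, copied from the Performance tab (hypothetical container).

    Assumes end_to_end_s covers the three stages plus any waiting between them.
    """
    extraction_s: float
    ingestion_s: float
    processing_s: float
    end_to_end_s: float

    def stage_durations(self) -> dict[str, float]:
        return {
            "extraction": self.extraction_s,
            "ingestion": self.ingestion_s,
            "processing": self.processing_s,
        }

    def slowest_stage(self) -> str:
        durations = self.stage_durations()
        return max(durations, key=durations.__getitem__)

    def unaccounted_s(self) -> float:
        """End-to-end time not covered by the three stage durations (e.g. queueing)."""
        return max(0.0, self.end_to_end_s - sum(self.stage_durations().values()))
```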
Next steps
- Collector Issues — When an External Connector deployment needs troubleshooting
- Common Issues — Broader troubleshooting for dashboard and access issues
- How LakeSentry Works — Understanding the full data pipeline architecture