
Databricks Account and Connector Setup

Databricks connectors are the bridge between LakeSentry and your Databricks account. Each connector authenticates with OAuth M2M service principal credentials or a personal access token (PAT), and reads billing data, compute metadata, and workload history from Databricks system tables.

This page covers the self-service setup process, from creating credentials to verifying connectivity. Direct Connection is the default self-service mode in Settings → Connector. External Connector (collector) deployments are set up as controlled deployments and appear as coming soon in the setup UI.

Before creating a Databricks connector, ensure you have:

  • Databricks account admin access (to create service principals and grant permissions)
  • Unity Catalog enabled on your Databricks account (required for system table access)
  • At least one workspace URL per region you want LakeSentry to monitor
  • A SQL warehouse the LakeSentry service principal can use
  • Your Databricks account ID (found in the account console URL or settings page)

LakeSentry authenticates with OAuth machine-to-machine (M2M) credentials issued to a Databricks service principal. To create one:

  1. Go to your Databricks account console.
  2. Navigate to User Management > Service Principals.
  3. Click Add Service Principal and give it a descriptive name (e.g., lakesentry-reader).
  4. Under OAuth, generate an OAuth secret and copy both the Client ID and the secret. Databricks displays the secret only once, so store it securely.
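OAuth M2M is a standard client-credentials exchange. LakeSentry performs it for you. The sketch below shows what the Client ID and secret are used for. It assumes the workspace-level token endpoint `/oidc/v1/token` and the `all-apis` scope that Databricks documents for service principals. `build_m2m_token_request` is a hypothetical helper, not part of LakeSentry. It only builds the request and does not send it.

```python
import base64
import urllib.parse
import urllib.request


def build_m2m_token_request(
    workspace_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request.

    Assumes the Databricks workspace token endpoint at /oidc/v1/token; the
    service principal's Client ID and secret travel as HTTP Basic credentials.
    """
    token_url = workspace_url.rstrip("/") + "/oidc/v1/token"
    # Form-encoded body for the client_credentials grant.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

To fetch a token yourself, pass the request to `urllib.request.urlopen` and read `access_token` from the JSON response. The token is short-lived (see `expires_in`), so clients request a new one when it expires.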

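The service principal needs permission to use a SQL warehouse, plus SELECT on the system tables LakeSentry ingests. Run the statements below in a workspace with Unity Catalog enabled. Replace `lakesentry-reader` with the service principal's application ID, which is the same value as its OAuth Client ID. Unity Catalog identifies service principals in GRANT statements by application ID, not by display name.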

```sql
-- SQL warehouse access is not granted with SQL GRANT. In the workspace, open
-- SQL Warehouses, select the warehouse LakeSentry will use, click Permissions,
-- and give the service principal the Can use permission.

-- Grant access to billing tables (account-level)
GRANT USE CATALOG ON CATALOG system TO `lakesentry-reader`;
GRANT USE SCHEMA ON SCHEMA system.billing TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.billing.usage TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.billing.list_prices TO `lakesentry-reader`;
-- Grant access to compute tables (regional)
GRANT USE SCHEMA ON SCHEMA system.compute TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.clusters TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.node_timeline TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.node_types TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.warehouses TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.warehouse_events TO `lakesentry-reader`;
-- Grant access to job/pipeline tables (regional)
GRANT USE SCHEMA ON SCHEMA system.lakeflow TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.jobs TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.job_tasks TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.job_run_timeline TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.job_task_run_timeline TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.pipelines TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.pipeline_update_timeline TO `lakesentry-reader`;
-- Grant access to query history (regional)
GRANT USE SCHEMA ON SCHEMA system.query TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.query.history TO `lakesentry-reader`;
-- Grant access to workspace metadata, lineage, and access events
GRANT USE SCHEMA ON SCHEMA system.access TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.access.workspaces_latest TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.access.table_lineage TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.access.clean_room_events TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.access.assistant_events TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.access.inbound_network TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.access.outbound_network TO `lakesentry-reader`;
-- Grant access to table metadata
GRANT USE SCHEMA ON SCHEMA system.information_schema TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.information_schema.tables TO `lakesentry-reader`;
-- Grant access to model serving and storage metadata
GRANT USE SCHEMA ON SCHEMA system.serving TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.serving.served_entities TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.serving.endpoint_usage TO `lakesentry-reader`;
GRANT USE SCHEMA ON SCHEMA system.storage TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.storage.predictive_optimization_operations_history TO `lakesentry-reader`;
```
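If you adapt the grant list, for example to add the optional MLflow tables, generating the statements keeps the USE CATALOG → USE SCHEMA → SELECT chain consistent. The sketch below uses a hypothetical helper, `build_grants`. It assumes fully qualified `catalog.schema.table` names and a principal given as the service principal's application ID. It emits each USE grant once, before the first table that needs it.

```python
def build_grants(principal: str, tables: list) -> list:
    """Return GRANT statements giving `principal` SELECT on each fully qualified table."""
    # Backtick-quote the principal; a literal backtick is escaped by doubling it.
    quoted = "`" + principal.replace("`", "``") + "`"
    statements = []
    seen_catalogs = set()
    seen_schemas = set()
    for table in tables:
        catalog, schema, _name = table.split(".")
        if catalog not in seen_catalogs:
            seen_catalogs.add(catalog)
            statements.append(f"GRANT USE CATALOG ON CATALOG {catalog} TO {quoted};")
        schema_fqn = f"{catalog}.{schema}"
        if schema_fqn not in seen_schemas:
            seen_schemas.add(schema_fqn)
            statements.append(f"GRANT USE SCHEMA ON SCHEMA {schema_fqn} TO {quoted};")
        statements.append(f"GRANT SELECT ON TABLE {table} TO {quoted};")
    return statements
```

Print the result with `"\n".join(...)` and paste it into the SQL editor. Warehouse access still has to be granted through the warehouse's Permissions dialog.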

Some LakeSentry pages depend on tables that may not exist in every Databricks deployment or are not yet part of the default direct-extraction registry.

```sql
-- MLflow tracking is planned for direct extraction. Grant only if LakeSentry support
-- has enabled MLflow ingestion for your tenant.
GRANT USE SCHEMA ON SCHEMA system.mlflow TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.mlflow.experiments_latest TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.mlflow.runs_latest TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.mlflow.run_metrics_history TO `lakesentry-reader`;
```

The Audit Log page shows LakeSentry’s own internal audit trail. It does not require Databricks system.access.audit, which is not part of the current active extraction registry.

  1. In LakeSentry, go to Settings → Connector.
  2. Click Connect to Databricks or Add Connector.
  3. Choose Direct Connection.
  4. Fill in the fields:

| Field | Description |
|---|---|
| Workspace URL | The URL of any Databricks workspace in your account (e.g., https://adb-1234567890123456.7.azuredatabricks.net). The cloud provider is auto-detected from the URL. |
| OAuth Client ID | The client ID of the service principal you created. |
| OAuth Secret | The OAuth secret you copied when you created the service principal. |
| PAT | Optional. Use instead of OAuth M2M credentials if you authenticate with a personal access token. |

  5. Click Validate Credentials. LakeSentry checks that the credentials are valid and can access Databricks.
  6. Once validation succeeds, save the connector. Its status changes to active after the first extraction succeeds.
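LakeSentry's exact detection rules are not documented here. The sketch below is not its implementation. It assumes the standard Databricks hostname patterns: `*.azuredatabricks.net` for Azure, `*.cloud.databricks.com` for AWS, and `*.gcp.databricks.com` for GCP. It is handy for sanity-checking a URL before you paste it in. `detect_cloud` is a hypothetical helper.

```python
from urllib.parse import urlparse


def detect_cloud(workspace_url: str) -> str:
    """Guess the cloud provider from a Databricks workspace URL (sketch)."""
    # Accept bare hostnames by assuming https.
    if "://" not in workspace_url:
        workspace_url = "https://" + workspace_url
    host = (urlparse(workspace_url).hostname or "").lower()
    if host.endswith(".azuredatabricks.net"):
        return "azure"
    if host.endswith(".gcp.databricks.com"):
        return "gcp"
    if host.endswith(".cloud.databricks.com"):
        return "aws"
    # Anything else (custom or government-cloud domains) is rejected by this sketch.
    raise ValueError(f"Unrecognized Databricks workspace host: {host!r}")
```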

The connection test checks:

  • OAuth credentials are valid and not expired
  • The service principal can list SQL warehouses (workspace-level API access)
  • At least one SQL warehouse exists in the workspace
  • The service principal can SELECT from system tables (probed automatically)

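If the test fails, check four things. The service principal must have workspace-level access and the Can use permission on a SQL warehouse, and at least one SQL warehouse must exist. The SELECT grants above must be applied, and the OAuth secret must not have expired.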
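You can reproduce the connection checks listed above before saving the connector. The sketch assumes the Databricks SDK for Python (`databricks-sdk`), whose `WorkspaceClient` accepts `client_id` and `client_secret` for OAuth M2M. It lists warehouses, confirms one exists, then probes each system table with `SELECT 1 ... LIMIT 1`. It is not LakeSentry's implementation. `run_connection_checks`, `make_client`, and the choice of the first warehouse are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    """Names of checks that passed, and failed checks with a reason."""
    passed: list = field(default_factory=list)
    failed: list = field(default_factory=list)


def make_client(host: str, client_id: str, client_secret: str):
    """Create a WorkspaceClient that authenticates with OAuth M2M.

    Requires `pip install databricks-sdk`; imported lazily so the checks
    below can also run against test doubles.
    """
    from databricks.sdk import WorkspaceClient

    return WorkspaceClient(host=host, client_id=client_id, client_secret=client_secret)


def _state_name(state) -> str:
    # The SDK returns a StatementState enum; plain strings are accepted too.
    return getattr(state, "value", state)


def run_connection_checks(client, probe_tables, wait_timeout="30s") -> CheckResult:
    result = CheckResult()
    # 1. Listing warehouses exercises authentication and workspace API access.
    try:
        warehouses = list(client.warehouses.list())
    except Exception as exc:
        result.failed.append(f"list warehouses: {exc}")
        return result
    result.passed.append("list warehouses")

    # 2. At least one SQL warehouse must exist to query system tables.
    if not warehouses:
        result.failed.append("no SQL warehouse in workspace")
        return result
    result.passed.append("SQL warehouse exists")

    # 3. Probe each table; a missing grant typically yields a FAILED statement.
    warehouse_id = warehouses[0].id  # assumption: the first warehouse is usable
    for table in probe_tables:
        try:
            resp = client.statement_execution.execute_statement(
                statement=f"SELECT 1 FROM {table} LIMIT 1",
                warehouse_id=warehouse_id,
                wait_timeout=wait_timeout,
            )
            state = _state_name(resp.status.state)
        except Exception as exc:  # e.g. HTTP 403 if the warehouse is not usable
            state = f"error: {exc}"
        if state == "SUCCEEDED":
            result.passed.append(f"SELECT {table}")
        else:
            # PENDING/RUNNING after wait_timeout is also reported as a failure.
            result.failed.append(f"SELECT {table}: {state}")
    return result
```

For example: `run_connection_checks(make_client(url, client_id, secret), ["system.billing.usage", "system.query.history"])`. A stopped warehouse can take longer than `wait_timeout` to start. Rerun the checks before concluding that a grant is missing.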

If you operate Databricks in more than one region, add a separate connector for each additional region. Databricks system tables are regional, so a workspace in one region cannot provide complete metadata for another region.

  1. On Settings → Connector, click Add Connector.
  2. Select the region (e.g., eastus, westeurope, us-west-2).
  3. Enter a workspace URL from that region (e.g., https://adb-1234567890123456.7.azuredatabricks.net).
  4. Click Save.
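Because system tables are regional, you need one connector per region, each pointing at any one workspace in that region. The sketch below picks one workspace URL per region from an inventory of your workspaces. The inventory shape is hypothetical; you would assemble it by hand or from the account console. Sorting keeps the choice deterministic.

```python
def pick_connector_workspaces(workspaces: dict) -> dict:
    """Map each region to one workspace URL.

    `workspaces` maps workspace URL -> region (hypothetical input shape).
    """
    chosen = {}
    # Sort by URL so the same inventory always yields the same choices.
    for url, region in sorted(workspaces.items()):
        chosen.setdefault(region, url)
    return chosen
```

The number of entries in the result is the number of connectors to create.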

For detailed information on multi-region configuration, see Region Connectors.

For Direct Connection, LakeSentry extracts data on the Data Sync schedule shown in Settings → Connector.

| Schedule | Use when |
|---|---|
| Daily at 8 AM UTC | Default; a baseline cadence for cost reporting. |
| Every 4 hours | You want fresher dashboards without hourly extraction. |
| Every hour | You need the freshest cadence Direct Connection supports. |
| Paused | You need to stop extraction temporarily. |

You can also trigger an immediate sync, cancel a running sync, or reset checkpoints from the connector detail page.

After the first sync completes, check Settings → Connector:

| Indicator | Healthy state |
|---|---|
| Connector status | Active or synced |
| Last sync | A recent timestamp |
| Extraction runs | The most recent run succeeded |
| Tables extracted | Lists the system tables that were extracted successfully |
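What counts as a recent last sync depends on the schedule. As a rough rule, assume each schedule produces one successful sync per interval. A connector is then stale when its last sync is older than one interval plus a grace period. The schedule keys, the one-hour default grace, and `is_sync_stale` are hypothetical, not LakeSentry settings.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed gap between successful syncs for each Data Sync schedule.
SCHEDULE_INTERVALS = {
    "daily": timedelta(days=1),
    "every_4_hours": timedelta(hours=4),
    "hourly": timedelta(hours=1),
    "paused": None,
}


def is_sync_stale(
    last_sync: datetime,
    schedule: str,
    now: Optional[datetime] = None,
    grace: timedelta = timedelta(hours=1),
) -> Optional[bool]:
    """True if last_sync is older than one interval plus grace; None when paused."""
    interval = SCHEDULE_INTERVALS[schedule]
    if interval is None:
        return None  # a paused connector is expected to stop advancing
    now = now or datetime.now(timezone.utc)
    return now - last_sync > interval + grace
```

Pass timezone-aware UTC datetimes; mixing naive and aware values raises `TypeError`.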

If the status stays pending after the first sync, see Data Freshness and Common Issues for likely causes.

  • LakeSentry connects via a read-only service principal for reporting and detection. Databricks write permissions are only needed for approved executable actions.
  • The service principal accesses system tables only — billing, compute, job, and query metadata. It never touches your business data, notebooks, or query results.
  • Direct Connection uses the credentials you provide for Databricks validation and extraction. External Connector deployments use collector tokens that are hashed server-side.
  • All data transfer happens over HTTPS.

To disconnect LakeSentry from your Databricks account:

  1. Stop extraction: pause Direct Connection Data Sync. If you use External Connector, disable or delete the Databricks jobs that run the collector in each region.
  2. Delete connectors: remove each connector from Settings → Connector.
  3. Revoke the service principal: in the Databricks account console, delete the service principal, or delete its OAuth secret so the credentials LakeSentry holds stop working.