Databricks

Configuring your Databricks destination.

Prerequisites

  • By default, this Databricks integration makes use of Unity Catalog data governance features. You will need Unity Catalog enabled on your Databricks Workspace.

Step 1: Create a SQL endpoint

Create a new SQL endpoint for data writing.

  1. Log in to your Databricks account.
  2. In the navigation pane, click the workspace dropdown and select SQL.
  3. In the SQL console, in the SQL navigation pane, click Create and then SQL endpoint.
  4. In the New SQL Endpoint menu, choose a name and configure the options for the new SQL endpoint. Under Advanced options, switch Unity Catalog to the On position, select the Preview channel, and click Create.
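If you prefer to script this step, the same endpoint can be created through the Databricks SQL Warehouses REST API. The sketch below is a minimal example, not a required part of the setup: the endpoint name, size, auto-stop value, and the `DATABRICKS_HOST`/`DATABRICKS_TOKEN` environment variables are all assumptions to adapt to your workspace.

```shell
# Sketch only: create a SQL endpoint (SQL warehouse) via the REST API.
# Assumes these are set for your workspace, e.g.
#   export DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"
#   export DATABRICKS_TOKEN="<personal-access-token>"

# Name, size, and auto-stop are placeholders; the Preview channel mirrors
# the UI setting described above.
PAYLOAD='{
  "name": "transfer-endpoint",
  "cluster_size": "Small",
  "auto_stop_mins": 30,
  "channel": {"name": "CHANNEL_NAME_PREVIEW"}
}'

# Only call the API when credentials are actually configured.
if [ -n "${DATABRICKS_TOKEN:-}" ]; then
  curl --request POST "$DATABRICKS_HOST/api/2.0/sql/warehouses" \
    --header "Authorization: Bearer $DATABRICKS_TOKEN" \
    --data "$PAYLOAD"
fi
```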

Step 2: Configure Access

Collect connection information and create an access token for the data transfer service.

  1. In the SQL Endpoints console, select the SQL endpoint you created in Step 1.
  2. Click the Connection Details tab and make a note of the Server hostname, Port, and HTTP path.
  3. Click the link to Create a personal access token.
  4. Click Generate New Token.
  5. Give the token a descriptive comment and assign its lifetime. A longer lifetime means you will not have to update the token as often. Click Generate.
  6. In the pop-up that follows, copy the token and save it securely.
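To sanity-check the hostname and token you just collected, one option is to run a trivial query through the Databricks SQL Statement Execution API. This is a hedged sketch, not an official setup step; the warehouse ID placeholder and the environment variable names are assumptions.

```shell
# Sketch only: run SELECT 1 against the endpoint to confirm the token works.
# Assumes DATABRICKS_HOST, DATABRICKS_TOKEN, and DATABRICKS_WAREHOUSE_ID
# are exported; otherwise the placeholder below is left as-is.
WAREHOUSE_ID="${DATABRICKS_WAREHOUSE_ID:-<warehouse-id>}"
STATEMENT='{"warehouse_id": "'"$WAREHOUSE_ID"'", "statement": "SELECT 1", "wait_timeout": "30s"}'

# Only call the API when credentials are actually configured.
if [ -n "${DATABRICKS_TOKEN:-}" ]; then
  curl --request POST "$DATABRICKS_HOST/api/2.0/sql/statements/" \
    --header "Authorization: Bearer $DATABRICKS_TOKEN" \
    --header "Content-Type: application/json" \
    --data "$STATEMENT"
fi
```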

🚧

Using a Service Principal & Token instead of your Personal Access Token

You may prefer to create a Service Principal to use for authentication instead of using a Personal Access Token. To do so, use the following steps to create a Service Principal and generate an access token.

  1. In your Databricks workspace, click your username in the top right, click Admin Settings, and navigate to the Service Principals tab.
  2. Click the Add service principal button, click Add new in the modal, enter a display name and click Add.
  3. Click on the newly created Service Principal, and under Entitlements select Databricks SQL Access and Workspace Access. Click Update, and make a note of the Application ID of your newly created Service Principal.
  4. In the Workspace Settings tab, in the Access Control section, next to the Personal Access Tokens row, click Permission Settings. Search for and select the Service Principal you created, select the Can use permission, and click Add.
  5. Navigate back to the SQL Warehouses section of your Workspace, click the SQL Warehouses tab, and select the SQL Warehouse you created in Step 1. Click Permissions in the top right, search for and select the Service Principal you created, select the Can use permission, and click Add.
  6. Use your terminal to generate a Service Principal Access Token using your Personal Access Token generated above. Record the token value. This token can now be used as the access token for the connection.
```shell
curl --request POST "https://<databricks-instance>.cloud.databricks.com/api/2.0/token-management/on-behalf-of/tokens" \
  --header "Authorization: Bearer <personal-access-token>" \
  --data '{
    "application_id": "<application-id-of-service-principal>",
    "lifetime_seconds": <token-lifetime-in-seconds-eg-31536000>,
    "comment": "<some-description-of-this-token>"
  }'
```
  7. In the Databricks UI, select the Catalog tab, expand the target catalog, and select the target destination schema (e.g., `main.default`). On the Permissions tab, click Grant. In the modal that follows, select the principal for which you generated the access token, and grant either `ALL PRIVILEGES` or the following five privileges: `APPLY TAG`, `CREATE TABLE`, `MODIFY`, `SELECT`, `USE SCHEMA`. Click Grant.
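If you would rather apply these grants from a SQL editor or script than through the UI, the same privileges can be expressed as a single Unity Catalog GRANT statement. This is a sketch of the equivalent SQL, not an additional required step; the schema name and principal below are placeholders.

```shell
# Sketch only: the UI grants above expressed as a Unity Catalog GRANT
# statement. Replace main.default and the principal with your own schema
# and the identity that will hold the access token.
GRANT_SQL="GRANT APPLY TAG, CREATE TABLE, MODIFY, SELECT, USE SCHEMA
ON SCHEMA main.default
TO \`<application-id-of-service-principal>\`;"

# Print the statement so it can be pasted into a SQL editor.
echo "$GRANT_SQL"
```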

Step 3: Create a staging bucket

Create a staging bucket in one of the following cloud environments. Refer to our documentation on object storage for staging bucket configuration instructions.

  • AWS S3
  • Azure Blob Storage
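For the AWS S3 option, a staging bucket can be created with the AWS CLI, assuming it is installed and authenticated. The bucket name and region below are placeholders, and the object storage documentation referenced above still governs the required configuration.

```shell
# Sketch only: create an S3 staging bucket with the AWS CLI.
# Bucket name and region are placeholders to replace with your own.
STAGING_BUCKET="my-databricks-staging-bucket"
AWS_REGION="us-east-1"

# Only run when the AWS CLI is installed and a profile is configured.
if command -v aws >/dev/null 2>&1 && [ -n "${AWS_PROFILE:-}" ]; then
  aws s3 mb "s3://$STAGING_BUCKET" --region "$AWS_REGION"
fi
```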

Step 4: Add your destination

  1. Securely share your server hostname, HTTP path, catalog, your chosen schema name, access token, and staging bucket details with us to complete the connection.