How it works

Most destinations are configured by creating a special-purpose user to perform writes in the data warehouse, whitelisting a Prequel IP address, and adding the database details and credentials to Prequel.

List of Destinations

Vendor          | Type           | Status | Docs | Docs (.md file)
snowflake       | OLAP           | GA     | link | link
bigquery        | OLAP           | GA     | link | link
redshift        | OLAP           | GA     | link | link
databricks      | OLAP           | GA     | link | link
athena          | OLAP           | GA     | link | link
clickhouse      | OLAP           | GA     | link | link
postgres        | OLTP           | GA     | link | link
aurora_postgres | OLTP           | GA     | link | link
mysql           | OLTP           | GA     | link | link
aurora_mysql    | OLTP           | GA     | link | link
sql_server      | OLTP           | Beta   | link |
singlestore     | OLTP           | Beta   |      |
s3              | Object Storage | GA     | link | link
s3_compatible   | Object Storage | GA     | link | link
gcs             | Object Storage | GA     | link | link
abs             | Object Storage | GA     | link | link
google_sheets   | Spreadsheet    | GA     | link | link

Other available guides

Guide                               | Type             | Status | Docs | Docs (.md file)
S3 Staging Bucket                   | Staging Resource | GA     | link | link
Google Cloud Storage Staging Bucket | Staging Resource | GA     | link | link
Azure Blob Storage Staging Bucket   | Staging Resource | GA     | link | link

📘

You should know

You can use your discretion to decide which documentation to provide to users who wish to connect their destination. To avoid confusion, we recommend working with each user to determine which database or data warehouse destination type they wish to connect, and then sending them only the relevant subset of documentation.

Format of landed data

Data warehouses & databases (incl. Snowflake, BigQuery, Redshift, Databricks)

Data transferred to data warehouses and relational databases will be loaded as properly typed tables within a single schema.

For destinations other than BigQuery, a special _transfer_status table will be loaded in the created schema to record transfer metadata, namely, a transfer_last_updated_at timestamp for each table. In BigQuery, the last_updated timestamp for a table is already accessible in the __TABLES_SUMMARY__ metatable.
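As a rough illustration of how a consumer might check freshness from the _transfer_status table, the sketch below uses an in-memory SQLite database as a stand-in for a real warehouse. Only the table name _transfer_status and the transfer_last_updated_at column come from the description above; the table_name column, the timestamp format, and the sample rows are assumptions made for the example, so check the actual schema in your destination before relying on it.

```python
import sqlite3

# In-memory SQLite stands in for the destination warehouse (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical layout: `table_name` is an assumed column name; only
# `transfer_last_updated_at` is named in the documentation above.
conn.execute(
    "CREATE TABLE _transfer_status ("
    "table_name TEXT, transfer_last_updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO _transfer_status VALUES (?, ?)",
    [
        ("orders", "2024-03-15 09:30:00"),  # made-up sample rows
        ("users", "2024-03-14 22:10:00"),
    ],
)


def last_updated(connection, table_name):
    """Return the recorded transfer timestamp for a table, or None if absent."""
    row = connection.execute(
        "SELECT transfer_last_updated_at FROM _transfer_status "
        "WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    return row[0] if row else None


print(last_updated(conn, "orders"))
```

The same SELECT would typically be issued through the warehouse's own client library, with the schema name prefixed as needed.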

Object storage (incl. AWS S3, Google Cloud Storage, Azure Blob Storage)

Data transferred to object storage destinations will be loaded as Apache Parquet files in Apache Hive style partitions. This means data will appear in the following folder structure:

<bucket_name>/<folder_name>/<model_name>/dt=<transfer_date>/<file_part>_<transfer_timestamp>.parquet

Where:

  • <bucket_name> and <folder_name> are provided during destination configuration.
  • <model_name> is the name of the data model being transferred (this is equivalent to a table name in relational data destinations).
  • <transfer_date> and <transfer_timestamp> are generated at transfer time and based on the transfer's start time. <transfer_date> is of the form 2006-01-02 (YYYY-MM-DD), while <transfer_timestamp> is of the form 20060102150405 (YYYYMMDDhhmmss).
  • <file_part> is a monotonically increasing integer for a given timestamp, and does not carry any special meaning.
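To make the folder structure concrete, here is a minimal sketch that assembles the path (relative to the bucket) for one landed file from its components. The function name object_key, the sample folder and model names, the use of UTC, and the unpadded file part are all assumptions for illustration; this is not Prequel's implementation.

```python
from datetime import datetime, timezone


def object_key(folder_name, model_name, started_at, file_part):
    """Build the in-bucket path for one Parquet file of a transfer."""
    # Both values derive from the transfer's start time.
    transfer_date = started_at.strftime("%Y-%m-%d")
    transfer_timestamp = started_at.strftime("%Y%m%d%H%M%S")
    return (
        f"{folder_name}/{model_name}/dt={transfer_date}/"
        f"{file_part}_{transfer_timestamp}.parquet"
    )


# Hypothetical transfer that started on 2024-03-15 at 09:30:00 (UTC assumed).
started = datetime(2024, 3, 15, 9, 30, 0, tzinfo=timezone.utc)
key = object_key("exports", "orders", started, 0)
print(key)  # exports/orders/dt=2024-03-15/0_20240315093000.parquet
```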

📘

What are Apache Hive style partitions and Apache Parquet file format?

  • Apache Hive style partitions are compatible with most popular query engines, and should make data easily queryable and transportable.
  • Apache Parquet file format is an open source, column-oriented data file format that offers efficient data compression and data integrity.
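One practical consequence of Hive-style partitioning is that a reader can decide which files to open from the path alone, without reading any Parquet data. The sketch below, using only the Python standard library, parses the dt= partition out of object keys and keeps those for one model within a date range. The regular expression is an assumption based on the folder structure described above, and the keys are made-up examples; query engines that understand Hive partitions perform this pruning for you.

```python
import re
from datetime import date

# Assumed key layout: <folder_name>/<model_name>/dt=<date>/<part>_<timestamp>.parquet
KEY_PATTERN = re.compile(
    r"^(?P<folder>.+)/(?P<model>[^/]+)/dt=(?P<dt>\d{4}-\d{2}-\d{2})/"
    r"(?P<part>\d+)_(?P<ts>\d{14})\.parquet$"
)


def keys_for_dates(keys, model, start, end):
    """Return keys of `model` whose dt= partition lies within [start, end]."""
    selected = []
    for key in keys:
        match = KEY_PATTERN.match(key)
        if match is None or match["model"] != model:
            continue  # not a landed file for this model
        if start <= date.fromisoformat(match["dt"]) <= end:
            selected.append(key)
    return selected


# Made-up keys for illustration.
keys = [
    "exports/orders/dt=2024-03-14/0_20240314093000.parquet",
    "exports/orders/dt=2024-03-15/0_20240315093000.parquet",
    "exports/orders/dt=2024-03-15/1_20240315093000.parquet",
    "exports/users/dt=2024-03-15/0_20240315093000.parquet",
]
recent = keys_for_dates(keys, "orders", date(2024, 3, 15), date(2024, 3, 15))
print(recent)
```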

Spreadsheets (incl. Google Sheets)

Data transferred to spreadsheet destinations will be loaded into a newly created tab for each data model. Where possible, the tabs will be created as protected ("read-only") tabs to prevent accidental modification.

Hosting documentation for your customers

You may decide to self-host the destination configuration instructions on your own documentation site. If you prefer to do that, we maintain a copy of the source markdown files at the following public locations:

Supplemental documentation

Some destination documentation guides reference additional or optional documentation. These should be provided as well, and are hosted here:

Guide                  | Documentation
Staging bucket: AWS S3 | Coming soon
Staging bucket: GCP    | Coming soon
Staging bucket: Azure  | Coming soon
SSH tunnel: AWS        | Coming soon
SSH tunnel: GCP        | Coming soon