Overview
How it works
Most destinations are configured by first creating a special-purpose user to perform the writing in the data warehouse, whitelisting a Prequel IP, and adding the database details and credentials to Prequel.
List of Destinations
Vendor | Type | Status | Docs | Docs (.md file) |
---|---|---|---|---|
snowflake | OLAP | GA | link | link |
bigquery | OLAP | GA | link | link |
redshift | OLAP | GA | link | link |
databricks | OLAP | GA | link | link |
athena | OLAP | GA | link | link |
clickhouse | OLAP | Beta | | |
postgres | OLTP | GA | link | link |
aurora_postgres | OLTP | GA | link | link |
mysql | OLTP | GA | link | link |
aurora_mysql | OLTP | GA | link | link |
sql_server | OLTP | Beta | link | |
singlestore | OLTP | Beta | | |
s3 | Object Storage | GA | link | link |
s3_compatible | Object Storage | GA | link | link |
gcs | Object Storage | GA | link | link |
abs | Object Storage | GA | link | link |
google_sheets | Spreadsheet | GA | link | link |
Other available guides
You should know
You can use your discretion to decide what documentation to provide to users who wish to connect their destination. To avoid confusion, we recommend working with your user to determine which database or data warehouse destination type they wish to connect, and then sending over only the relevant subset of documentation.
Format of landed data
Data warehouses & databases (incl. Snowflake, BigQuery, Redshift, Databricks)
Data transferred to data warehouses and relational databases will be loaded as properly typed tables within a single schema.
For destinations other than BigQuery, a special `_transfer_status` table will be loaded in the created schema to record transfer metadata, namely, a `transfer_last_updated_at` timestamp for each table. In BigQuery, the `last_updated` timestamp for a table is already accessible in the `__TABLES_SUMMARY__` metatable.
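As a rough illustration of how the transfer metadata could be used, the sketch below checks which tables have not been refreshed recently. It uses an in-memory SQLite database as a stand-in for a real destination. The `_transfer_status` table and `transfer_last_updated_at` column come from the description above; the `table_name` column, the ISO 8601 timestamp encoding, and the `stale_tables` helper are assumptions made for the example, so check the actual table layout in your destination before relying on them.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def stale_tables(conn, max_age, now):
    """Return the names of tables whose last transfer is older than max_age.

    Assumes a hypothetical `table_name` column alongside the documented
    `transfer_last_updated_at` column, with timestamps stored as ISO 8601 text.
    """
    rows = conn.execute(
        "SELECT table_name, transfer_last_updated_at FROM _transfer_status"
    ).fetchall()
    stale = []
    for name, raw_timestamp in rows:
        last_updated = datetime.fromisoformat(raw_timestamp)
        if now - last_updated > max_age:
            stale.append(name)
    return sorted(stale)


# Stand-in destination: an in-memory SQLite database with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE _transfer_status "
    "(table_name TEXT, transfer_last_updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO _transfer_status VALUES (?, ?)",
    [
        ("orders", "2024-01-02T00:00:00+00:00"),
        ("customers", "2024-01-01T00:00:00+00:00"),
    ],
)

now = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)
print(stale_tables(conn, timedelta(hours=12), now))
```

Against a real warehouse, the same query would run through that warehouse's own client library, and the column names would need to match what is actually loaded.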
Object storage (incl. AWS S3, Google Cloud Storage, Azure Blob Storage)
Data transferred to object storage destinations will be loaded as Apache Parquet files in Apache Hive style partitions. This means data will appear in the following folder structure:
<bucket_name>/<folder_name>/<model_name>/dt=<transfer_date>/<file_part>_<transfer_timestamp>.parquet
Where:
- `<bucket_name>` and `<folder_name>` are provided during destination configuration.
- `<model_name>` is the name of the data model being transferred (this is equivalent to a table name in relational data destinations).
- `<transfer_date>` and `<transfer_timestamp>` are generated at transfer time and based on the transfer's start time. `<transfer_date>` is of the form `2006-01-01`, while `<transfer_timestamp>` is of the form `20060102150405`.
- `<file_part>` is a monotonically increasing integer for a given timestamp, and does not carry any special meaning.
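To make the folder structure concrete, here is a minimal sketch that assembles a path from its components. The `object_path` function and the sample bucket, folder, and model names are hypothetical. The example also assumes the transfer start time is expressed in UTC and that the date uses a year-month-day layout; the text above gives both formats only by example.

```python
from datetime import datetime, timezone


def object_path(bucket_name, folder_name, model_name, started_at, file_part):
    """Build a path following the documented folder structure.

    Both date components are derived from the transfer's start time.
    The year-month-day ordering of the date is an assumption.
    """
    transfer_date = started_at.strftime("%Y-%m-%d")
    transfer_timestamp = started_at.strftime("%Y%m%d%H%M%S")
    return (
        f"{bucket_name}/{folder_name}/{model_name}/"
        f"dt={transfer_date}/{file_part}_{transfer_timestamp}.parquet"
    )


# Illustrative values only.
started_at = datetime(2006, 1, 2, 15, 4, 5, tzinfo=timezone.utc)
print(object_path("my-bucket", "exports", "orders", started_at, 1))
```

Because both the partition folder and the file name derive from the same start time, all files written by one transfer share a single `dt=` folder and timestamp suffix, and differ only in `<file_part>`.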
What are Apache Hive style partitions and Apache Parquet file format?
- Apache Hive style partitions are compatible with most popular query engines, and should make data easily queryable and transportable.
- Apache Parquet file format is an open source, column-oriented data file format that offers efficient data compression and data integrity.
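One practical consequence of Hive style partitioning is that the partition value can be read straight from the path, so a consumer can select files by date without opening them. The sketch below groups a list of paths by their `dt=` value for one model. The regular expression, the `group_by_partition` helper, and the sample paths are illustrative assumptions based on the folder structure described above; they are not part of any Prequel API.

```python
import re
from collections import defaultdict

# Matches <prefix>/<model_name>/dt=<transfer_date>/<file_part>_<timestamp>.parquet
PATH_RE = re.compile(
    r"^(?P<prefix>.+)/(?P<model>[^/]+)/dt=(?P<dt>[^/]+)/"
    r"(?P<part>\d+)_(?P<ts>\d{14})\.parquet$"
)


def group_by_partition(paths, model_name):
    """Group the paths of one model by the value of their dt= partition."""
    groups = defaultdict(list)
    for path in paths:
        match = PATH_RE.match(path)
        if match and match.group("model") == model_name:
            groups[match.group("dt")].append(path)
    return dict(groups)


# Illustrative listing, such as might come from a bucket listing call.
paths = [
    "my-bucket/exports/orders/dt=2024-01-01/1_20240101000000.parquet",
    "my-bucket/exports/orders/dt=2024-01-01/2_20240101000000.parquet",
    "my-bucket/exports/orders/dt=2024-01-02/1_20240102000000.parquet",
    "my-bucket/exports/customers/dt=2024-01-02/1_20240102000000.parquet",
]

partitions = group_by_partition(paths, "orders")
latest = max(partitions)  # ISO-style dates sort correctly as strings
print(latest, partitions[latest])
```

Query engines that understand Hive partitions perform the equivalent pruning automatically when a query filters on `dt`.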
Spreadsheets (incl. Google Sheets)
Data transferred to spreadsheet destinations will be loaded as a newly created tab per data model. Where possible, the tabs will be created as protected tabs (or "read-only") to prevent accidental modification.
Hosting documentation for your customers
You may decide to self-host the destination configuration instructions on your own documentation site. If you prefer to do that, we maintain a copy of the source markdown files at the following public locations:
Supplemental documentation
Some destination documentation guides reference additional or optional documentation. These should be provided as well, and are hosted here:
Guide | Documentation |
---|---|
Staging bucket: AWS S3 | Coming soon |
Staging bucket: GCP | Coming soon |
Staging bucket: Azure | Coming soon |
SSH tunnel: AWS | Coming soon |
SSH tunnel: GCP | Coming soon |