Overview
How it works
Most destinations are configured by first creating a special-purpose user to perform the write operations in the data warehouse, whitelisting a Prequel IP address, and adding the database details and credentials to Prequel.
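For example, for a Postgres-family destination the special-purpose user can be created with a couple of SQL statements. The sketch below is illustrative only: the role name, password, schema, and connection details are placeholders, and each destination guide linked below documents the exact statements and permissions it requires.

```python
# Illustrative sketch only: creating a special-purpose user for a Postgres destination.
# The role name, password, schema, and connection details are placeholders; follow the
# destination-specific guide for the exact statements and permissions required.
import psycopg2

conn = psycopg2.connect(
    host="db.example.com", dbname="analytics", user="admin", password="..."
)
conn.autocommit = True
with conn.cursor() as cur:
    # Dedicated role used only for Prequel's write operations.
    cur.execute("CREATE USER prequel_writer WITH PASSWORD 'a-strong-password'")
    # Schema the transferred tables will land in, owned by that role.
    cur.execute("CREATE SCHEMA IF NOT EXISTS shared_data AUTHORIZATION prequel_writer")
conn.close()
```

The resulting credentials, database host, and schema name are then entered in Prequel when adding the destination.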
Supported Destinations
You can find all currently supported destinations and documentation links below. You may also choose to self-host the destination configuration instructions on your own documentation site. If you prefer to do that, we maintain a copy of the source markdown files, accessible at the public locations below:
Vendor | Type | Status | Docs | Markdown (.md file) |
---|---|---|---|---|
snowflake | OLAP | GA | link | link |
bigquery | OLAP | GA | link | link |
redshift | OLAP | GA | link | link |
databricks | OLAP | GA | link | link |
athena | OLAP | GA | link | link |
clickhouse | OLAP | GA | link | link |
motherduck | OLAP | Beta | link | link |
postgres | OLTP | GA | link | link |
aurora_postgres | OLTP | GA | link | link |
mysql | OLTP | GA | link | link |
aurora_mysql | OLTP | GA | link | link |
sql_server | OLTP | Beta | link | |
singlestore | OLTP | Beta | | |
s3 | Object Storage | GA | link | link |
s3_compatible | Object Storage | GA | link | link |
gcs | Object Storage | GA | link | link |
abs | Object Storage | GA | link | link |
google_sheets | Spreadsheet | GA | link | link |
You should know
You can use your discretion to decide which documentation to provide to users who wish to connect their destination. To avoid confusion, we recommend working with each user to determine which database or data warehouse destination type they wish to connect, and then sending over the relevant subset of documentation for them to work with.
Format of landed data
Data warehouses & databases (incl. Snowflake, BigQuery, Redshift, Databricks)
Data transferred to data warehouses and relational databases will be loaded as properly typed tables within a single schema.
For destinations other than BigQuery, a special `_transfer_status` table will be loaded in the created schema to record transfer metadata, namely a `transfer_last_updated_at` timestamp for each table. In BigQuery, the `last_updated` timestamp for a table is already accessible in the `__TABLES_SUMMARY__` metatable.
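As a rough sketch of how this metadata can be inspected (assuming a Postgres-family destination), the table can be queried directly. Only the `_transfer_status` table and its `transfer_last_updated_at` column are described above; the schema name and connection details below are placeholders.

```python
# Illustrative sketch only: inspecting transfer metadata in a Postgres-family destination.
# The schema name and connection details are placeholders; _transfer_status and
# transfer_last_updated_at are the only names documented above.
import psycopg2

conn = psycopg2.connect(
    host="db.example.com", dbname="analytics", user="reader", password="..."
)
with conn.cursor() as cur:
    cur.execute("SELECT * FROM shared_data._transfer_status")
    for row in cur.fetchall():
        print(row)  # each row includes a transfer_last_updated_at timestamp for a table
conn.close()
```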
Object storage (incl. AWS S3, Google Cloud Storage, Azure Blob Storage)
Data transferred to object storage destinations will be loaded as Apache Parquet, CSV, or JSON files in Apache Hive style partitions. The default file format is Parquet, which we strongly recommend for performance reasons. This means data will appear in the following folder structure:
`<bucket_name>/<folder_name>/<model_name>/dt=<transfer_date>/<file_part>_<transfer_timestamp>.parquet`
Where:
- `<bucket_name>` and `<folder_name>` are provided during destination configuration.
- `<model_name>` is the name of the data model being transferred (equivalent to a table name in relational data destinations).
- `<transfer_date>` and `<transfer_timestamp>` are generated at transfer time and based on the transfer's start time. `<transfer_date>` is of the form `2006-01-01`, while `<transfer_timestamp>` is of the form `20060102150405`.
- `<file_part>` is a monotonically increasing integer for a given timestamp and does not carry any special meaning.
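Because the layout is Hive-partitioned Parquet, most engines can read it directly. A minimal sketch with pyarrow, using the placeholder path segments from the folder structure above:

```python
# Illustrative sketch only: reading the Hive-partitioned Parquet layout with pyarrow.
# The s3:// path reuses the placeholder segments from the folder structure above.
import pyarrow.dataset as ds

dataset = ds.dataset(
    "s3://<bucket_name>/<folder_name>/<model_name>/",
    format="parquet",
    partitioning="hive",  # exposes dt=<transfer_date> as a partition column
)
table = dataset.to_table()  # a filter on the "dt" column can limit this to one transfer date
print(table.schema)
```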
Prequel also provides a manifest file for each model within each transfer, mapping each transfer to its file contents. You can learn more about the structure of these manifest files here: Manifest Files for Object Storage.
What are Apache Hive style partitions and the Apache Parquet file format?
- Apache Hive style partitions are compatible with most popular query engines, and should make data easily queryable and transportable.
- Apache Parquet file format is an open source, column-oriented data file format that offers efficient data compression and data integrity.
Note that individual files for object storage destinations will not exceed ~4 GB in size. Rows transferred to object storage destinations are treated as append-only, which has the following effects for all object storage destinations:
- Prequel will always write data based on the current data model, meaning schema evolution and data model updates must be handled downstream of transfer.
- Deduplication (merging updated rows) will not take place during transfer; all updates will be written as unique rows. Deduplication must also be handled downstream of transfer (see the sketch below).
- Full refresh transfers will append the full dataset without deleting existing data.
However, object storage locations configured as a Delta Lake destination receive the same native schema evolution, deduplication, and ACID transaction behavior that data warehouse destinations receive.
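One way to handle deduplication downstream is to collapse the appended rows to the latest version per primary key after reading. The sketch below is an assumption-heavy illustration: the `id` and `updated_at` column names are hypothetical and should be replaced with the model's actual key and update-timestamp columns.

```python
# Illustrative sketch only: deduplicating append-only object storage output downstream.
# The "id" and "updated_at" column names are assumptions, not part of the documented layout.
import pandas as pd

df = pd.read_parquet("s3://<bucket_name>/<folder_name>/<model_name>/")

latest = (
    df.sort_values("updated_at")
      .drop_duplicates(subset=["id"], keep="last")  # keep the most recent row per key
)
```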
Spreadsheets (incl. Google Sheets)
Data transferred to spreadsheet destinations will be loaded into a newly created tab per data model. Where possible, these tabs will be created as protected ("read-only") tabs to prevent accidental modification.
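For programmatic access, the landed tabs can be read like any other worksheet. A minimal sketch with the gspread client library, assuming placeholder credential, spreadsheet, and tab names:

```python
# Illustrative sketch only: reading a landed tab with the gspread client library.
# The credentials file, spreadsheet title, and tab name are placeholders.
import gspread

gc = gspread.service_account(filename="service-account.json")
spreadsheet = gc.open("<spreadsheet_name>")
worksheet = spreadsheet.worksheet("<model_name>")  # one tab is created per data model
rows = worksheet.get_all_records()                 # list of dicts keyed by the header row
print(len(rows))
```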