Manifest files for object storage
Object storage destinations (s3, gcs, abs, and s3_compat) receive manifest files alongside the data files. These let any downstream pipeline that processes the data determine when a transfer has completed for a given model.
A manifest file is written for each model as part of every transfer. Manifest files live in their own _manifests directory within the object storage destination, and each manifest file is named manifest_{transfer_id}.json, as shown below.
some_bucket
|-- orders
|   |-- dt=2024-07-01
|       |-- file1.parquet
|       |-- file2.parquet
|
|-- transactions
|   |-- dt=2024-07-01
|       |-- file1.parquet
|
|-- _manifests
|   |-- orders
|   |   |-- dt=2024-07-01
|   |       |-- manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json
|   |-- transactions
|       |-- dt=2024-07-01
|           |-- manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json
These files are enabled by default on new object storage destinations.
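As a rough illustration of how a downstream pipeline might use this layout, the sketch below builds the expected manifest key and checks whether it appears in a listing of object keys. The function names are hypothetical, the listing is assumed to come from whatever storage client you already use, and the key is assumed to sit at the bucket root with no extra prefix, as in the tree above.

```python
from typing import Iterable

MANIFEST_DIR = "_manifests"  # directory name taken from the layout above


def manifest_key(model: str, partition: str, transfer_id: str) -> str:
    """Build the expected manifest key for one model partition and transfer.

    Assumes the _manifests/{model}/{partition}/ layout shown in the tree,
    with no additional bucket prefix (an assumption, not a documented rule).
    """
    return f"{MANIFEST_DIR}/{model}/{partition}/manifest_{transfer_id}.json"


def transfer_completed(
    keys: Iterable[str], model: str, partition: str, transfer_id: str
) -> bool:
    """Return True if the manifest for this transfer appears in the key listing."""
    return manifest_key(model, partition, transfer_id) in set(keys)


if __name__ == "__main__":
    # Hypothetical listing, mirroring part of the example tree.
    listing = [
        "orders/dt=2024-07-01/file1.parquet",
        "orders/dt=2024-07-01/file2.parquet",
        "_manifests/orders/dt=2024-07-01/"
        "manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json",
    ]
    done = transfer_completed(
        listing, "orders", "dt=2024-07-01", "aeb6efc1-73ec-405d-ae5e-d28b349b364c"
    )
    print("orders transfer completed:", done)
```

If your destination writes under a prefix, the same idea applies with the prefix prepended to the key.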
File format
Every manifest file follows the same structure. Here is a list of keys you can expect in every file, along with a sample complete file.
Object Key | Value |
---|---|
version | Manifest format version. |
transfer_id | Unique ID of the transfer job. |
start_time | Start time of the transfer job in UTC time. |
end_time | End time of the transfer job in UTC time. |
model_id | Unique ID (uuid) of the data model. |
model_name | Name of the data model. |
bucket_name | Name of the object storage bucket. |
bucket_prefix | Prefix of all objects created in this transfer. |
manifest_file_key | Key of the manifest file (path and file name). |
file_format | Format of the landed data (e.g. PARQUET). |
transfer_type | Type of data transfer. Either FULL_REFRESH or INCREMENTAL. |
signature | SHA-256 signature of the value stored in the files key. |
signature_public_key | Public key that can be used to verify the signature. |
files | Array of file objects. Each object contains a sha256_checksum, the etag of the file, and the data_file_key (the full path and name the file was written to). |
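One way to work with these keys in code is to model them as typed records. The sketch below is only an illustration: the class and function names are hypothetical, and it assumes the timestamps are ISO 8601 strings with a trailing Z, which the table does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass(frozen=True)
class ManifestFile:
    sha256_checksum: str
    etag: str
    data_file_key: str


@dataclass(frozen=True)
class Manifest:
    version: str
    transfer_id: str
    start_time: datetime
    end_time: datetime
    model_id: str
    model_name: str
    bucket_name: str
    bucket_prefix: str
    manifest_file_key: str
    file_format: str
    transfer_type: str
    signature: str
    signature_public_key: str
    files: List[ManifestFile]


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp ending in Z (the format is an assumption)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def manifest_from_dict(raw: Dict[str, Any]) -> Manifest:
    """Convert a decoded manifest dict into a Manifest.

    Raises TypeError if a key is missing or an unexpected key is present.
    """
    fields = dict(raw)
    fields["start_time"] = parse_utc(raw["start_time"])
    fields["end_time"] = parse_utc(raw["end_time"])
    fields["files"] = [ManifestFile(**f) for f in raw["files"]]
    return Manifest(**fields)
```

Being strict about unexpected keys is a design choice for this sketch; a consumer that needs to tolerate new manifest versions may prefer to ignore keys it does not recognize.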
{
  "version": "2024-06-01",
  "transfer_id": "aeb6efc1-73ec-405d-ae5e-d28b349b364c",
  "start_time": "2024-06-01T07:28:34.028Z",
  "end_time": "2024-06-01T07:33:43.897Z",
  "model_id": "ee9542e3-6469-4e09-bdcc-abb50ca5643a",
  "model_name": "transactions",
  "bucket_name": "vendor-data",
  "bucket_prefix": "user-supplied-schema/transactions/2024-06-01",
  "manifest_file_key": "user-supplied-schema/transactions/2024-06-01/manifest_aeb6efc1-73ec-405d-ae5e-d28b349b364c.json",
  "file_format": "PARQUET",
  "transfer_type": "FULL_REFRESH",
  "signature": "f4d2e40ab3c0f5b8e3b88e022db4f7c54fb9f82c77ffa2e444c479b57843f57bea32812a73b9c8a786d3b908c434f9374e0498b9a2f23c90e2f578b9444382b0f",
  "signature_public_key": "-----BEGIN RSA PUBLIC KEY-----\nMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDmf2CGFxZU/Dx911t4K8l/G5zM\njGUvhP01k2YTLtBRXEdXLGZnmzuJTOsqyPOvj3+HU/iNUQ/mXIJu7wKTrA/glZ1i\n0Zcc18Ek0jyne03ikBDIdyeYZTGi37/UnNVLwkr2FxhHUHBgiS5msFjxjquC941D\n5Xluak1U1p6/ZFV0AwIDAQAB\n-----END RSA PUBLIC KEY-----",
  "files": [
    {
      "sha256_checksum": "sQMSpEILNgoQmarvDFonGQ==",
      "etag": "af83d6f217c19b8b0fff8023d8ca4716-1",
      "data_file_key": "user-supplied-schema/transactions/2024-06-01/20240601150301.parquet"
    },
    {
      "sha256_checksum": "9c78d2e727b9f0b56a85b38dff88763c==",
      "etag": "9f84f7aacc09e05-1",
      "data_file_key": "user-supplied-schema/transactions/2024-06-01/20240601150405.parquet"
    },
    {
      "sha256_checksum": "1a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p==",
      "etag": "3d4fgsd7834f734b-1",
      "data_file_key": "user-supplied-schema/transactions/2024-06-01/20240601150645.parquet"
    }
  ]
}
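A consumer might use the files array to confirm that every data file named in a manifest is actually present before processing begins. The sketch below shows one possible check; the function name is hypothetical, and the key listing is assumed to come from your own storage client. It does not verify checksums or the signature, since the exact encodings and signing scheme are not described here.

```python
import json
from typing import Iterable, List


def missing_data_files(manifest_text: str, existing_keys: Iterable[str]) -> List[str]:
    """Return data_file_key values named in the manifest but absent from the listing.

    manifest_text is the JSON body of one manifest file; existing_keys is a
    listing of object keys in the bucket (how you obtain it is up to you).
    """
    manifest = json.loads(manifest_text)
    present = set(existing_keys)
    return [
        entry["data_file_key"]
        for entry in manifest["files"]
        if entry["data_file_key"] not in present
    ]


if __name__ == "__main__":
    # Hypothetical, trimmed-down manifest containing only the key used here.
    text = json.dumps(
        {
            "files": [
                {"data_file_key": "transactions/2024-06-01/a.parquet"},
                {"data_file_key": "transactions/2024-06-01/b.parquet"},
            ]
        }
    )
    listing = ["transactions/2024-06-01/a.parquet"]
    print("missing:", missing_data_files(text, listing))
```

An empty result means every file listed in the manifest was found in the listing.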