Azure Blob Storage
Configuring your Azure Blob Storage destination.
Step 1: Create Azure storage account
- In the Azure portal, navigate to the Storage accounts service and click + Create.
- In the "Basics" tab of the "Create a storage account" form, fill in the required details.
- In the "Advanced" settings, under "Security", make sure Enable storage account key access is turned on. You may turn off (deselect) "Allow enabling public access on containers". Under "Data Lake Storage Gen2", select Enable hierarchical namespace.
- In the "Networking" settings, you may set "Network access" to either Enable public access from all networks or Enable public access from selected virtual networks and IP addresses. If you choose the latter, add the service's static IP (listed below) to the firewall address range. Azure Storage IP rules do not accept /31 or /32 prefixes, so enter the single address without the /32 suffix. All other settings can use the default selections.
Static IP
Cloud Hosted (US): 35.192.85.117/32
Cloud Hosted (EU): 104.199.49.149/32
If private-cloud or self-hosted, contact support for the static egress IP.
- In the "Data protection" settings, you must turn off Enable soft delete for blobs, Enable soft delete for containers, and Enable soft delete for file shares.
- Once the remaining options have been configured to your preference, click Create.
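If you restrict network access, the static IPs above have to be entered as firewall rules. The Python sketch below is a hypothetical helper (not part of the service) that turns the published CIDR values into rule values. It assumes Azure's documented restriction that storage IP rules reject /31 and /32 prefixes and take individual addresses instead, and it handles IPv4 only.

```python
import ipaddress
from typing import List

# Static egress IPs published above (CIDR notation).
SERVICE_EGRESS_IPS = {
    "Cloud Hosted (US)": "35.192.85.117/32",
    "Cloud Hosted (EU)": "104.199.49.149/32",
}


def firewall_rules(cidr: str) -> List[str]:
    """Return the values to enter in the storage account firewall's address range.

    Hypothetical helper. Assumes Azure Storage IP rules do not accept /31 or /32
    prefixes, so those are expanded into individual addresses; wider ranges
    are kept in CIDR form. IPv4 only: IPv4Network raises for IPv6 input.
    """
    network = ipaddress.IPv4Network(cidr, strict=True)
    if network.prefixlen >= 31:
        # Iterating a network yields every address it contains (1 for /32, 2 for /31).
        return [str(address) for address in network]
    return [str(network)]


if __name__ == "__main__":
    for region, cidr in SERVICE_EGRESS_IPS.items():
        print(f"{region}: {', '.join(firewall_rules(cidr))}")
```

For either hosted region, the helper yields the bare address (for example 35.192.85.117), which is the form the portal's address range field expects.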
Step 2: Create container and access token
- In the Azure portal, navigate to the Storage accounts service and click on the account that was created in the previous step.
- In the navigation pane, under "Data storage", click Containers. Click + Container, choose a name for the container, and click Create.
Recommendation: dedicated container for data transfers
Use a unique container for these transfers. This:
- Prevents resource contention with other workloads
- Avoids accidental data loss from mixed lifecycle or cleanup rules
- Improves security by reducing surface area and enabling tighter, destination-scoped policies
- In the navigation pane, under "Security + networking", click Shared access signature.
- Set the allowed services, resource types, and permissions:
- Under "Allowed services": select Blob and File.
- Under "Allowed resource types": select Container and Object.
- Under "Allowed permissions": select Read, Write, Delete, List, Add, Create, and Permanently Delete.
- Select a "Start and expiry date/time" based on your security posture (e.g., set the expiration date 6 months into the future), and click Generate SAS and connection string.
- Make a note of the SAS token that is generated.
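Before sharing the token, you can check that it carries the settings above. A SAS token is a URL query string, so its fields can be read with the standard library. The sketch below assumes the account SAS field names and letter codes from Azure's SAS reference (ss for services, srt for resource types, sp for permissions with y meaning Permanently Delete, se for expiry); the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import List, Optional
from urllib.parse import parse_qs

# Account SAS letter codes for the settings chosen in Step 2 (assumed from Azure's SAS reference).
REQUIRED = {
    "ss": ({"b", "f"}, "allowed services (Blob, File)"),
    "srt": ({"c", "o"}, "allowed resource types (Container, Object)"),
    # r=Read, w=Write, d=Delete, l=List, a=Add, c=Create, y=Permanently Delete
    "sp": ({"r", "w", "d", "l", "a", "c", "y"}, "allowed permissions"),
}


def check_account_sas(token: str, now: Optional[datetime] = None) -> List[str]:
    """Return a list of problems found in an account SAS token; empty means it looks usable."""
    # The portal displays the token with a leading "?", which is not part of the query.
    params = parse_qs(token.lstrip("?"))
    problems = []
    for field, (required, label) in REQUIRED.items():
        granted = set(params.get(field, [""])[0])
        missing = required - granted
        if missing:
            problems.append(f"{label}: missing {''.join(sorted(missing))} in {field}")
    expiry = params.get("se", [""])[0]
    if not expiry:
        problems.append("no expiry time (se)")
    else:
        # se may be a date or a UTC timestamp ending in "Z"; treat it as UTC.
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at <= (now or datetime.now(timezone.utc)):
            problems.append(f"token expired at {expiry}")
    return problems
```

An empty list means the services, resource types, and permissions from Step 2 are present and the token has not yet expired; it does not prove the signature is valid.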
Step 3: Add your destination
Securely share your storage account name, container name, chosen folder name for the data, and storage account SAS token with us to complete the connection.
Permissions checklist
- SAS token includes Read, Write, Delete, List, Add, Create, and Permanently Delete permissions, with the Container and Object resource types selected
- Container exists in the intended account/region
- If using network restrictions, the service's static egress IP is allowed in the storage account firewall
FAQ
Q: How is data organized in the container?
A: Data lands in Hive-style partitions per model: <folder>/<model_name>/dt=<transfer_date>/<file_part>_<transfer_timestamp>.<ext>. To write to the container root, enter a single period (.) as the folder name.
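Downstream jobs often need the model, partition date, and transfer timestamp of each file. The Python sketch below parses a blob name against the layout above. It is a hypothetical helper, and it assumes that <transfer_timestamp> contains no underscore or period and that a missing folder segment means the file sits at the container root (reported as "."); adjust the pattern if your file names differ.

```python
import re
from typing import NamedTuple, Optional


class DataFile(NamedTuple):
    folder: str
    model_name: str
    transfer_date: str
    file_part: str
    transfer_timestamp: str
    ext: str


# <folder>/<model_name>/dt=<transfer_date>/<file_part>_<transfer_timestamp>.<ext>
# The folder group is optional so files written to the container root also match.
_DATA_PATH = re.compile(
    r"^(?:(?P<folder>.+)/)?(?P<model_name>[^/]+)/dt=(?P<transfer_date>[^/]+)/"
    r"(?P<file_part>[^/]+)_(?P<transfer_timestamp>[^/_.]+)\.(?P<ext>[^/]+)$"
)


def parse_data_path(blob_name: str) -> Optional[DataFile]:
    """Split a data file's blob name into its layout fields, or return None if it doesn't match."""
    match = _DATA_PATH.match(blob_name)
    if match is None:
        return None
    parts = match.groupdict()
    # A root-level file has no folder segment; report it as "." like the folder setting.
    parts["folder"] = parts["folder"] or "."
    return DataFile(**parts)
```

With illustrative values, parse_data_path("exports/orders/dt=2024-05-01/part-00000_20240501T120000Z.parquet") returns a DataFile with folder "exports", model_name "orders", and transfer_date "2024-05-01". Because file_part is matched greedily, the split happens at the last underscore, so parts such as part_1 survive intact.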
Q: What file formats are supported?
A: Parquet (default/recommended), CSV, and JSON/JSONL.
Q: How are large datasets written?
A: Files are automatically split; multiple files may be written per model per transfer.
Q: How do I know when a transfer completed?
A: Each transfer writes one manifest file per model under _manifests, at _manifests/<model_name>/dt=<transfer_date>/manifest_<transfer_id>.json.
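Because a manifest is written per model per transfer, listing the _manifests prefix is one way to see which transfers have landed. The sketch below only inspects blob names (the manifest's JSON contents are not described here), treats a manifest's presence as the completion signal, and accepts _manifests either at the container root or under your folder, since the exact prefix is an assumption. The function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# _manifests/<model_name>/dt=<transfer_date>/manifest_<transfer_id>.json
# (optionally preceded by a folder prefix; that prefix is an assumption)
_MANIFEST = re.compile(
    r"(?:^|/)_manifests/(?P<model_name>[^/]+)/dt=(?P<transfer_date>[^/]+)/"
    r"manifest_(?P<transfer_id>[^/]+)\.json$"
)


def completed_transfers(blob_names: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Map (model_name, transfer_date) to the transfer IDs that have a manifest."""
    found: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name in blob_names:
        match = _MANIFEST.search(name)
        if match:
            # Data files never match, so only manifests contribute entries.
            found[(match["model_name"], match["transfer_date"])].append(match["transfer_id"])
    return dict(found)
```

The blob names could come from any listing, for example ContainerClient.list_blobs in the azure-storage-blob package.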
Q: Why do I sometimes see duplicates?
A: Data is written append-only: each transfer adds new files rather than updating existing ones. The change detection process uses a lookback window to ensure no data is missed, so records that fall inside the overlap are written again in a later transfer. Downstream pipelines should deduplicate on primary keys, keeping the record from the most recent transfer; manifest files can help bound the set of files to read.
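A minimal sketch of that deduplication in Python, assuming each record has already been tagged with the <transfer_timestamp> from its file name and that timestamps in that format sort correctly as strings; the field names and the function are hypothetical. For each primary key, the record from the most recent transfer wins.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, Any]


def deduplicate(records: Iterable[Record], key_fields: Sequence[str], order_field: str) -> List[Record]:
    """Keep one record per primary key: the one with the greatest order_field value.

    On ties (same transfer), the record seen last wins.
    """
    latest: Dict[Tuple[Any, ...], Record] = {}
    for record in records:
        key = tuple(record[field] for field in key_fields)
        current = latest.get(key)
        if current is None or record[order_field] >= current[order_field]:
            latest[key] = record
    return list(latest.values())
```

Because the comparison uses the transfer timestamp rather than read order, the result does not depend on the order in which files are read. At larger volumes the same rule can be expressed in a query engine as a window function that ranks rows per key by transfer timestamp.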
Q: Are there file size limits?
A: No explicit size/row limits for Blob Storage; files are split automatically based on volume and performance heuristics.