SFTP
Configuring your SFTP server.
Prerequisites
- By default, SFTP uses keypair authentication for access. You will need the public key provided to you to configure your destination. It will look roughly like this:
  ssh-key <ssh_public_key_beginning_with_AAAA> some-comment
Step 1: Create a user on the SFTP server
Log in to the SFTP server and complete the steps below.
- Create group sftpwriter:
  sudo groupadd sftpwriter
- Create user sftpwriter:
  sudo useradd -m -g sftpwriter sftpwriter
- Switch to the sftpwriter user:
  sudo su - sftpwriter
- Create the .ssh directory:
  mkdir ~/.ssh
- Set permissions:
  chmod 700 ~/.ssh
- Navigate to the .ssh directory:
  cd ~/.ssh
- Create the authorized_keys file:
  touch authorized_keys
- Set permissions:
  chmod 600 authorized_keys
- Add the public key to the authorized_keys file. The key, including the "ssh-key" and comment, should be all on one line in the file, without line breaks.
  echo "ssh-key <ssh_public_key_beginning_with_AAAA> sftpwriter-public-key" > authorized_keys
Step 2: Add your destination
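Before entering the connection details, it may help to confirm that the account from Step 1 looks the way the steps above leave it. The sketch below is one possible check, not an official tool: `check_ssh_setup` is a hypothetical helper name, it assumes GNU `stat` (as found on most Linux servers), and it assumes the key line carries no options prefix, so the second field is the key body beginning with AAAA.

```shell
# Hypothetical helper: verify the .ssh directory created in Step 1.
# Assumes GNU coreutils stat (Linux) and key lines without an options prefix.
check_ssh_setup() {
    ssh_dir="${1:-$HOME/.ssh}"
    keys_file="$ssh_dir/authorized_keys"
    status=0

    if [ ! -f "$keys_file" ]; then
        echo "FAIL: $keys_file does not exist"
        return 1
    fi

    # Step 1 sets these modes explicitly; report anything that differs.
    if [ "$(stat -c '%a' "$ssh_dir")" != "700" ]; then
        echo "FAIL: $ssh_dir is not mode 700"
        status=1
    fi
    if [ "$(stat -c '%a' "$keys_file")" != "600" ]; then
        echo "FAIL: $keys_file is not mode 600"
        status=1
    fi

    # A key wrapped across lines leaves a line whose second field
    # does not start with AAAA (or that has fewer than two fields).
    bad_lines=$(awk 'NF > 0 && (NF < 2 || $2 !~ /^AAAA/) { n++ } END { print n + 0 }' "$keys_file")
    if [ "$bad_lines" -ne 0 ]; then
        echo "FAIL: $bad_lines line(s) in $keys_file do not look like a one-line key"
        status=1
    fi

    if [ "$status" -eq 0 ]; then
        echo "OK: $ssh_dir looks as expected"
    fi
    return "$status"
}
```

Run as the sftpwriter user, `check_ssh_setup` with no argument inspects `~/.ssh`; a different directory can be passed as the first argument.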
Use the following details to complete the connection setup: host name, folder name, username, port and preferred delimiter character.
Write permissions at the SFTP root are required
In addition to write access within your configured <folder>, this destination writes per-transfer manifest files under a _manifests/ directory created at the root of the SFTP home/path. Ensure the SFTP user can create and write to _manifests at that root (even if your data lands under a subfolder). Manifests allow downstream systems to detect when a transfer is complete. See the FAQ below for how these files are organized.
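One way to test the write requirement ahead of the first transfer is to try it as the SFTP user. The sketch below is illustrative only: `check_manifests_writable` is a hypothetical helper, and the root path passed to it is whatever directory your server exposes as the SFTP home. Note that it creates `_manifests` if the directory does not exist yet, and removes only its own probe file.

```shell
# Hypothetical helper: confirm the current user can create and write
# to _manifests under the given SFTP root. Creates _manifests if absent.
check_manifests_writable() {
    sftp_root="$1"
    manifests_dir="$sftp_root/_manifests"
    probe="$manifests_dir/.write_probe_$$"

    if ! mkdir -p "$manifests_dir" 2>/dev/null; then
        echo "FAIL: cannot create $manifests_dir"
        return 1
    fi
    if ! touch "$probe" 2>/dev/null; then
        echo "FAIL: cannot write inside $manifests_dir"
        return 1
    fi
    rm -f "$probe"
    echo "OK: $manifests_dir is writable"
}
```

For example, after `sudo su - sftpwriter`, running `check_manifests_writable "$HOME"` exercises the same location the destination would write to, assuming the SFTP root is that user's home directory.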
Frequently Asked Questions
Q: How will the data appear in my SFTP server?
A: The data will be loaded with the configured file format (Parquet, CSV, or JSON/JSONL) in a predictable folder structure that can be easily parsed by downstream systems.
  sftpwriter_home_folder/
  ├─ some_provided_folder/
  │  ├─ some_table_a/
  │  │  ├─ dt=2024-01-01/
  │  │  │  ├─ 0_20240101181004.csv
  │  │  │  ├─ 1_20240101184002.csv
  │  │  ├─ dt=2024-01-02/
  │  │  │  ├─ 0_20240102180123.csv
  │  │  ├─ dt=2024-01-03/
  │  │  │  ├─ 0_20240103182145.csv
  │  ├─ some_table_b/
  │  │  ├─ dt=2024-01-01/
  │  │  │  ├─ 0_20240101186004.csv
  │  │  ├─ dt=2024-01-02/
  │  │  │  ├─ 0_20240102185123.csv
  │  │  ├─ dt=2024-01-03/
  │  │  │  ├─ 0_20240103187145.csv
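Because the date folders are named dt=YYYY-MM-DD, a plain lexical sort also orders them chronologically. As a rough illustration of how a downstream script might pick up the newest data for one table, the sketch below defines a hypothetical helper, `latest_partition`, that prints the most recent date folder under a table directory.

```shell
# Hypothetical helper: print the newest dt=YYYY-MM-DD folder under a table directory.
# Prints nothing when the table directory has no date folders.
latest_partition() {
    table_dir="$1"
    for d in "$table_dir"/dt=*/; do
        # Skip the unexpanded pattern when nothing matches.
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done | sort | tail -n 1
}
```

With the example layout above, `latest_partition sftpwriter_home_folder/some_provided_folder/some_table_a` would print `dt=2024-01-03`.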
Q: How is the SFTP connection secured?
A: Use SSH key-based authentication for a dedicated, least-privileged SFTP user. Restrict access to only the required directories (e.g., chroot), and allowlist the service's static egress IP at your network perimeter.
Q: What file formats are supported?
A: Parquet (default/recommended), CSV, and JSON/JSONL.
Q: How do I know when a transfer completed?
A: Each transfer writes a manifest JSON file per model under _manifests/ at the root of the SFTP home/path. Files follow the pattern: _manifests/<model_name>/dt=<transfer_date>/manifest_<transfer_id>.json. Use these manifests to trigger downstream processing.
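A downstream job can use the path pattern above as a simple completion check. The sketch below only tests whether at least one manifest file exists for a model and transfer date; it does not read the manifest contents, whose schema is not described here. `has_manifest` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if at least one manifest exists for the
# given model and transfer date under <root>/_manifests/.
has_manifest() {
    root="$1"
    model="$2"
    transfer_date="$3"
    for f in "$root/_manifests/$model/dt=$transfer_date"/manifest_*.json; do
        if [ -f "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

A polling script might call `has_manifest /path/to/sftp/root some_table_a 2024-01-01` and start processing that table's dt=2024-01-01 folder only once the call succeeds.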
Q: Why do I sometimes see duplicates?
A: File-based destinations are append-oriented. The change-detection process uses a lookback window to prevent missed records, which can create duplicates across adjacent transfers. Downstream pipelines can deduplicate by primary key, prioritizing rows from the most recent transfer window.
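As a minimal sketch of that deduplication, the helper below keeps the last row seen for each primary key. It rests on several assumptions that may not match your data: rows arrive on standard input oldest transfer first, the delimiter is a comma, the primary key is the first column, there are no header rows, and no field contains a quoted delimiter or line break. `dedupe_latest` is a hypothetical name; for Parquet or complex CSV, a proper data tool is a better fit.

```shell
# Hypothetical helper: keep the most recent row per primary key.
# Assumes simple comma-separated rows on stdin, key in column 1,
# oldest rows first, no headers, no quoted fields.
dedupe_latest() {
    awk -F',' '
        !($1 in latest) { order[++n] = $1 }   # remember first-seen key order
        { latest[$1] = $0 }                   # later rows overwrite earlier ones
        END { for (i = 1; i <= n; i++) print latest[order[i]] }
    '
}
```

For instance, `cat dt=2024-01-01/*.csv dt=2024-01-02/*.csv | dedupe_latest` would let rows from the later date folder win, provided the files are listed in transfer order.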
Q: Can I provide my own public key? Where is the private key stored?
A: For security reasons, we do not support providing your own public key. The private key is securely generated and stored in our system and is never shared externally.