ClickHouse
Configuring your ClickHouse destination.
Prerequisites
- If your ClickHouse security posture requires IP whitelisting, have our data syncing service's static IP available during the following steps. It will be required in Step 1.
Step 1: Allow access
SSH Tunneling Not SupportedSSH Tunneling is currently unsupported for Clickhouse destinations. Please ensure your Clickhouse destination is accessible over the public internet.
Create a rule in a security group or firewall settings to whitelist:
- incoming connections to your host and port (usually
9440) from the static IP. - outgoing connections from ports
1024to65535to the static IP.
Network allowlistingCloud Hosted (US):
35.192.85.117/32Cloud Hosted (EU):
104.199.49.149/32If private-cloud or self-hosted, contact support for the static egress IP.
Step 2: Create writer user
Create a database user to perform the writing of the data.
- Open a connection to your ClickHouse database.
- Create a user for the data transfer by executing the following SQL command.
CREATE USER <username>@'%' IDENTIFIED BY '<some-password>';
Password RulesPasswords may only include alphanumeric characters (A-Z, a-z, 0-9), dashes (-), and underscores (_).
- Grant user required privileges on the database.
GRANT SELECT ON information_schema.columns TO <username>;
GRANT CREATE, INSERT, DROP, ALTER, OPTIMIZE, SHOW, TRUNCATE ON <database>.* TO <username>@'%';
grant CREATE TEMPORARY TABLE, S3 on *.* to <username>@'%';
Understanding theCREATE TEMPORARY TABLE, S3permissionsThe
CREATE TEMPORARY TABLEandS3permissions are required to efficiently transfer data to ClickHouse. Under the hood, these permissions are used to stage data in object storage as compressed files, COPY INTO temporary tables, and finally merge into the target tables. By definition, the temporary table will not exist outside of the session.
Step 3: Set up staging bucket
ClickHouse sources require a staging bucket to efficiently transfer data. Configure your staging bucket using one of the following types of ClickHouse supported object storage:
- S3
- GCS
- Implicit
Using theimplicitbucket optionClickHouse supports the ability to configure staging resources with environment credentials. If this setting is enabled on your ClickHouse cluster, you may choose to use the configured implicit staging resources using the
implicitoption for the staging bucket selection.
Optional: Add a short retention lifecycle policyIf using S3 or GCS for staging, you may configure a lifecycle rule on the bucket to automatically delete objects older than 2 days as the bucket is not used to persist data. Note that transfer logic automatically cleans up files after transfer completion, so this is an optional step.
Step 4: Add your destination
Connection ProtocolUse the ClickHouse TCP native protocol, not HTTPS. This is commonly exposed on port 9000.
Use the following details to complete the connection setup: host name, port, cluster, database name, schema name, username, password, and staging bucket details.
Understanding thedatabasevs.schemafields (connection databasevs.write database)Depending on the version of your integration, you may be asked for both a
databaseandschema, or aconnection databaseandwrite database.
database(also referred to asconnection_database): is the database used to establish the connection with ClickHouse.schema(also referred to aswrite_database): is the database/schema within which data will be writtenThese can be (and often are) the same values, but do not need to be.
Using the ClickHouse data
Querying ClickHouse data without duplicatesThe resulting ClickHouse tables use the ReplacingMergeTree table engine in order to efficiently upsert changes. To properly query this data, the
FINALkeyword must be used when selecting from these tables guarantee duplicates are removed. For example:SELECT * FROM schema.table FINAL WHERE foo = bar ORDER BY foo LIMIT 10;
Permissions checklist
- User has
SELECToninformation_schema.columns. - User has
CREATE, INSERT, DROP, ALTER, OPTIMIZE, SHOW, TRUNCATEon<database>.*. - User has
CREATE TEMPORARY TABLE, S3on*.*. - Staging bucket configured (S3, GCS, or implicit).
- Firewall or security group allows the service's egress IP on the ClickHouse native protocol port (default:
9440for TLS,9000for TCP). - Password uses only alphanumeric characters, dashes (
-), or underscores (_).
FAQ
Q: How is the ClickHouse connection secured?
A: We connect using the username and password you configure over the ClickHouse native TCP protocol. Network access can be restricted by allowlisting the service's static egress IP in your firewall or security group. Note: SSH tunneling is not supported for ClickHouse destinations.
Updated 8 days ago