# Local Filesystem Sink

Configuration reference for Nanosync's local filesystem sink.

Nanosync writes CDC events to the local filesystem in your chosen file format. The local sink is available today. Amazon S3 and Google Cloud Storage file sinks are implemented but not yet enabled; they will ship in an upcoming release. Their documentation is included below for reference only: attempting to create a pipeline with `type: s3` or `type: gcs` returns an error.
## File formats

| Format | Extension | Notes |
|---|---|---|
| parquet | .parquet | Default. Columnar, best compression, strongly typed. Ideal for analytics and data lakes. |
| csv | .csv | Plain text. Column headers on the first row. No schema embedded. |
| jsonl | .jsonl | One JSON object per line (newline-delimited JSON). |
| avro | .avro | Binary encoding with the schema embedded in each file. |
## Compression codecs

| Codec | Applies to | Notes |
|---|---|---|
| snappy | parquet, avro | Fast compression/decompression; moderate ratio. Default. |
| gzip | parquet, csv, jsonl | Higher ratio; more CPU. |
| zstd | parquet, avro | Best ratio with reasonable speed. |
| none | all | No compression. |
## File naming

```
{prefix}{schema}_{table}/{year}/{month}/{day}/{timestamp}-{uuid}.{ext}
```

Example with prefix `replication/postgres/`:

```
replication/postgres/public_orders/2026/03/14/20260314T102201-abc12345.parquet
```
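As a sketch, the template expands like this (`output_path` is a hypothetical helper, not part of Nanosync; the field values mirror the example above):

```python
from datetime import datetime, timezone

def output_path(prefix: str, schema: str, table: str,
                ts: datetime, uid: str, ext: str) -> str:
    """Expand the file-naming template for one output file."""
    return (
        f"{prefix}{schema}_{table}/"
        f"{ts:%Y}/{ts:%m}/{ts:%d}/"
        f"{ts:%Y%m%dT%H%M%S}-{uid}.{ext}"
    )

ts = datetime(2026, 3, 14, 10, 22, 1, tzinfo=timezone.utc)
print(output_path("replication/postgres/", "public", "orders", ts, "abc12345", "parquet"))
# replication/postgres/public_orders/2026/03/14/20260314T102201-abc12345.parquet
```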
Each file includes these metadata columns:

| Column | Type | Description |
|---|---|---|
| _ns_op | string | Operation: INSERT, UPDATE, or DELETE |
| _ns_table | string | Fully-qualified source table name |
| _ns_committed_at | timestamp | Commit timestamp from the source |
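For example, the metadata columns make it easy to drop deletes when building a current-state view from `jsonl` output. A minimal sketch with inline sample events (the row data is invented for illustration):

```python
import json

# Two hypothetical CDC events as they would appear in jsonl output:
# the _ns_* metadata columns sit alongside the row data.
lines = [
    '{"_ns_op": "INSERT", "_ns_table": "public.orders", "_ns_committed_at": "2026-03-14T10:22:01Z", "id": 1}',
    '{"_ns_op": "DELETE", "_ns_table": "public.orders", "_ns_committed_at": "2026-03-14T10:22:05Z", "id": 1}',
]

rows = [json.loads(line) for line in lines]
# Keep only inserts/updates when materializing current state.
live = [r for r in rows if r["_ns_op"] != "DELETE"]
print(len(live))  # 1
```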
## Limitations
- Files roll based on internal batch size and flush interval — there is no configurable max file size or time-based roll interval.
- No automatic file compaction — small files accumulate if flush intervals are short. Run compaction in your downstream system (Spark, DuckDB, or a scheduled lifecycle policy).
- S3 and GCS backends are coming soon and cannot be used in pipelines yet.
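Compaction can be as simple as concatenating small files of the same table into one larger file. A stdlib-only sketch for `jsonl` output (paths are illustrative; for parquet you would use a tool like Spark or DuckDB instead):

```python
import glob
import os
import tempfile

# Create a throwaway directory with three small jsonl part files,
# standing in for a sink table directory.
src = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(src, f"part-{i}.jsonl"), "w") as f:
        f.write(f'{{"id": {i}}}\n')

# Concatenate the parts into a single compacted file.
compacted = os.path.join(src, "compacted.jsonl")
with open(compacted, "w") as out:
    for path in sorted(glob.glob(os.path.join(src, "part-*.jsonl"))):
        with open(path) as part:
            out.write(part.read())

n_rows = sum(1 for _ in open(compacted))
print(n_rows)  # 3
```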
## Local filesystem

Writes files to a local directory. Useful for development, testing, or feeding a local processing pipeline.

### Connection configuration
```yaml
connections:
  - name: local-output
    type: local
    properties:
      base_path: /data/replication
      file_format: parquet
      compression_codec: snappy
```
### Pipeline configuration
```yaml
pipelines:
  - name: orders-to-disk
    source:
      connection: prod-postgres
      tables:
        - public.orders
    sink:
      connection: local-output
```
### Properties

| Property | Default | Description |
|---|---|---|
| base_path | — | Absolute path to the output directory. Required. Created if it does not exist. |
| file_format | parquet | Output file format: parquet, csv, jsonl, or avro. |
| compression_codec | snappy | Compression codec: snappy, gzip, zstd, or none. |
## Amazon S3

**Coming soon.** The S3 sink is not yet active; the documentation below is provided for preview only.
Writes files to an S3 bucket (or any S3-compatible store such as MinIO, Ceph, or Cloudflare R2).
### IAM permissions
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-data-lake",
        "arn:aws:s3:::my-data-lake/*"
      ]
    }
  ]
}
```
### Connection configuration
```yaml
connections:
  - name: s3-output
    type: s3
    properties:
      bucket: my-data-lake
      prefix: replication/postgres/
      region: us-east-1
      access_key_id: "${env:AWS_ACCESS_KEY_ID}"
      secret_access_key: "${env:AWS_SECRET_ACCESS_KEY}"
      file_format: parquet
      compression_codec: snappy
```
### Properties

| Property | Default | Description |
|---|---|---|
| bucket | — | S3 bucket name. Required. |
| prefix | — | Key prefix for all uploaded objects. |
| region | us-east-1 | AWS region. |
| access_key_id | — | AWS access key ID. If omitted, the SDK credential chain is used. |
| secret_access_key | — | AWS secret access key. |
| endpoint | — | Custom endpoint for S3-compatible stores (e.g. http://minio.local:9000). |
| force_path_style | false | Use path-style addressing. Required for MinIO and some S3-compatible stores. |
| file_format | parquet | Output format: parquet, csv, jsonl, or avro. |
| compression_codec | snappy | Compression: snappy, gzip, zstd, or none. |
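For S3-compatible stores such as MinIO, point `endpoint` at the store and enable path-style addressing. A sketch (the endpoint URL and environment variable names are illustrative):

```yaml
connections:
  - name: minio-output
    type: s3
    properties:
      bucket: my-data-lake
      endpoint: http://minio.local:9000
      force_path_style: true
      access_key_id: "${env:MINIO_ACCESS_KEY}"
      secret_access_key: "${env:MINIO_SECRET_KEY}"
      file_format: parquet
```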
## Google Cloud Storage

**Coming soon.** The GCS sink is not yet active; the documentation below is provided for preview only.

Writes files to a GCS bucket.

### IAM permissions
```shell
gcloud storage buckets add-iam-policy-binding gs://my-data-lake \
  --member="serviceAccount:nanosync@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"
```
### Connection configuration
```yaml
connections:
  - name: gcs-output
    type: gcs
    properties:
      bucket: my-data-lake
      prefix: replication/postgres/
      project_id: my-gcp-project
      file_format: parquet
      compression_codec: snappy
      # credentials_json: '{"type":"service_account",...}'  # optional; defaults to ADC
```
### Properties

| Property | Default | Description |
|---|---|---|
| bucket | — | GCS bucket name. Required. |
| prefix | — | Object key prefix. |
| project_id | — | GCP project ID. |
| credentials_json | — | Service account JSON key as an inline string. Defaults to Application Default Credentials (ADC). |
| file_format | parquet | Output format: parquet, csv, jsonl, or avro. |
| compression_codec | snappy | Compression: snappy, gzip, zstd, or none. |

On GKE, use Workload Identity; no `credentials_json` is needed.
## Monitoring

Inspect sink metrics for a pipeline with:

```shell
nanosync metrics pipeline orders-to-disk
```
| Metric | Description |
|---|---|
| ns_pipeline_replication_lag_seconds | End-to-end source-to-sink latency |
| ns_sink_rows_written_total | Rows written to files |
| ns_sink_write_errors_total | File or upload errors |
| ns_sink_files_written_total | Files created (rolled or flushed) |
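The metric names above follow Prometheus conventions. If you scrape them into Prometheus (the scrape setup itself is assumed here, not documented), an alerting rule on write errors might look like:

```yaml
# Hypothetical Prometheus alerting rule; assumes Nanosync metrics are scraped.
groups:
  - name: nanosync
    rules:
      - alert: NanosyncSinkWriteErrors
        expr: increase(ns_sink_write_errors_total[5m]) > 0
        for: 10m
        annotations:
          summary: "Nanosync sink is failing to write files"
```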
## Troubleshooting

### Local: permission denied on write

The Nanosync process user does not have write access to `base_path`. Adjust directory ownership or permissions.
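A quick way to reproduce this failure is to try creating a file under `base_path` as the same user the Nanosync process runs as. A hypothetical pre-flight helper:

```python
import os
import tempfile

def can_write(base_path: str) -> bool:
    """Return True if this process can create files under base_path."""
    try:
        fd, path = tempfile.mkstemp(dir=base_path)
        os.close(fd)
        os.remove(path)
        return True
    except OSError:
        return False

print(can_write(tempfile.gettempdir()))  # True on a writable temp dir
```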
### S3: NoSuchBucket error

The bucket does not exist. Create it before starting the pipeline.

### S3: AccessDenied on PutObject

The IAM credentials lack write access. Check the bucket policy and IAM role.

### GCS: 403 Forbidden

The service account lacks `roles/storage.objectCreator` on the bucket.
### Too many files accumulating

Files roll based on the internal flush interval and batch size, not a configurable size limit. Tune throughput settings or run compaction in your downstream system.