Local Filesystem Sink

Configuration reference for Nanosync's local filesystem sink.

Nanosync writes CDC events to the local filesystem in your chosen file format. The local sink is available today.

Amazon S3 and Google Cloud Storage file sinks are implemented but not yet enabled; they will ship in an upcoming release. Their documentation is included below for reference. Attempting to create a pipeline with type: s3 or type: gcs currently returns an error.

File formats

| Format | Extension | Notes |
|---|---|---|
| parquet | .parquet | Default. Columnar, best compression, strongly typed. Ideal for analytics and data lakes. |
| csv | .csv | Plain text. Column headers on the first row. No schema embedded. |
| jsonl | .jsonl | One JSON object per line (newline-delimited JSON). |
| avro | .avro | Binary encoding with schema embedded in each file. |

Compression codecs

| Codec | Applies to | Notes |
|---|---|---|
| snappy | parquet, avro | Fast compression/decompression; moderate ratio. Default. |
| gzip | parquet, csv, jsonl | Higher ratio; more CPU. |
| zstd | parquet, avro | Best ratio with reasonable speed. |
| none | all | No compression. |
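
The codec table maps each codec to the formats it applies to. As an illustrative sketch (not part of Nanosync itself), the pairings can be captured as a small lookup for pre-flight validation of a connection:

```python
# Valid codec -> file_format pairings, mirroring the codec table above.
CODEC_FORMATS = {
    "snappy": {"parquet", "avro"},
    "gzip": {"parquet", "csv", "jsonl"},
    "zstd": {"parquet", "avro"},
    "none": {"parquet", "csv", "jsonl", "avro"},
}

def codec_supported(codec: str, file_format: str) -> bool:
    """Return True if the codec applies to the given file format."""
    return file_format in CODEC_FORMATS.get(codec, set())
```

For example, codec_supported("zstd", "csv") is False, since zstd applies only to parquet and avro.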

File naming

{prefix}{schema}_{table}/{year}/{month}/{day}/{timestamp}-{uuid}.{ext}

Example with the prefix replication/postgres/:

replication/postgres/public_orders/2026/03/14/20260314T102201-abc12345.parquet
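
The pattern can be sketched in Python to show how each placeholder is filled. build_object_key and its parameters are hypothetical names for illustration, not a Nanosync API:

```python
from datetime import datetime, timezone
from uuid import uuid4

def build_object_key(prefix, schema, table, ext, ts=None, uid=None):
    """Fill the documented pattern:
    {prefix}{schema}_{table}/{year}/{month}/{day}/{timestamp}-{uuid}.{ext}
    Illustrative helper only, not part of Nanosync."""
    ts = ts or datetime.now(timezone.utc)
    uid = uid or uuid4().hex[:8]
    return (f"{prefix}{schema}_{table}/"
            f"{ts:%Y}/{ts:%m}/{ts:%d}/"
            f"{ts:%Y%m%dT%H%M%S}-{uid}.{ext}")

key = build_object_key("replication/postgres/", "public", "orders", "parquet",
                       ts=datetime(2026, 3, 14, 10, 22, 1, tzinfo=timezone.utc),
                       uid="abc12345")
# key == "replication/postgres/public_orders/2026/03/14/20260314T102201-abc12345.parquet"
```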

Each file includes these metadata columns:

| Column | Type | Description |
|---|---|---|
| _ns_op | string | Operation: INSERT, UPDATE, or DELETE |
| _ns_table | string | Fully-qualified source table name |
| _ns_committed_at | timestamp | Commit timestamp from the source |
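
Consumers typically replay these metadata columns to reconstruct current table state. A minimal sketch, assuming jsonl output parsed into dicts and a hypothetical primary-key column named id (adapt to your schema):

```python
def apply_cdc_events(events, key="id"):
    """Replay CDC rows into a current-state dict keyed by primary key.
    Uses the documented _ns_op and _ns_committed_at metadata columns;
    the `id` key column is a hypothetical example."""
    state = {}
    for row in sorted(events, key=lambda r: r["_ns_committed_at"]):
        if row["_ns_op"] == "DELETE":
            state.pop(row[key], None)
        else:  # INSERT and UPDATE both upsert
            state[row[key]] = {k: v for k, v in row.items()
                               if not k.startswith("_ns_")}
    return state

events = [
    {"id": 1, "total": 10, "_ns_op": "INSERT",
     "_ns_table": "public.orders", "_ns_committed_at": "2026-03-14T10:00:00Z"},
    {"id": 1, "total": 12, "_ns_op": "UPDATE",
     "_ns_table": "public.orders", "_ns_committed_at": "2026-03-14T10:05:00Z"},
    {"id": 2, "total": 7, "_ns_op": "INSERT",
     "_ns_table": "public.orders", "_ns_committed_at": "2026-03-14T10:06:00Z"},
    {"id": 2, "_ns_op": "DELETE",
     "_ns_table": "public.orders", "_ns_committed_at": "2026-03-14T10:07:00Z"},
]
# apply_cdc_events(events) keeps only order 1 at its latest value.
```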

Local filesystem

Writes files to a local directory. Useful for development, testing, or feeding a local processing pipeline.

Connection configuration

connections:
  - name: local-output
    type: local
    properties:
      base_path:         /data/replication
      file_format:       parquet
      compression_codec: snappy

Pipeline configuration

pipelines:
  - name: orders-to-disk
    source:
      connection: prod-postgres
      tables:
        - public.orders
    sink:
      connection: local-output

Properties

| Property | Default | Description |
|---|---|---|
| base_path | | Absolute path to the output directory. Required. Created if it does not exist. |
| file_format | parquet | Output file format: parquet, csv, jsonl, or avro. |
| compression_codec | snappy | Compression codec: snappy, gzip, zstd, or none. |

Amazon S3

Writes files to an S3 bucket (or any S3-compatible store such as MinIO, Ceph, or Cloudflare R2).

IAM permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-data-lake",
        "arn:aws:s3:::my-data-lake/*"
      ]
    }
  ]
}
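
Before deploying, it can help to sanity-check a policy document by confirming the required actions appear in an Allow statement. This is a simplified sketch that ignores Deny statements, conditions, and resource scoping, so treat it as a pre-flight check only:

```python
import json

REQUIRED_ACTIONS = {"s3:PutObject", "s3:ListBucket"}

def policy_grants(policy_json, required=frozenset(REQUIRED_ACTIONS)):
    """Check that every required action appears in an Allow statement.
    Simplified: ignores Deny, Condition, and Resource scoping."""
    allowed = set()
    for stmt in json.loads(policy_json).get("Statement", []):
        if stmt.get("Effect") == "Allow":
            actions = stmt.get("Action", [])
            allowed.update([actions] if isinstance(actions, str) else actions)
    return required <= allowed
```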

Connection configuration

connections:
  - name: s3-output
    type: s3
    properties:
      bucket:              my-data-lake
      prefix:              replication/postgres/
      region:              us-east-1
      access_key_id:       "${env:AWS_ACCESS_KEY_ID}"
      secret_access_key:   "${env:AWS_SECRET_ACCESS_KEY}"
      file_format:         parquet
      compression_codec:   snappy

Properties

| Property | Default | Description |
|---|---|---|
| bucket | | S3 bucket name. Required. |
| prefix | | Key prefix for all uploaded objects. |
| region | us-east-1 | AWS region. |
| access_key_id | | AWS access key ID. If unset, the SDK credential chain is used. |
| secret_access_key | | AWS secret access key. |
| endpoint | | Custom endpoint for S3-compatible stores (e.g. http://minio.local:9000). |
| force_path_style | false | Required for MinIO and some S3-compatible stores. |
| file_format | parquet | Output format: parquet, csv, jsonl, or avro. |
| compression_codec | snappy | Compression: snappy, gzip, zstd, or none. |

Google Cloud Storage

Writes files to a GCS bucket.

IAM permissions

gcloud storage buckets add-iam-policy-binding gs://my-data-lake \
  --member="serviceAccount:nanosync@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"

Connection configuration

connections:
  - name: gcs-output
    type: gcs
    properties:
      bucket:            my-data-lake
      prefix:            replication/postgres/
      project_id:        my-gcp-project
      file_format:       parquet
      compression_codec: snappy
      # credentials_json: '{"type":"service_account",...}'  # optional; defaults to ADC

Properties

| Property | Default | Description |
|---|---|---|
| bucket | | GCS bucket name. Required. |
| prefix | | Object key prefix. |
| project_id | | GCP project ID. |
| credentials_json | | Service account JSON key as an inline string. Defaults to ADC. |
| file_format | parquet | Output format: parquet, csv, jsonl, or avro. |
| compression_codec | snappy | Compression: snappy, gzip, zstd, or none. |

On GKE, use Workload Identity — no credentials_json needed.


Monitoring

nanosync metrics pipeline orders-to-disk

| Metric | Description |
|---|---|
| ns_pipeline_replication_lag_seconds | End-to-end source-to-sink latency |
| ns_sink_rows_written_total | Rows written to files |
| ns_sink_write_errors_total | File or upload errors |
| ns_sink_files_written_total | Files created (rolled or flushed) |
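
The ns_* metric names suggest a Prometheus-style exposition format; that format is an assumption here, not something this page documents. Under that assumption, a minimal parser for pulling the table's metrics out of scraped text:

```python
def parse_metrics(text, names):
    """Extract {metric_name: value} from Prometheus-style exposition text.
    Simplified: assumes no spaces inside label values and no timestamps."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        base = name.split("{", 1)[0]  # strip any {label="..."} suffix
        if base in names:
            values[base] = float(value)
    return values
```

For example, comparing ns_sink_write_errors_total across two scrapes is a simple way to alert on write failures.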

Troubleshooting

Local: permission denied on write
The Nanosync process user does not have write access to base_path. Adjust directory ownership or permissions.

S3: NoSuchBucket error
The bucket does not exist. Create it before starting the pipeline.

S3: AccessDenied on PutObject
The IAM credentials lack write access. Check the bucket policy and IAM role.

GCS: 403 Forbidden
The service account lacks roles/storage.objectCreator on the bucket.

Large files accumulating
Files roll based on the internal flush interval and batch size, not a configurable size limit. Reduce throughput settings or run compaction downstream.