Local Filesystem Sink

Configuration reference for Nanosync's local filesystem sink.

Nanosync writes CDC events to the local filesystem in your chosen file format. The local sink is available today.

Amazon S3 and Google Cloud Storage file sinks are implemented but not yet enabled; they will ship in an upcoming release. Their documentation is included below for reference. Attempting to create a pipeline with type: s3 or type: gcs currently returns an error.

File formats

| Format | Extension | Notes |
|---|---|---|
| parquet | .parquet | Default. Columnar, best compression, strongly typed. Ideal for analytics and data lakes. |
| csv | .csv | Plain text. Column headers on the first row. No schema embedded. |
| jsonl | .jsonl | One JSON object per line (newline-delimited JSON). |
| avro | .avro | Binary encoding with schema embedded in each file. |

Compression codecs

| Codec | Applies to | Notes |
|---|---|---|
| snappy | parquet, avro | Fast compression/decompression; moderate ratio. Default. |
| gzip | parquet, csv, jsonl | Higher ratio; more CPU. |
| zstd | parquet, avro | Best ratio with reasonable speed. |
| none | all | No compression. |
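
The codec table maps each codec to the formats it applies to. As an illustrative sketch (not part of Nanosync itself), the pairings can be captured as a small lookup for pre-flight validation of a connection:

```python
# Valid codec -> file_format pairings, mirroring the codec table above.
CODEC_FORMATS = {
    "snappy": {"parquet", "avro"},
    "gzip": {"parquet", "csv", "jsonl"},
    "zstd": {"parquet", "avro"},
    "none": {"parquet", "csv", "jsonl", "avro"},
}

def codec_supported(codec: str, file_format: str) -> bool:
    """Return True if the codec applies to the given file format."""
    return file_format in CODEC_FORMATS.get(codec, set())
```

For example, codec_supported("zstd", "csv") is False, since zstd applies only to parquet and avro.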

File naming

{prefix}{schema}_{table}/{year}/{month}/{day}/{timestamp}-{uuid}.{ext}

Example with the prefix replication/postgres/:

replication/postgres/public_orders/2026/03/14/20260314T102201-abc12345.parquet
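
The pattern can be sketched in Python to show how each placeholder is filled. build_object_key and its parameters are hypothetical names for illustration, not a Nanosync API:

```python
from datetime import datetime, timezone
from uuid import uuid4

def build_object_key(prefix, schema, table, ext, ts=None, uid=None):
    """Fill the documented pattern:
    {prefix}{schema}_{table}/{year}/{month}/{day}/{timestamp}-{uuid}.{ext}
    Illustrative helper only, not part of Nanosync."""
    ts = ts or datetime.now(timezone.utc)
    uid = uid or uuid4().hex[:8]
    return (f"{prefix}{schema}_{table}/"
            f"{ts:%Y}/{ts:%m}/{ts:%d}/"
            f"{ts:%Y%m%dT%H%M%S}-{uid}.{ext}")

key = build_object_key("replication/postgres/", "public", "orders", "parquet",
                       ts=datetime(2026, 3, 14, 10, 22, 1, tzinfo=timezone.utc),
                       uid="abc12345")
# key == "replication/postgres/public_orders/2026/03/14/20260314T102201-abc12345.parquet"
```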

Each file includes these metadata columns:

| Column | Type | Description |
|---|---|---|
| _ns_op | string | Operation: INSERT, UPDATE, or DELETE |
| _ns_table | string | Fully-qualified source table name |
| _ns_committed_at | timestamp | Commit timestamp from the source |
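
Consumers typically replay these metadata columns to reconstruct current table state. A minimal sketch, assuming jsonl output parsed into dicts and a hypothetical primary-key column named id (adapt to your schema):

```python
def apply_cdc_events(events, key="id"):
    """Replay CDC rows into a current-state dict keyed by primary key.
    Uses the documented _ns_op and _ns_committed_at metadata columns;
    the `id` key column is a hypothetical example."""
    state = {}
    for row in sorted(events, key=lambda r: r["_ns_committed_at"]):
        if row["_ns_op"] == "DELETE":
            state.pop(row[key], None)
        else:  # INSERT and UPDATE both upsert
            state[row[key]] = {k: v for k, v in row.items()
                               if not k.startswith("_ns_")}
    return state

events = [
    {"id": 1, "total": 10, "_ns_op": "INSERT",
     "_ns_table": "public.orders", "_ns_committed_at": "2026-03-14T10:00:00Z"},
    {"id": 1, "total": 12, "_ns_op": "UPDATE",
     "_ns_table": "public.orders", "_ns_committed_at": "2026-03-14T10:05:00Z"},
    {"id": 2, "total": 7, "_ns_op": "INSERT",
     "_ns_table": "public.orders", "_ns_committed_at": "2026-03-14T10:06:00Z"},
    {"id": 2, "_ns_op": "DELETE",
     "_ns_table": "public.orders", "_ns_committed_at": "2026-03-14T10:07:00Z"},
]
# apply_cdc_events(events) keeps only order 1 at its latest value.
```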

Local filesystem

Writes files to a local directory. Useful for development, testing, or feeding a local processing pipeline.

Connection configuration

connections:
  - name: local-output
    type: local
    properties:
      base_path:         /data/replication
      file_format:       parquet
      compression_codec: snappy

Pipeline configuration

pipelines:
  - name: orders-to-disk
    source:
      connection: prod-postgres
      tables:
        - public.orders
    sink:
      connection: local-output

Properties

| Property | Default | Description |
|---|---|---|
| base_path | | Absolute path to the output directory. Required. Created if it does not exist. |
| file_format | parquet | Output file format: parquet, csv, jsonl, or avro. |
| compression_codec | snappy | Compression codec: snappy, gzip, zstd, or none. |

Amazon S3

Writes files to an S3 bucket (or any S3-compatible store such as MinIO, Ceph, or Cloudflare R2).

IAM permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-data-lake",
        "arn:aws:s3:::my-data-lake/*"
      ]
    }
  ]
}
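
Before deploying, it can help to sanity-check a policy document by confirming the required actions appear in an Allow statement. This is a simplified sketch that ignores Deny statements, conditions, and resource scoping, so treat it as a pre-flight check only:

```python
import json

REQUIRED_ACTIONS = {"s3:PutObject", "s3:ListBucket"}

def policy_grants(policy_json, required=frozenset(REQUIRED_ACTIONS)):
    """Check that every required action appears in an Allow statement.
    Simplified: ignores Deny, Condition, and Resource scoping."""
    allowed = set()
    for stmt in json.loads(policy_json).get("Statement", []):
        if stmt.get("Effect") == "Allow":
            actions = stmt.get("Action", [])
            allowed.update([actions] if isinstance(actions, str) else actions)
    return required <= allowed
```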

Connection configuration

connections:
  - name: s3-output
    type: s3
    properties:
      bucket:              my-data-lake
      prefix:              replication/postgres/
      region:              us-east-1
      access_key_id:       "${env:AWS_ACCESS_KEY_ID}"
      secret_access_key:   "${env:AWS_SECRET_ACCESS_KEY}"
      file_format:         parquet
      compression_codec:   snappy

Properties

| Property | Default | Description |
|---|---|---|
| bucket | | S3 bucket name. Required. |
| prefix | | Key prefix for all uploaded objects. |
| region | us-east-1 | AWS region. |
| access_key_id | | AWS access key ID. If unset, the SDK credential chain is used. |
| secret_access_key | | AWS secret access key. |
| endpoint | | Custom endpoint for S3-compatible stores (e.g. http://minio.local:9000). |
| force_path_style | false | Required for MinIO and some S3-compatible stores. |
| file_format | parquet | Output format: parquet, csv, jsonl, or avro. |
| compression_codec | snappy | Compression: snappy, gzip, zstd, or none. |

Google Cloud Storage

Writes files to a GCS bucket.

IAM permissions

gcloud storage buckets add-iam-policy-binding gs://my-data-lake \
  --member="serviceAccount:nanosync@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"

Connection configuration

connections:
  - name: gcs-output
    type: gcs
    properties:
      bucket:            my-data-lake
      prefix:            replication/postgres/
      project_id:        my-gcp-project
      file_format:       parquet
      compression_codec: snappy
      # credentials_json: '{"type":"service_account",...}'  # optional; defaults to ADC

Properties

| Property | Default | Description |
|---|---|---|
| bucket | | GCS bucket name. Required. |
| prefix | | Object key prefix. |
| project_id | | GCP project ID. |
| credentials_json | | Service account JSON key as an inline string. Defaults to ADC. |
| file_format | parquet | Output format: parquet, csv, jsonl, or avro. |
| compression_codec | snappy | Compression: snappy, gzip, zstd, or none. |

On GKE, use Workload Identity — no credentials_json needed.


Monitoring

nanosync metrics pipeline orders-to-disk

| Metric | Description |
|---|---|
| ns_pipeline_replication_lag_seconds | End-to-end source-to-sink latency |
| ns_sink_rows_written_total | Rows written to files |
| ns_sink_write_errors_total | File or upload errors |
| ns_sink_files_written_total | Files created (rolled or flushed) |
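
The ns_* metric names suggest a Prometheus-style exposition format; that format is an assumption here, not something this page documents. Under that assumption, a minimal parser for pulling the table's metrics out of scraped text:

```python
def parse_metrics(text, names):
    """Extract {metric_name: value} from Prometheus-style exposition text.
    Simplified: assumes no spaces inside label values and no timestamps."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        base = name.split("{", 1)[0]  # strip any {label="..."} suffix
        if base in names:
            values[base] = float(value)
    return values
```

For example, comparing ns_sink_write_errors_total across two scrapes is a simple way to alert on write failures.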

Troubleshooting

Local: permission denied on write
The Nanosync process user does not have write access to base_path. Adjust directory ownership or permissions.

S3: NoSuchBucket error
The bucket does not exist. Create it before starting the pipeline.

S3: AccessDenied on PutObject
The IAM credentials lack write access. Check the bucket policy and IAM role.

GCS: 403 Forbidden
The service account lacks roles/storage.objectCreator on the bucket.

Large files accumulating
Files roll based on the internal flush interval and batch size, not a configurable size limit. Reduce throughput settings or run compaction downstream.