Configuration Reference

Full YAML schema for Nanosync connections, pipelines, rate limits, and schema mapping.

Nanosync is configured via a single YAML file (default: nanosync.yaml). The YAML is applied to the embedded store on startup. After that, pipelines and connections can be managed via the API, CLI, or UI — no file editing required.

Top-level structure

connections:
  - ...   # named connection definitions

pipelines:
  - ...   # pipeline definitions

Connections

Named connections allow you to reuse credentials across multiple pipelines.

connections:
  - name: prod-postgres          # unique name referenced by pipelines
    type: postgres
    dsn: "postgres://user:${env:PG_PASSWORD}@db.prod:5432/mydb?sslmode=require"

  - name: prod-bigquery
    type: bigquery
    properties:
      project_id: my-gcp-project
      dataset_id: replication
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | yes | Unique identifier referenced in pipeline connection: fields |
| type | string | yes | Connector type. Active sources: postgres, sqlserver, kafka, local, stdin. Active sinks: bigquery, alloydb, cloudsql, kafka, local, stdout. See Overview for coming-soon connectors. |
| dsn | string | no | Connection string (used by database connectors) |
| properties | map | no | Key-value connector properties (connector-specific) |

If a pipeline source or sink supplies dsn or properties inline alongside a connection reference, the inline values take precedence on conflict. The named connection definition itself is left unchanged.
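For example, a pipeline can reuse a named connection but point at a different database by overriding the dsn inline (the pipeline name and host below are illustrative):

```yaml
pipelines:
  - name: staging-orders               # illustrative
    source:
      connection: prod-postgres        # type and defaults come from the named connection
      dsn: "postgres://user:${env:PG_PASSWORD}@db.staging:5432/mydb"  # inline dsn wins on conflict
      tables:
        - public.orders
```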

Environment variable expansion

Any value in the YAML can reference an environment variable with ${env:VAR_NAME}. Expansion happens at startup before the config is applied.

dsn: "postgres://user:${env:PG_PASSWORD}@host:5432/db"
properties:
  api_key: "${env:BQ_API_KEY}"
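Conceptually, the expansion pass can be sketched as follows. This is an illustrative model, not Nanosync's actual implementation, and the fallback to an empty string for unset variables is an assumption:

```python
import os
import re

# Illustrative sketch of ${env:VAR_NAME} expansion; not Nanosync's actual code.
_PATTERN = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_env(value: str) -> str:
    """Replace every ${env:VAR_NAME} token with that environment variable's value.
    Assumption: unset variables expand to an empty string."""
    return _PATTERN.sub(lambda m: os.environ.get(m.group(1), ""), value)

os.environ["PG_PASSWORD"] = "s3cret"
print(expand_env("postgres://user:${env:PG_PASSWORD}@host:5432/db"))
# postgres://user:s3cret@host:5432/db
```

Because expansion happens once at startup, rotating a secret requires re-applying the config (or restarting the server).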

Pipelines

pipelines:
  - name: orders-to-bigquery      # unique pipeline name
    source:
      connection: prod-postgres   # reference a named connection ...
      # or inline:
      # type: postgres
      # dsn: "postgres://..."
      tables:
        - public.orders
        - public.order_items
      properties:
        replication_slot:   nanosync_slot
        chunk_size:         "10000"
        snapshot_workers:   "4"
    sink:
      connection: prod-bigquery   # reference a named connection ...
      # or inline:
      # type: bigquery
      properties:
        project_id: my-project
        dataset_id: replication
        table_id:   orders
    rate_limit:
      max_events_per_second: 10000   # 0 = unlimited
      max_bytes_per_second:  104857600
    schema_mapping:
      conflict: widen               # widen | fail | approve

Pipeline fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | yes | Unique pipeline identifier |
| source | object | yes | Source connector config |
| sink | object | yes | Sink connector config |
| rate_limit | object | no | Throughput limits |
| schema_mapping | object | no | Schema drift handling |
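Only name, source, and sink are required. A minimal, fully inline pipeline (no named connections) using the stdin source and stdout sink might look like this; the pipeline name is illustrative:

```yaml
pipelines:
  - name: smoke-test          # illustrative
    source:
      type: stdin             # inline type instead of a connection reference
    sink:
      type: stdout
```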

Source fields

| Field | Type | Description |
|-------|------|-------------|
| connection | string | Name of a named connection |
| type | string | Connector type (required if no connection) |
| dsn | string | Connection string (overrides named connection) |
| tables | []string | Tables to replicate, in schema.table format |
| properties | map | Connector-specific options (see connector docs) |

Sink fields

| Field | Type | Description |
|-------|------|-------------|
| connection | string | Name of a named connection |
| type | string | Connector type (required if no connection) |
| properties | map | Connector-specific options (see connector docs) |

Rate limiting

rate_limit:
  max_events_per_second: 10000    # integer, 0 = unlimited
  max_bytes_per_second: 104857600 # integer bytes, 0 = unlimited

Rate limits apply per pipeline. They are enforced on the source read side via backpressure.
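As a mental model, a per-pipeline limiter behaves like a token bucket that the source read loop must draw from before emitting a batch; when the bucket is empty, the reader backs off. This sketch is conceptual only, and Nanosync's actual backpressure mechanism may differ:

```python
import time

# Conceptual token-bucket sketch of per-pipeline rate limiting.
# Not Nanosync's implementation; parameter names are illustrative.
class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate = rate                # tokens refilled per second (events or bytes)
        self.capacity = burst           # maximum stored tokens
        self.tokens = burst             # bucket starts full
        self.last = time.monotonic()

    def try_acquire(self, n: float = 1.0) -> bool:
        """Take n tokens if available; otherwise signal the caller to back off."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False                    # caller should pause reading (backpressure)

bucket = TokenBucket(rate=10_000, burst=10_000)
print(bucket.try_acquire(5_000))        # True: the bucket starts full
```

Setting either limit to 0 disables that limit, matching the "0 = unlimited" convention above.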

Schema mapping

Controls what happens when a type-mapping conflict is detected between source and sink schemas.

schema_mapping:
  conflict: widen   # widen | fail | approve
| Mode | Behaviour |
|------|-----------|
| widen (default) | Auto-cast to the nearest compatible type and log a warning. Replication continues. |
| fail | Stop the pipeline immediately if any column has no direct type mapping. |
| approve | Pause the pipeline in the pending_schema_approval state and wait for human review. |

When using approve mode:

nanosync schema review  <pipeline>   # inspect the proposed type mapping
nanosync schema approve <pipeline>   # accept and resume

File format sinks

When using local, s3, gcs, or iceberg sink types, configure the output format:

sink:
  type: local
  properties:
    base_path:   /data/replication
    file_format: parquet          # parquet | csv | jsonl | avro
| Format | Extension | Notes |
|--------|-----------|-------|
| parquet | .parquet | Default. Columnar, best compression, schema-aware. |
| csv | .csv | Plain text, no schema embedded. |
| jsonl | .jsonl | One JSON object per line. |
| avro | .avro | Schema embedded in each file. |

SQL Server transaction log mode

Set cdc_mode: tlog to read directly from the SQL Server transaction log via sys.fn_dblog without requiring CDC setup on the source.

source:
  type: sqlserver
  dsn: "sqlserver://user:pass@host:1433?database=mydb"
  tables: [dbo.orders]
  properties:
    cdc_mode:        tlog           # "cdc" (default) | "tlog"
    log_batch_size:  "10000"
    poll_interval:   "200ms"
    max_xact_memory: "268435456"    # 256 MiB cap per transaction

Requires: the database must use the FULL or BULK_LOGGED recovery model, and the connecting user needs only the VIEW DATABASE STATE permission.

Config reload without restart

Send SIGHUP to the running server to reload and apply the config file:

kill -HUP $(pgrep nanosync)

Or use:

nanosync apply --file nanosync.yaml
nanosync apply --file nanosync.yaml --dry-run   # preview changes only

apply is idempotent — it upserts all connections and pipelines, leaving unchanged resources untouched.

Full annotated example

connections:
  - name: prod-postgres
    type: postgres
    dsn: "postgres://replicator:${env:PG_PASSWORD}@db.prod:5432/orders?sslmode=require"

  - name: warehouse
    type: bigquery
    properties:
      project_id: acme-data
      dataset_id: replication

pipelines:
  - name: orders-to-warehouse
    source:
      connection: prod-postgres
      tables:
        - public.orders
        - public.order_items
        - public.products
      properties:
        replication_slot:  nanosync_slot
        chunk_size:        "5000"
        snapshot_workers:  "8"
    sink:
      connection: warehouse
      properties:
        table_id: orders_cdc
    rate_limit:
      max_events_per_second: 50000
      max_bytes_per_second: 524288000   # 500 MiB/s
    schema_mapping:
      conflict: widen