Observability
Prometheus metrics, Grafana dashboards, and alerting rules for Nanosync.
Nanosync exports metrics via OpenTelemetry, bridged to a Prometheus scrape endpoint at /metrics. A Grafana dashboard and alerting rules are included in the repository.
Prometheus
GET http://localhost:7600/metrics
Add to prometheus.yml:
scrape_configs:
  - job_name: nanosync
    static_configs:
      - targets: ['localhost:7600']
    metrics_path: /metrics
    scrape_interval: 15s
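A scrape of /metrics returns standard Prometheus exposition-format text. As a quick sanity check, the payload can be parsed with a few lines of Python; the sample below is illustrative (the metric names come from the reference tables in this page, but the values are made up):

```python
# Minimal sketch: parse a Prometheus exposition-format payload, such as a
# scrape of /metrics would return. Sample values are illustrative only.
SAMPLE = """\
# HELP ns_pipeline_replication_lag_seconds Source-to-sink commit latency
# TYPE ns_pipeline_replication_lag_seconds gauge
ns_pipeline_replication_lag_seconds{pipeline="orders-to-bigquery"} 0.012
ns_pipeline_events_total{pipeline="orders-to-bigquery",table="orders",op="insert"} 4231
"""

def parse_metrics(text):
    """Return {series: value} for non-comment lines."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        series, value = line.rsplit(" ", 1)
        out[series] = float(value)
    return out

metrics = parse_metrics(SAMPLE)
print(metrics['ns_pipeline_replication_lag_seconds{pipeline="orders-to-bigquery"}'])
```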
Kubernetes ServiceMonitor
helm upgrade nanosync deploy/helm/nanosync/ --set serviceMonitor.enabled=true
Metrics reference
Pipeline metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| ns_pipeline_events_per_second | Gauge | pipeline | EWMA throughput, events/s |
| ns_pipeline_replication_lag_seconds | Gauge | pipeline | Source-to-sink commit latency — primary SLO metric |
| ns_pipeline_last_checkpoint_timestamp_seconds | Gauge | pipeline | Unix timestamp of last committed checkpoint |
| ns_pipeline_events_total | Counter | pipeline, table, op | Cumulative events processed |
| ns_pipeline_sink_errors_total | Counter | pipeline, error_type | Failed sink writes |
| ns_pipeline_state | Gauge | pipeline, state | 1 if the pipeline is in this state |
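A few example PromQL queries over these metrics, useful as dashboard panels or ad-hoc checks (the queries are suggestions, not part of the shipped dashboard):

```promql
# Per-pipeline event rate over the last 5 minutes
sum by (pipeline) (rate(ns_pipeline_events_total[5m]))

# Worst replication lag across all pipelines (primary SLO signal)
max(ns_pipeline_replication_lag_seconds)

# Seconds since the last committed checkpoint, per pipeline
time() - ns_pipeline_last_checkpoint_timestamp_seconds
```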
CDC metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| ns_cdc_table_events_total | Counter | pipeline, table, op | CDC events by operation (insert/update/delete) |
| ns_cdc_table_lag_seconds | Gauge | pipeline, table | Per-table replication lag |
Snapshot metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| ns_snapshot_rows_total | Counter | pipeline, table | Rows backfilled during initial snapshot |
| ns_snapshot_partitions_completed_total | Counter | pipeline, table | Completed snapshot partitions |
| ns_snapshot_phase | Gauge | pipeline | 1 during snapshot, 0 during CDC |
SQL Server transaction log (tlog) metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| ns_tlog_read_lag_seconds | Gauge | pipeline | Age of oldest unprocessed transaction log record |
| ns_tlog_gaps_total | Counter | pipeline | LSN gap events that triggered a snapshot fallback |
System metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| ns_buffer_flush_total | Counter | pipeline, reason | AdaptiveBuffer flush events by reason |
| ns_buffer_size_bytes | Gauge | pipeline | Current buffer size in bytes |
| ns_worker_count | Gauge | — | Number of active Nanosync worker instances |
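The ns_pipeline_events_per_second gauge is described above as an EWMA (exponentially weighted moving average). A small sketch of how such an estimate behaves; the smoothing factor and update cadence here are illustrative, not Nanosync's internal values:

```python
# Sketch of an EWMA throughput estimate, as the ns_pipeline_events_per_second
# gauge is described. alpha and the 1-second tick are assumptions for the
# example, not Nanosync internals.

def ewma_update(prev, sample, alpha=0.3):
    """Blend a new per-interval rate sample into the running average."""
    return alpha * sample + (1 - alpha) * prev

rate = 0.0
for events_in_interval in [100, 120, 80, 110]:  # events per 1-second tick
    rate = ewma_update(rate, events_in_interval)

print(round(rate, 1))  # smoothed events/s after four ticks
```

Because recent samples are weighted more heavily, the gauge reacts to load changes without jittering on every tick.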
Grafana dashboard
Import the pre-built dashboard at deploy/dashboards/nanosync-overview.json:
curl -X POST \
-H "Content-Type: application/json" \
-d @deploy/dashboards/nanosync-overview.json \
http://admin:password@localhost:3000/api/dashboards/import
Panels: replication lag (P50/P95/P99), throughput, snapshot progress, pipeline state, sink errors, worker fleet.
Alerting rules
deploy/alerts/nanosync.yaml:
groups:
  - name: nanosync
    rules:
      - alert: NanosyncReplicationLagHigh
        expr: ns_pipeline_replication_lag_seconds > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag > 30s on {{ $labels.pipeline }}"
      - alert: NanosyncReplicationLagCritical
        expr: ns_pipeline_replication_lag_seconds > 300
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Replication lag > 5m on {{ $labels.pipeline }}"
      - alert: NanosyncPipelineError
        expr: ns_pipeline_state{state="error"} == 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Pipeline {{ $labels.pipeline }} is in error state"
      - alert: NanosyncSinkErrorsHigh
        expr: rate(ns_pipeline_sink_errors_total[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Sink errors on {{ $labels.pipeline }}: {{ $value }}/s"
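The `for:` clause in these rules means an alert fires only after its expression has been continuously true for the stated duration; a brief dip below the threshold resets the timer. A sketch of that semantics for the lag rule (timestamps and lag samples are made up for the example):

```python
# Illustration of Prometheus `for:` semantics for NanosyncReplicationLagHigh:
# the alert fires only after lag exceeds the threshold continuously for the
# full duration. Sample data below is invented for the example.

THRESHOLD = 30.0   # seconds of lag (the rule's expr)
FOR_SECONDS = 300  # the rule's `for: 5m`

def alert_fires(samples, threshold=THRESHOLD, for_seconds=FOR_SECONDS):
    """samples: list of (unix_ts, lag_seconds) in time order."""
    pending_since = None
    for ts, lag in samples:
        if lag > threshold:
            if pending_since is None:
                pending_since = ts  # start of the continuous breach
            if ts - pending_since >= for_seconds:
                return True
        else:
            pending_since = None  # any dip below threshold resets the timer
    return False

spike = [(t, 45.0) for t in range(0, 241, 15)]      # 4 min above threshold
sustained = [(t, 45.0) for t in range(0, 301, 15)]  # 5 min above threshold
print(alert_fires(spike), alert_fires(sustained))
```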
Structured logging
Nanosync emits structured logs: JSON when output is piped (not a TTY), colourised text when attached to a TTY.
{"time":"2026-03-11T09:14:02Z","level":"INFO","msg":"checkpoint committed","pipeline":"orders-to-bigquery","lsn":"0/1A2B3C4","events":4231,"lag_ms":12}
nanosync start server --log-format json # force JSON
nanosync start server --log-format text # force text
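Because each log line is a single JSON object, records are straightforward to consume in scripts. A minimal sketch parsing the sample checkpoint record shown above:

```python
import json

# Parse a Nanosync JSON log record (the sample line from above).
line = ('{"time":"2026-03-11T09:14:02Z","level":"INFO",'
        '"msg":"checkpoint committed","pipeline":"orders-to-bigquery",'
        '"lsn":"0/1A2B3C4","events":4231,"lag_ms":12}')

record = json.loads(line)
print(record["pipeline"], record["events"], record["lag_ms"])
```

The same approach works for filtering a log stream, e.g. keeping only records where lag_ms exceeds a budget.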
The web UI at http://localhost:7600/app/ provides basic monitoring during development — no Grafana setup needed.