PostgreSQL Replication

ProxRad’s HA cluster uses PostgreSQL streaming replication to keep a hot standby in sync with the primary. This page covers the database-level mechanics: wal_level, replication slots, pg_basebackup, pg_hba.conf, and how to monitor lag. For the panel-level cluster UI and failover flow, see HA Cluster and Failover.

The replication setup is automated by the panel when you click Configure as Main Server and Join as Secondary. This page is for understanding what happened, debugging when it didn’t, and running it by hand if you need to.

What streaming replication is

The main server’s Postgres writes a Write-Ahead Log (WAL) record for every change. With streaming replication, the secondary’s Postgres opens a TCP connection to the main, requests every WAL record from LSN X onward, and applies the records locally as they arrive. The secondary’s data matches the main’s, lagging only by the time it takes the network to deliver the WAL and the replica to replay it.

   ┌─────────────────────┐                      ┌──────────────────────┐
   │ MAIN  proxpanel-db  │ ── port 5432 ──────▶ │ SECONDARY proxpanel-db│
   │                     │  (replicator user)   │                       │
   │ pg_wal/             │                      │ standby.signal        │
   │  └─ 000…001          │  WAL stream         │ pg_wal/               │
   │  └─ 000…002          │ ────────────────▶   │  └─ 000…001            │
   │  └─ 000…003 (live)   │  via replication    │  └─ 000…002            │
   │                     │  slot replica_2      │  └─ 000…003 (applying) │
   └─────────────────────┘                      └──────────────────────┘

The replication slot ensures the main doesn’t recycle WAL segments the secondary hasn’t consumed yet, so a brief network blip doesn’t force a full re-base.

Postgres settings on the main

services/postgres_replication.go SetupMainServer() runs:

ALTER SYSTEM SET wal_level = replica;
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET wal_keep_size = '1GB';
ALTER SYSTEM SET hot_standby = on;
ALTER SYSTEM SET listen_addresses = '*';
SELECT pg_reload_conf();

Setting	Why
`wal_level = replica`	Generate enough WAL detail for physical replication.
`max_wal_senders = 10`	Allow up to 10 simultaneous replicas + base-backups.
`max_replication_slots = 10`	One slot per replica.
`wal_keep_size = '1GB'`	Retain 1 GB of WAL on disk in case a slow replica falls behind.
`hot_standby = on`	Allow read queries on the replica while it’s streaming.
`listen_addresses = '*'`	Without this, Postgres binds only to localhost and the replica can’t connect.

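These are set with ALTER SYSTEM, which writes to postgresql.auto.conf and survives container restarts. The proxpanel-db Postgres data directory is a Docker volume, so this state is persistent. Note that pg_reload_conf() applies only wal_keep_size immediately; wal_level, max_wal_senders, max_replication_slots, hot_standby, and listen_addresses are read at server start, so a changed value for any of them takes effect only after Postgres restarts.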
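
ALTER SYSTEM only records the values. Whether the running server has picked them up can be checked in pg_settings, whose pending_restart column flags values that are still waiting for a restart. The following is a minimal sketch; the helper name replication_settings_sql is hypothetical, and the container name and credentials are the ones used elsewhere on this page.

```shell
#!/bin/sh
# Print the SQL that lists the replication-related settings, their current
# values, and whether a changed value is still waiting for a restart.
replication_settings_sql() {
  cat <<'SQL'
SELECT name, setting, pending_restart
  FROM pg_settings
 WHERE name IN ('wal_level', 'max_wal_senders', 'max_replication_slots',
                'wal_keep_size', 'hot_standby', 'listen_addresses')
 ORDER BY name;
SQL
}

# Example invocation on the main (requires Docker; not run here):
#   docker exec proxpanel-db psql -U proxpanel -d proxpanel \
#     -c "$(replication_settings_sql)"
```

Any row with pending_restart = t means the value in postgresql.auto.conf differs from what the server is currently running with.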

The replicator role

CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD '<random>';

The password (the <random> placeholder above) is whatever was set as DB_PASSWORD on the main. Same password, separate role, narrow privileges (REPLICATION only, no DB access).

pg_hba.conf

pg_hba.conf controls who can connect over the network. Out of the box, Postgres accepts no replication connections from outside the container. You must add:

host    replication     replicator      <secondary_ip>/32      md5

The panel logs this exact line on SetupMainServer() but does not modify pg_hba.conf automatically — that file lives inside the Postgres data volume and editing it via the container is the safest path.

Identify the secondary’s IP (the IP the secondary will connect from, not the main’s IP).

On the main, append the line:

docker exec proxpanel-db bash -c \
  "echo 'host replication replicator <SECONDARY_IP>/32 md5' >> /var/lib/postgresql/data/pg_hba.conf"

Reload Postgres (no restart needed):

docker exec proxpanel-db psql -U proxpanel -d proxpanel -c "SELECT pg_reload_conf();"

Verify:

docker exec proxpanel-db cat /var/lib/postgresql/data/pg_hba.conf | grep replication

If you skip this, the secondary’s pg_basebackup will fail with FATAL: no pg_hba.conf entry for replication connection.
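
The echo ... >> command above appends a new line every time it runs, so repeating the step leaves duplicate rules in the file. A minimal sketch of an idempotent variant follows; the function name add_replication_rule is hypothetical, and it assumes it is run somewhere the pg_hba.conf file is directly readable and writable (for example in a shell inside the container).

```shell
#!/bin/sh
# Append a replication rule for one secondary to a pg_hba.conf file,
# but only if that exact line is not already present.
# Usage: add_replication_rule <path to pg_hba.conf> <secondary IP>
add_replication_rule() {
  hba_file=$1
  secondary_ip=$2
  rule="host    replication     replicator      ${secondary_ip}/32      md5"
  # -x: match the whole line, -F: fixed string (no regex), -q: no output
  if grep -qxF "$rule" "$hba_file" 2>/dev/null; then
    echo "rule already present"
  else
    printf '%s\n' "$rule" >> "$hba_file"
    echo "rule added"
  fi
}
```

Postgres still needs the pg_reload_conf() call afterwards to pick up the new rule.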

Replication slots

SELECT pg_create_physical_replication_slot('replica_2');

Slots are created on the main, one per secondary. The slot name is replica_<node_id> where node_id is the auto-incrementing ID in cluster_nodes.

The slot guarantees WAL retention. If the secondary is offline for 4 hours, the main holds 4 hours of WAL on disk (subject to max_slot_wal_keep_size if set — by default unbounded). When the secondary reconnects, replication resumes from where it left off.

Inactive slots leak disk space. If a secondary is permanently dead and you don’t drop its slot, the main accumulates WAL forever and eventually fills the disk. After removing a node from the cluster (Cluster tab → ×), always confirm the slot is gone:

SELECT slot_name, active FROM pg_replication_slots;

Drop manually if needed: SELECT pg_drop_replication_slot('replica_2');
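
For a scripted check, the slot list can be filtered down to the slots no replica is connected to. This is a sketch only: the helper name inactive_slots is hypothetical, and it relies on psql's -At output (unaligned, tuples only), where columns are separated by | and booleans print as t or f.

```shell
#!/bin/sh
# Read "slot_name|active" rows (psql -At output) on stdin and print the names
# of slots whose active flag is "f", i.e. no replica is currently connected.
inactive_slots() {
  awk -F'|' '$2 == "f" { print $1 }'
}

# Example invocation on the main (requires Docker; not run here):
#   docker exec proxpanel-db psql -U proxpanel -d proxpanel -At \
#     -c "SELECT slot_name, active FROM pg_replication_slots;" | inactive_slots
```

An inactive slot is not necessarily a dead one: a secondary that is only temporarily offline also shows active = f, so confirm the node has really been removed before dropping its slot.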

Setting up the replica

SetupReplicaServer() generates a setup script at /tmp/setup_replica.sh rather than running it directly — stopping Postgres while the API container is still running would break the live DB connection. You execute the script manually.

The script does:

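# 1. Stop the local Postgres so its data directory can be replaced
docker stop proxpanel-db
# 2. Keep a tarball of the old data directory in /tmp
docker run --rm -v proxpanel_postgres_data:/data -v /tmp:/backup alpine \
    tar -czf /backup/postgres_backup_TIMESTAMP.tar.gz -C /data .
# 3. Empty the volume (pg_basebackup needs an empty target directory)
docker run --rm -v proxpanel_postgres_data:/data alpine \
    sh -c "rm -rf /data/*"
# 4. Copy the main's data and stream WAL through the replication slot
docker run --rm \
    -v proxpanel_postgres_data:/var/lib/postgresql/data \
    -e PGPASSWORD='<replicator_password>' postgres:16 \
    pg_basebackup -h MAIN_IP -p 5432 -U replicator \
      -D /var/lib/postgresql/data -Fp -Xs -P -R -S replica_2
# 5. Mark the directory as a standby (-R above already creates this file)
docker run --rm -v proxpanel_postgres_data:/data alpine touch /data/standby.signal
# 6. Hand the files to the postgres user (see Common pitfalls) before starting
docker run --rm -v proxpanel_postgres_data:/data alpine chown -R 999:999 /data
# 7. Start Postgres, which now comes up as a standby
docker start proxpanel-db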

Flag	Meaning
`-Fp`	Plain format (not tar) — output to a directory.
`-Xs`	Stream WAL in parallel during base backup.
`-P`	Show progress.
`-R`	Write `primary_conninfo` to `postgresql.auto.conf` and create `standby.signal`.
`-S replica_2`	Use replication slot named `replica_2`.

After this, Postgres starts in standby mode. The empty standby.signal file is the marker: if you delete it and restart, Postgres leaves recovery and becomes writable. pg_promote() reaches the same end state without a restart and is the normal way to do it.

Verifying the replica is streaming

docker exec proxpanel-db psql -U proxpanel -d proxpanel -c \
  "SELECT pg_is_in_recovery();"
# → t (true) — this is a replica

docker exec proxpanel-db psql -U proxpanel -d proxpanel -x -c \
  "SELECT * FROM pg_stat_wal_receiver;"
# → status: streaming
#   sender_host: <main_ip>
#   slot_name: replica_2
#   last_msg_receipt_time: 2026-05-12 14:05:30+00
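
The two checks above can be folded into one scripted health check. The sketch below is an illustration, not part of the panel: the function name replica_health is hypothetical, and it expects a single row in psql -At format containing the recovery flag and the WAL receiver status.

```shell
#!/bin/sh
# Interpret one "in_recovery|receiver_status" row read from stdin.
# Prints a short verdict and returns non-zero unless the server is a
# replica that is actively streaming.
replica_health() {
  # "|| true": read returns non-zero if the last line has no newline
  IFS='|' read -r in_recovery status || true
  if [ "$in_recovery" != "t" ]; then
    echo "not a replica"
    return 1
  elif [ "$status" != "streaming" ]; then
    echo "replica, but not streaming"
    return 1
  fi
  echo "replica streaming"
}

# Example invocation on the secondary (requires Docker; not run here):
#   docker exec proxpanel-db psql -U proxpanel -d proxpanel -At -c \
#     "SELECT pg_is_in_recovery(), (SELECT status FROM pg_stat_wal_receiver);" \
#     | replica_health
```

When no WAL receiver is running, the subquery returns NULL, which psql prints as an empty field, so the function reports the replica as not streaming.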

Monitoring lag

From the main

SELECT application_name, client_addr, state, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
  FROM pg_stat_replication;

lag_bytes is how many bytes of WAL the replica hasn’t replayed yet. Under ~1 MB is healthy. Sustained tens or hundreds of megabytes means the replica is overwhelmed or the network is choking.
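
To make the lag_bytes number less opaque: an LSN is printed as two hexadecimal numbers separated by a slash, the high and the low 32 bits of a byte position in the WAL, and pg_wal_lsn_diff subtracts two such positions. The sketch below reproduces that arithmetic in shell for illustration only; the function names lsn_to_bytes and lsn_diff are hypothetical.

```shell
#!/bin/sh
# Convert an LSN such as "16/B374D848" to an absolute byte position.
# The part before the slash is the high 32 bits, the part after it the
# low 32 bits, both in hexadecimal.
lsn_to_bytes() {
  hi=${1%/*}
  lo=${1#*/}
  echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

# Byte distance between two LSNs, mirroring pg_wal_lsn_diff(a, b) = a - b.
lsn_diff() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

For example, lsn_diff 0/3000060 0/3000000 prints 96: the second position is 96 bytes behind the first.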

From the replica

SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;

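This is the time delta: typically sub-second, and it may climb to seconds under load. It is measured from the last replayed transaction, so on a main with no write activity the value keeps growing even though the replica is fully caught up.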

The cluster service uses this exact query (GetReplicationLagSeconds()) and reports it in the heartbeat. The Cluster tab in the panel shows the result.
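
For alerting outside the panel, the same value can be compared against a threshold. This is a sketch under assumptions: the function name check_replay_lag and the threshold are hypothetical, and the lag is expected as a number of seconds, for example from SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()); run with psql -At.

```shell
#!/bin/sh
# Classify a replay lag value.
# Usage: check_replay_lag <lag_seconds> <threshold_seconds>
# Prints "ok", "lagging", or "unknown" (empty input, e.g. a NULL result
# because no transaction has been replayed yet).
check_replay_lag() {
  lag=$1
  threshold=$2
  if [ -z "$lag" ]; then
    echo "unknown"
    return 2
  fi
  # awk compares the fractional seconds that sh arithmetic cannot handle;
  # it exits 0 (success) when lag is greater than the threshold.
  if awk -v l="$lag" -v t="$threshold" 'BEGIN { exit !(l + 0 > t + 0) }'; then
    echo "lagging"
    return 1
  fi
  echo "ok"
}
```

A call such as check_replay_lag 0.42 5 prints ok, while check_replay_lag 12.7 5 prints lagging.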

Promoting the replica

When the secondary needs to become the new main (planned switchover or automatic failover), the panel calls:

SELECT pg_promote();

This:

Replays any pending WAL.
Removes standby.signal.
Exits recovery mode.
Begins accepting writes.

Takes 1–5 seconds typically. The connection from the application (proxpanel-api) usually doesn’t even drop; the next write succeeds.
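
A script that triggers or follows a promotion may want to confirm from the outside that the server has left recovery. The polling loop below is a minimal sketch; the function name wait_for_promotion is hypothetical, and the command passed to it is assumed to print t while the server is in recovery and f afterwards, as pg_is_in_recovery() does under psql -At.

```shell
#!/bin/sh
# Poll a command until it prints "f" (no longer in recovery), checking once
# per second and giving up after max_tries attempts.
# Usage: wait_for_promotion <max_tries> <command...>
wait_for_promotion() {
  max_tries=$1
  shift
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    if [ "$("$@")" = "f" ]; then
      echo "promoted"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "still in recovery"
  return 1
}

# Example invocation on the secondary (requires Docker; not run here):
#   wait_for_promotion 10 docker exec proxpanel-db psql -U proxpanel \
#     -d proxpanel -At -c "SELECT pg_is_in_recovery();"
```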

The old primary cannot simply be re-attached as a new secondary: its WAL diverged from the new primary’s at the moment of promotion. You must run pg_basebackup again from the new primary to re-base it. The panel’s DemoteToReplica() generates a script for this; see Failover → Re-attaching the old main for the full flow.

CLI cheatsheet

# Status (from main)
docker exec proxpanel-db psql -U proxpanel -d proxpanel -c \
  "SELECT * FROM pg_stat_replication;"

# Status (from replica)
docker exec proxpanel-db psql -U proxpanel -d proxpanel -c \
  "SELECT * FROM pg_stat_wal_receiver;"

# Replication slots on main
docker exec proxpanel-db psql -U proxpanel -d proxpanel -c \
  "SELECT slot_name, active, wal_status FROM pg_replication_slots;"

# Is this a replica?
docker exec proxpanel-db psql -U proxpanel -d proxpanel -c \
  "SELECT pg_is_in_recovery();"

# Force a WAL switch on main (useful during testing — forces the replica to
# receive a new segment)
docker exec proxpanel-db psql -U proxpanel -d proxpanel -c \
  "SELECT pg_switch_wal();"

# Manually promote (last-resort, normally you'd use the panel UI)
docker exec proxpanel-db psql -U proxpanel -d proxpanel -c \
  "SELECT pg_promote();"

Common pitfalls

FATAL: no pg_hba.conf entry for replication connection. Add the host replication replicator <ip>/32 md5 line. Don’t forget the pg_reload_conf().
pg_basebackup: connection refused. Either listen_addresses is not set to * on the main, or the Postgres port 5432 isn’t reachable from the secondary. The install script binds 5432 to 127.0.0.1 only; cluster setup adds an additional bind to the cluster network. Confirm with docker port proxpanel-db.
Replica catches up, then falls behind, then catches up. Long-running write transaction on main (mass FUP reset, bulk subscriber import). Wait it out — the slot ensures no WAL is lost.
wal_status = lost on a slot. The slot fell further behind than max_slot_wal_keep_size allows, so the main removed WAL the replica hadn’t yet consumed. You must pg_basebackup again. Raise max_slot_wal_keep_size (and watch disk usage on the main) or use a paid backup service that captures WAL externally.
Postgres won’t start after rebase. Permissions. The pg_basebackup command runs as root inside a throwaway container, so the restored files are owned by root, and Postgres won’t open a data directory it doesn’t own. The setup script ends with chown -R 999:999 /data. If you ran the steps by hand and skipped that, the container exits with permission denied.
Replica is read-only but the panel UI is showing live updates. That’s the heartbeat coming in over the API layer — heartbeats write to cluster_nodes on the main, which is then replicated back. The secondary’s UI dashboards refresh from its own (read-only) DB just fine.

Permissions

Running these commands requires shell access on the host (root or docker group). Inside the panel UI, the cluster setup actions are admin-only.

Related

HA Cluster — the panel-level wrapping of this replication.
Failover — pg_promote() + Redis + DNS in one workflow.
Backups & Recovery — replication does not protect against DROP TABLE; backups do.