Endpoints — Cluster
The cluster API is used to configure a two-node HA pair (main + secondary with PostgreSQL streaming replication), to monitor the cluster, and to perform one-click failover when the main server is unreachable. This is the same API the Settings → Cluster UI calls.
Base URL
https://your-panel-host/api/cluster
Authentication
The cluster API has two auth modes:
| Route group | Auth |
|---|---|
| /api/cluster/join, /heartbeat, /promote, /notify, /uploads | Cluster secret in the X-Cluster-Secret: <secret> header. Used by nodes to talk to each other. |
| /api/cluster/* (all other routes) | JWT with the admin role. Used by operators in the UI. |
The cluster secret is generated when the main server is set up and copied to the secondary at join time. It is not the same as the license key.
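The two auth modes can be sketched as a small client-side helper that picks the right header for each route. This is a minimal sketch, assuming the first path segment after /api/cluster/ identifies the route group; auth_headers and INTERNAL_ROUTES are hypothetical names, not part of the panel.

```python
# Hypothetical client helper (not part of the panel): chooses the auth header
# for a cluster route, following the route-group table above.
INTERNAL_ROUTES = {"join", "heartbeat", "promote", "notify", "uploads"}
CLUSTER_PREFIX = "/api/cluster/"


def auth_headers(path: str, cluster_secret: str, admin_jwt: str) -> dict:
    """Return the auth header the cluster API expects for `path`."""
    if not path.startswith(CLUSTER_PREFIX):
        raise ValueError(f"not a cluster route: {path}")
    # Assumption: the first segment after /api/cluster/ names the route group,
    # e.g. "heartbeat" or "nodes" (so "promote-to-main" is NOT "promote").
    segment = path[len(CLUSTER_PREFIX):].split("/", 1)[0]
    if segment in INTERNAL_ROUTES:
        # Node-to-node route: shared cluster secret, no JWT.
        return {"X-Cluster-Secret": cluster_secret}
    # Operator route: admin JWT.
    return {"Authorization": f"Bearer {admin_jwt}"}
```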
POST /api/cluster/setup-main
Configures the local server as the main node of a new cluster. On the first call it generates a cluster ID and secret, sets wal_level=replica and max_wal_senders=10, and creates the cluster_config, cluster_nodes, and cluster_events tables.
Permission: admin.
Request
```http
POST /api/cluster/setup-main
Authorization: Bearer <admin-jwt>
Content-Type: application/json
```
Body: none required. Optional fields:
| Field | Type | Description |
|---|---|---|
| display_name | string | Human-readable label; defaults to the hostname |
Response — 200 OK
```json
{
  "success": true,
  "data": {
    "cluster_id": "cluster_a8f3kj92",
    "cluster_secret": "csec_lpqz2v9j...",
    "role": "main",
    "node_id": 1
  }
}
```
Save the cluster_secret: the UI shows it once (with a “Copy” button) and never again, and the secondary needs it to join.
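A minimal sketch of calling setup-main and keeping the secret, using only the Python standard library. build_setup_main_request and extract_cluster_secret are hypothetical helpers; the base URL is the one from the Base URL section, and sending {} when display_name is omitted is an assumption.

```python
import json
import urllib.request
from typing import Optional


def build_setup_main_request(base_url: str, admin_jwt: str,
                             display_name: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) POST {base_url}/setup-main."""
    # The body is optional; this sketch sends {} when no display_name is given.
    body = {"display_name": display_name} if display_name else {}
    return urllib.request.Request(
        f"{base_url}/setup-main",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {admin_jwt}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def extract_cluster_secret(raw: bytes) -> str:
    """Pull cluster_secret out of a setup-main response body."""
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError(payload.get("message", "setup-main failed"))
    return payload["data"]["cluster_secret"]
```

To send it: `with urllib.request.urlopen(req) as resp: secret = extract_cluster_secret(resp.read())`. Store the secret somewhere safe, since the UI will not show it again.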
Errors
| Status | message | Cause |
|---|---|---|
| 409 | already configured (role=main) | Idempotent — returns the existing cluster_id |
| 500 | failed to set wal_level — postgres restart required | wal_level cannot be changed with a runtime SET; the Postgres container must be restarted |
POST /api/cluster/setup-secondary
Configures the local server as a secondary that replicates from the given main.
Permission: admin.
Body
| Field | Type | Required | Description |
|---|---|---|---|
| main_ip | string | yes | IP or hostname of the main server |
| cluster_secret | string | yes | Secret returned by setup-main |
| display_name | string | no | Label for this node |
The handler:
- Calls POST <main>/api/cluster/test-connection to verify that the main's API, database, and Redis are reachable.
- Calls POST <main>/api/cluster/join with the local server’s IP and hostname.
- Receives the DB connection string and a dedicated replication slot ID.
- Generates a standby.signal file and a pg_basebackup script, then restarts the local Postgres in replica mode.
- Stops the local RADIUS service, which runs only on the main during normal operation.
- Writes a cluster_config row with role=secondary.
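The standby.signal and pg_basebackup step could look roughly like the sketch below. It is an assumption about what the generated script does, not the panel's actual script: the replication user name (replicator), port, and data directory are hypothetical. It relies on real pg_basebackup options: -X stream streams WAL during the copy, -S uses the replication slot returned by join, and -R (PostgreSQL 12+) writes standby.signal plus the connection settings into the target data directory.

```python
import shlex
from typing import List


def basebackup_argv(main_ip: str, slot: str, data_dir: str,
                    user: str = "replicator", port: int = 5432) -> List[str]:
    """Build a pg_basebackup command line that clones the main into data_dir.

    Hypothetical: the panel's generated script may differ.
    """
    return [
        "pg_basebackup",
        "-h", main_ip,      # main server (setup-secondary's main_ip)
        "-p", str(port),
        "-U", user,         # replication role (assumed name)
        "-D", data_dir,     # must be empty or absent before the clone
        "-X", "stream",     # stream WAL while copying
        "-S", slot,         # slot returned by /join, e.g. replica_node_2
        "-R",               # write standby.signal + connection settings (PG 12+)
        "-P",               # report progress
    ]


def basebackup_script(main_ip: str, slot: str, data_dir: str) -> str:
    """Render the argv as one shell-safe line for a script file."""
    return shlex.join(basebackup_argv(main_ip, slot, data_dir))
```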
Response — 200 OK
```json
{
  "success": true,
  "data": {
    "role": "secondary",
    "main_ip": "203.0.113.10",
    "replication_slot": "replica_node_2",
    "node_id": 2
  }
}
```
Errors
| Status | message | Cause |
|---|---|---|
| 400 | cluster_secret invalid | Wrong secret |
| 503 | main server unreachable | test-connection to the main failed; see POST /api/cluster/test-connection |
GET /api/cluster/status
Returns a cluster overview: every registered node with its last heartbeat, CPU, memory, and disk usage (%), and the current replication lag, plus recent cluster events.
Permission: admin.
Response — 200 OK
```json
{
  "success": true,
  "data": {
    "cluster_id": "cluster_a8f3kj92",
    "local_role": "secondary",
    "nodes": [
      {
        "id": 1,
        "ip": "203.0.113.10",
        "role": "main",
        "status": "online",
        "last_seen": "2026-05-12T11:45:01Z",
        "cpu_pct": 12.4,
        "mem_pct": 38.2,
        "disk_pct": 41.0
      },
      {
        "id": 2,
        "ip": "203.0.113.11",
        "role": "secondary",
        "status": "online",
        "last_seen": "2026-05-12T11:45:03Z",
        "cpu_pct": 4.1,
        "mem_pct": 18.7,
        "disk_pct": 41.0,
        "replication_lag_sec": 0.8
      }
    ],
    "recent_events": [
      { "type": "node_joined", "node_id": 2, "at": "2026-05-11T09:00:00Z" }
    ]
  }
}
```
```shell
curl https://panel.example.com/api/cluster/status \
  -H "Authorization: Bearer ..."
```
replication_lag_sec is computed on the replica with SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())). A lag above 30 s is a yellow flag; above 120 s is red.
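The lag thresholds translate directly into a classifier. A sketch, assuming the /status payload shape shown above; lag_level and secondary_lag_levels are hypothetical names. A missing replication_lag_sec is reported as unknown rather than green, because pg_last_xact_replay_timestamp() returns NULL until the replica has replayed a transaction.

```python
from typing import Dict, Optional

YELLOW_SEC = 30   # thresholds quoted in this section
RED_SEC = 120


def lag_level(lag_sec: Optional[float]) -> str:
    """Classify replication_lag_sec as green / yellow / red (or unknown)."""
    if lag_sec is None:
        # No lag value: the replica may not have replayed anything yet.
        return "unknown"
    if lag_sec > RED_SEC:
        return "red"
    if lag_sec > YELLOW_SEC:
        return "yellow"
    return "green"


def secondary_lag_levels(status_payload: dict) -> Dict[int, str]:
    """Map node id -> lag level for every secondary in a /status response."""
    nodes = status_payload["data"]["nodes"]
    return {
        n["id"]: lag_level(n.get("replication_lag_sec"))
        for n in nodes
        if n["role"] == "secondary"
    }
```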
GET /api/cluster/check-main-status
A focused health check that the UI on the secondary polls every 30 s. Returns whether the main is online and how long it has been unreachable.
Permission: admin.
Response — 200 OK
```json
{
  "success": true,
  "data": {
    "main_ip": "203.0.113.10",
    "main_online": false,
    "offline_seconds": 312,
    "can_promote": true
  }
}
```
can_promote becomes true once offline_seconds exceeds 120 (the failover threshold). Until then, the UI hides the “Promote to Main” button.
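The can_promote rule and the UI's 30 s polling can be sketched as below. fetch is a hypothetical callable that performs GET /api/cluster/check-main-status with an admin JWT and returns the parsed JSON; sleep is injectable so the loop can be tested without waiting. The server's can_promote flag stays authoritative; can_promote() here only mirrors the documented threshold.

```python
import time
from typing import Callable, Optional

FAILOVER_THRESHOLD_SEC = 120


def can_promote(main_online: bool, offline_seconds: int) -> bool:
    """Mirror of the documented rule: main offline for more than 120 s."""
    return not main_online and offline_seconds > FAILOVER_THRESHOLD_SEC


def wait_until_promotable(fetch: Callable[[], dict],
                          sleep: Callable[[float], None] = time.sleep,
                          interval: float = 30.0,
                          max_polls: int = 20) -> Optional[dict]:
    """Poll check-main-status (via `fetch`) until can_promote is true.

    Returns the first `data` object with can_promote set, or None
    after max_polls attempts.
    """
    for _ in range(max_polls):
        data = fetch()["data"]
        if data["can_promote"]:
            return data
        sleep(interval)
    return None
```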
POST /api/cluster/promote-to-main
The big red button: promotes the local secondary to main.
Permission: admin.
The handler runs in this order:
- Re-checks that the main is unreachable (sanity guard).
- Checks replication lag: if replication_lag_sec > 30, the request is refused unless it includes force: true.
- Calls SELECT pg_promote(); PostgreSQL becomes the primary and accepts writes.
- Stops Redis replication (REPLICAOF NO ONE).
- Sets cluster_config.role = 'main'.
- Marks the old main as status='failed' in cluster_nodes.
- Notifies the remaining nodes of the new main via POST /api/cluster/notify.
- Restarts the RADIUS container so it picks up its now-primary database.
Body
| Field | Type | Required | Description |
|---|---|---|---|
| force | bool | no | Override the lag check and promote even though the replica is stale |
Response — 200 OK
```json
{
  "success": true,
  "data": {
    "promoted_at": "2026-05-12T11:55:00Z",
    "new_role": "main",
    "replication_lag_at_promote_sec": 0.8
  }
}
```
After promotion, point the MikroTik RADIUS client at the new main (/radius set [find] address=<new-main-ip>) and update the DNS record or the Cloudflare load-balancer origin. The old main's database must be re-cloned from the new main before that server can rejoin as a secondary.
Errors
| Status | message | Cause |
|---|---|---|
| 409 | main is online — refusing to promote | The original main is responsive |
| 412 | replication lag too high (X seconds), use force=true to override | Lag exceeds 30 s and force not set |
| 500 | pg_promote() failed | DB-side error — see API logs |
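A runbook script can predict which response promote-to-main will give from inputs the UI already has: main_online from check-main-status and replication_lag_sec from status. The sketch assumes the guard order listed in the handler steps (main check first, then lag); promote_precheck is a hypothetical helper, and the server's answer remains authoritative.

```python
from typing import Tuple

LAG_LIMIT_SEC = 30


def promote_precheck(main_online: bool, lag_sec: float,
                     force: bool = False) -> Tuple[int, str]:
    """Predict the status promote-to-main would return, in guard order."""
    if main_online:
        # First guard: never create two writable primaries.
        return 409, "main is online"
    if lag_sec > LAG_LIMIT_SEC and not force:
        # Second guard: refuse a stale replica unless force is set.
        return 412, "replication lag too high"
    # Both guards passed: the handler would go on to pg_promote().
    return 200, "ok"
```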
POST /api/cluster/recover-from-server
Run on a fresh install to seed it from an existing production server. Used for disaster recovery when the main is gone.
Permission: admin (on the new server).
The handler:
- SSHes to the source server using the provided root password.
- Runs pg_dump on the source.
- Downloads the dump.
- Restores it into the local Postgres.
- Rsyncs /opt/proxpanel/frontend/dist/uploads/ (logos, favicons).
- Writes the new server’s cluster_config with role='main'.
Body
| Field | Type | Required | Description |
|---|---|---|---|
| source_ip | string | yes | IP of the source server |
| source_password | string | yes | Root password; used only for the SSH session, not stored |
| source_port | int | no | SSH port; defaults to 22 |
Response — 200 OK
```json
{
  "success": true,
  "data": {
    "dump_size_bytes": 312456789,
    "tables_restored": 142,
    "subscribers_count": 8421,
    "duration_seconds": 184
  }
}
```
```shell
curl -X POST https://new-server.example.com/api/cluster/recover-from-server \
  -H "Authorization: Bearer ..." \
  -H "Content-Type: application/json" \
  -d '{"source_ip":"203.0.113.10","source_password":"the-old-root-pw"}'
```
Errors
| Status | message | Cause |
|---|---|---|
| 503 | cannot connect to source server | SSH failed |
| 500 | pg_dump failed: ... | Source Postgres rejected the dump |
| 500 | restore failed: ... | Local Postgres rejected the import |
POST /api/cluster/test-source-connection
Dry run for recover-from-server: confirms SSH and Postgres reachability without changing anything. The body matches recover-from-server (source_ip, source_password, optional source_port). Returns { ssh_ok, postgres_ok, estimated_dump_size_bytes }.
Permission: admin.
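A cautious recovery runbook can run the dry run first and start recover-from-server only when both checks pass. This is a sketch: post is a hypothetical callable that sends an authenticated admin POST and returns the parsed JSON, and the { success, data } envelope is assumed to match the other endpoints.

```python
from typing import Any, Callable, Dict

Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def recover_with_dry_run(post: Post, source_ip: str, source_password: str,
                         source_port: int = 22) -> Dict[str, Any]:
    """Run test-source-connection first; call recover-from-server only if it passes."""
    body = {"source_ip": source_ip,
            "source_password": source_password,
            "source_port": source_port}
    check = post("/api/cluster/test-source-connection", body)["data"]
    if not (check.get("ssh_ok") and check.get("postgres_ok")):
        # Do not start a restore against a source we cannot reach.
        raise RuntimeError(f"source not ready: ssh_ok={check.get('ssh_ok')}, "
                           f"postgres_ok={check.get('postgres_ok')}")
    return post("/api/cluster/recover-from-server", body)["data"]
```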
POST /api/cluster/test-connection
Used by the secondary during setup to verify that the main is reachable on its API, Postgres, and Redis ports. Body: { "main_ip": "203.0.113.10", "cluster_secret": "csec_..." }. Returns { api_ok, postgres_ok, redis_ok }; if any check fails, the corresponding error is in data.errors.
Permission: admin.
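A sketch of turning a test-connection result into a readable message. failed_components and describe_test_connection are hypothetical helpers; since the format of data.errors is not documented here, it is shown as-is.

```python
from typing import List

CHECKS = ("api_ok", "postgres_ok", "redis_ok")


def failed_components(data: dict) -> List[str]:
    """Names of the checks that did not pass, e.g. ["redis"]."""
    # A missing flag is treated as a failure.
    return [name[: -len("_ok")] for name in CHECKS if not data.get(name, False)]


def describe_test_connection(data: dict) -> str:
    """One-line summary of a test-connection `data` object."""
    failed = failed_components(data)
    if not failed:
        return "main reachable on API, Postgres and Redis"
    # data.errors has no documented shape, so it is printed verbatim.
    return f"unreachable: {', '.join(failed)} (errors: {data.get('errors')})"
```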
POST /api/cluster/failover (manual)
Planned switchover, as opposed to the emergency promote-to-main. Fences writes on the current main, waits for the replica to catch up, then promotes it. Use it during planned maintenance windows.
Permission: admin.
Body: { "target_node_id": 2, "drain_seconds": 30 }. drain_seconds is how long to wait for connections to drain and defaults to 30. Returns { old_main_id, new_main_id, drained_connections, completed_at }.
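A standard-library sketch of building the planned-switchover request. build_failover_request is a hypothetical helper; it applies the documented default of drain_seconds=30 client-side and rejects negative values, which is an assumption about valid input.

```python
import json
import urllib.request


def build_failover_request(base_url: str, admin_jwt: str,
                           target_node_id: int,
                           drain_seconds: int = 30) -> urllib.request.Request:
    """Build (but do not send) a planned-switchover request."""
    if drain_seconds < 0:
        raise ValueError("drain_seconds must be >= 0")
    body = {"target_node_id": target_node_id, "drain_seconds": drain_seconds}
    return urllib.request.Request(
        f"{base_url}/failover",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {admin_jwt}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

If the call blocks until the switchover completes (as the completed_at field suggests, though this is not stated), pass a urlopen timeout comfortably above drain_seconds.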
DELETE /api/cluster/nodes/:id · POST /api/cluster/leave
Remove a node from the cluster.
- DELETE /nodes/:id (called from the main): drops the node's replication slot and marks its row as removed.
- POST /leave (called from a secondary): asks the main to remove this node, then wipes the local cluster_config.
Permission: admin.
Errors
Error responses use this envelope:
```json
{ "success": false, "message": "main is online — refusing to promote" }
```
| Status | Meaning |
|---|---|
| 400 | Validation — message describes |
| 401 | Missing / invalid JWT (or wrong X-Cluster-Secret on internal routes) |
| 403 | Not an admin |
| 409 | State conflict — already main, main still online, etc. |
| 412 | Pre-condition failed — replication lag too high, source unreachable |
| 503 | Network failure to peer node |
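Since every error uses the { success, message } envelope above, a client can turn failed responses into one exception type. A minimal sketch; ClusterAPIError and raise_for_cluster_error are hypothetical names, and statuses not in the table (such as the 500s some endpoints return) fall back to a generic label.

```python
import json
from typing import Dict

STATUS_MEANING: Dict[int, str] = {
    400: "validation error",
    401: "missing/invalid JWT or wrong X-Cluster-Secret",
    403: "not an admin",
    409: "state conflict",
    412: "precondition failed",
    503: "network failure to peer node",
}


class ClusterAPIError(Exception):
    """Raised for any cluster API response that is not a 2xx success."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status} {STATUS_MEANING.get(status, 'error')}: {message}")
        self.status = status
        self.message = message


def raise_for_cluster_error(status: int, raw: bytes) -> dict:
    """Return the parsed body on success; raise ClusterAPIError otherwise."""
    payload = json.loads(raw) if raw else {}
    if 200 <= status < 300 and payload.get("success", False):
        return payload
    raise ClusterAPIError(status, payload.get("message", ""))
```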
Rate limits
Internal cluster routes (/heartbeat, /join, /promote, /notify) bypass the global 300 req/min limit and are gated only by the cluster-secret check. The heartbeat fires every 30 s per node.
Admin routes follow the standard 300 req/min/IP global limit.
Related pages
- HA Cluster — UI walk-through for the same operations
- Authentication — admin JWT required for the operator-facing routes
- Backups — full backup is a prerequisite before promote