Skip to content

Schema-Per-Tenant Isolation

Every tenant on a ProxPanel SaaS cluster lives in its own Postgres schema. From the API’s perspective every tenant has its own set of tables — tenant_acme.subscribers, tenant_bigisp.subscribers, tenant_demo.subscribers. They share the same physical Postgres instance and the same connection pool, but the rows never mix. This page documents how that isolation is implemented, where the security boundary actually sits, and the design decisions behind it.

LayerMechanism
PhysicalOne Postgres instance, one Redis instance, one API container. Cheap to operate.
LogicalOne Postgres schema per tenant, named tenant_<slug> (dashes become underscores).
Per-requestSET search_path TO tenant_<slug>, public issued at the start of every HTTP request.
Per-connectionA GORM connection pool keyed by schema name — connections retain their search_path for the lifetime of the connection.
AuthJWTs carry tenant_id; the auth middleware rejects requests whose JWT tenant doesn’t match the hostname’s tenant.
ApplicationEvery handler calls middleware.GetTenantDBFromCtx(c) instead of database.DB directly.

The combined effect: a query against subscribers running for tenant Acme physically reads from tenant_acme.subscribers because Postgres’s name resolution uses the connection’s search_path. Tenant Acme’s session token cannot be replayed against tenant BigISP because the JWT tenant claim won’t match the resolved tenant from the hostname.

  1. TLS terminates at nginx. nginx sets the X-Forwarded-Host header and proxies to the API container.

  2. TenantMiddleware extracts the tenant. In backend/internal/middleware/tenant.go, the hostname is parsed:

    • If it matches *.saas.proxrad.com, the subdomain is the slug.
    • Otherwise, the full hostname is looked up against tenants.custom_domain.
    • The result (tenant ID + schema name) is cached for 60 seconds in tenantCache to avoid hitting Postgres on every request.
  3. The tenant-scoped DB handle is bound. database.GetTenantDB(schemaName) returns a *gorm.DB whose session has search_path pre-set to the tenant’s schema. It is stored in c.Locals("tenant_db").

  4. AuthMiddleware verifies the JWT. The JWT’s tenant_id claim is compared against c.Locals("tenant_id"). Mismatch → 403.

  5. The handler runs. Every database call is middleware.GetTenantDBFromCtx(c).Where(...) etc. The query is generated by GORM as SELECT ... FROM subscribers ... — unqualified — and Postgres resolves it against the connection’s search_path.

  6. The connection returns to the pool. Its search_path is preserved. Next request that uses this connection issues SET search_path again (idempotent — usually a no-op for warm connections to the same tenant).

The whole tenant-resolution path adds about 0.3 ms to a warm request and a cache-miss path of around 2 ms — negligible at SaaS-host scale.

Why schema-per-tenant, not database-per-tenant

Section titled “Why schema-per-tenant, not database-per-tenant”

The two common approaches to multi-tenancy in Postgres are database-per-tenant (each tenant gets its own CREATE DATABASE) and schema-per-tenant (each tenant gets a schema inside one shared database). ProxPanel uses the latter. The reasoning:

ConcernDatabase-per-tenantSchema-per-tenant
Connection poolOne pool per database — 50 tenants × 25 conns = 1,250 connectionsOne pool shared, ~250 connections for the whole cluster
Cross-tenant queries (e.g. super-admin reports)Multiple connections, manual UNIONsA single query against pg_namespace enumerates schemas
Backup / restore granularitypg_dump -d tenantX — cleanpg_dump --schema=tenant_x — equally clean
Schema migrationsRun migrations 50× — slow and error-proneLoop over schemas inside a transaction, atomic
WAL / replicationEach database has its own WAL — many small streamsOne WAL stream, simpler replication topology
Postgres overhead per tenantAbout 8 MB of shared buffers + a connection’s worth of memoryNegligible — schemas are cheap

At the scale of 50–500 small tenants per host, schema-per-tenant is markedly cheaper to run. Database-per-tenant becomes appealing only when individual tenants are large enough that you’d give them dedicated hardware anyway — in which case you’d just deploy a standalone ProxPanel install for them.

The isolation is enforced at four points. Every one of them has to be intact for a cross-tenant leak to be impossible.

The tenant is resolved from the request hostname, not from any user-controlled input inside the request. An attacker cannot pass ?tenant=acme to read Acme’s data. The hostname comes from TLS SNI, which the attacker controls — but if they SNI for acme.saas.proxrad.com they hit Acme’s tenant context, and their own JWT (issued for tenant evilcorp) is rejected at step 4 above.

Every JWT issued by a tenant’s /auth/login carries the tenant ID. AuthMiddleware compares the JWT’s tenant_id to the hostname-derived tenant ID. A token issued by tenant A cannot authenticate against tenant B even if the attacker controls A’s admin password.

The search_path is set explicitly at the start of every connection use, before any query runs. There is no code path where a query runs without an explicit search_path set first. If the resolution fails (no tenant found), the request is rejected with a 404 before any database call.

4. No raw database.DB writes in tenant handlers

Section titled “4. No raw database.DB writes in tenant handlers”

The handler-level rule: never call database.DB.Create, database.DB.Where, etc. directly from a tenant-scoped handler. Always go through middleware.GetTenantDBFromCtx(c). The exceptions are:

  • Super-admin handlers under /api/saas/*, which deliberately operate on the public schema (the tenant registry).
  • Background services that don’t have a request context (backup scheduler, daily quota reset) — they loop over tenants explicitly and create per-tenant DB handles inside the loop.

Some data must live outside the per-tenant schemas:

TableLives inWhy
tenantspublicThe registry. The tenant resolver itself queries this; it can’t be tenant-scoped.
saas_billingpublicBilling across tenants is a super-admin concern.
saas_audit_logspublicSuper-admin actions (suspend tenant, delete tenant) are logged centrally so a deleted tenant doesn’t take its own audit trail with it.
Redis keysRedisNamespaced by tenant_<id>:... prefix. The Postgres search_path mechanism doesn’t apply.

The original SaaS implementation tried to set search_path via a GORM Session() callback on every query. It failed in two ways:

  1. Preload chains skipped the callback — GORM issued the preload query on a fresh session that didn’t inherit the search_path override, so a db.Preload("Service").Find(&subs) would fetch subscribers from the right schema but services from public. The services were always empty for tenants and the bug went undetected for two weeks.

  2. Connection reuse confused the cache — a pooled connection that had been used by tenant Acme would, if returned to the pool with search_path still set, sometimes get picked up by a tenant BigISP request that didn’t re-issue SET search_path. The “rare” data leak was real but only happened when two near-simultaneous requests landed on the same connection.

The fix was to abandon per-query callbacks and instead maintain a separate *gorm.DB per tenant, keyed by schema name. database.GetTenantDB(schemaName) returns the cached handle for that tenant; its connection pool is configured so that every new connection starts with SET search_path TO tenant_<slug>, public baked in via a BeforeQuery callback that always runs.

The frontend has no SaaS-specific code at all. The API is the same for tenant admins and standalone admins; React only sees the data the API returns.

If you suspect a leak, the verification path is:

  1. Identify two tenants with disjoint subscriber sets — e.g. tenant Acme has subscriber alice@acme, tenant BigISP has subscriber bob@bigisp.

  2. Log in to Acme and confirm only alice@acme is visible. Capture the JWT.

  3. Replay the JWT against BigISP’s hostname: curl -H "Authorization: Bearer <acme-jwt>" https://bigisp.saas.proxrad.com/api/subscribers. Expected: HTTP 403, tenant mismatch. Anything else is a bug.

  4. From Postgres directly: SET search_path TO tenant_acme, public; SELECT username FROM subscribers; should return only Acme’s subscribers; SET search_path TO tenant_bigisp, public; SELECT username FROM subscribers; should return only BigISP’s.

  5. Inspect the API container under load: docker exec proxpanel-api curl -s localhost:8080/health should always succeed regardless of the active tenant — the health endpoint is tenant-agnostic and runs on public.

If step 3 returns BigISP’s data instead of a 403, stop the API container immediately and contact info@proxrad.com with the JWT, the URL, and a docker logs snapshot. That is the single failure mode that justifies an emergency response.

  • Schemas per database: Postgres handles thousands of schemas without trouble. ProxPanel SaaS caps at 500 per host as a soft limit; beyond that, split the cluster.
  • Tables per schema: about 60 tables per tenant — at 500 tenants that’s 30,000 tables in one database. Postgres handles this fine; pg_stat_all_tables queries become slow but no production query path touches them.
  • Backup time: pg_dump --schema=tenant_<slug> of a 1,000-subscriber tenant takes ~2 seconds. Full-cluster pg_dumpall of 50 tenants takes ~2 minutes.
  • Migration time: running a DDL change (ALTER TABLE subscribers ADD COLUMN ...) across 50 schemas takes ~30 seconds in a transaction. Plan migrations as cluster-wide maintenance windows.

When a database migration ships (new column, new table, new index), it must be applied to every tenant schema, not just to public. The migration framework iterates:

for _, tenant := range listActiveTenants() {
tx := db.Begin()
tx.Exec(fmt.Sprintf("SET search_path TO %s", tenant.SchemaName))
runMigration(tx)
tx.Commit()
}

Each tenant gets its own transaction. If a migration fails for tenant Acme, BigISP’s migration still commits — the failure is isolated and surfaced to the operator. The retry path lets the operator fix Acme’s problem (usually a data-conflict) and re-run for just that one tenant.

Schema migrations on a 50-tenant cluster take 20–60 seconds total. The cluster stays available during the migration; queries against a schema that hasn’t been migrated yet still use the old DDL (Postgres handles concurrent DDL + DML correctly via MVCC). Brief query blocking is possible during the migration of a heavily-used table, which is why migrations are scheduled outside business hours.

Cross-tenant queries — the super-admin path

Section titled “Cross-tenant queries — the super-admin path”

Super-admin reports need to aggregate across tenants — total subscriber count across the cluster, total revenue, etc. The pattern is explicit enumeration:

totalSubs := 0
for _, tenant := range listActiveTenants() {
tenantDB := database.GetTenantDB(tenant.SchemaName)
var count int64
tenantDB.Model(&Subscriber{}).Count(&count)
totalSubs += int(count)
}

This is N queries for N tenants. For 50 tenants and a simple COUNT(*), the whole loop takes about 200 ms. For larger queries (revenue aggregation, complex joins), the loop dominates request time — these endpoints are cached aggressively on the super-admin side (5-minute Redis cache).

We deliberately do not use Postgres pg_namespace enumeration to write cross-schema queries dynamically. The explicit loop is more verbose, but every tenant access is logged through the same path and the loop’s existence is grep-able for audit purposes.

Schema isolation is enforced by the platform, not by per-user permissions. Inside a tenant the normal permission system applies (see Users & Permissions). Super-admins have a separate authentication system documented in Super Admin Console.