Schema-Per-Tenant Isolation
Every tenant on a ProxPanel SaaS cluster lives in its own Postgres schema. From the API’s perspective every tenant has its own set of tables — tenant_acme.subscribers, tenant_bigisp.subscribers, tenant_demo.subscribers. They share the same physical Postgres instance and the same connection pool, but the rows never mix. This page documents how that isolation is implemented, where the security boundary actually sits, and the design decisions behind it.
The isolation model
Section titled “The isolation model”| Layer | Mechanism |
|---|---|
| Physical | One Postgres instance, one Redis instance, one API container. Cheap to operate. |
| Logical | One Postgres schema per tenant, named tenant_<slug> (dashes become underscores). |
| Per-request | SET search_path TO tenant_<slug>, public issued at the start of every HTTP request. |
| Per-connection | A GORM connection pool keyed by schema name — connections retain their search_path for the lifetime of the connection. |
| Auth | JWTs carry tenant_id; the auth middleware rejects requests whose JWT tenant doesn’t match the hostname’s tenant. |
| Application | Every handler calls middleware.GetTenantDBFromCtx(c) instead of database.DB directly. |
The combined effect: a query against subscribers running for tenant Acme physically reads from tenant_acme.subscribers because Postgres’s name resolution uses the connection’s search_path. Tenant Acme’s session token cannot be replayed against tenant BigISP because the JWT tenant claim won’t match the resolved tenant from the hostname.
How a request lands in the right schema
Section titled “How a request lands in the right schema”-
TLS terminates at nginx. nginx sets the
X-Forwarded-Hostheader and proxies to the API container. -
TenantMiddleware extracts the tenant. In
backend/internal/middleware/tenant.go, the hostname is parsed:- If it matches
*.saas.proxrad.com, the subdomain is the slug. - Otherwise, the full hostname is looked up against
tenants.custom_domain. - The result (tenant ID + schema name) is cached for 60 seconds in
tenantCacheto avoid hitting Postgres on every request.
- If it matches
-
The tenant-scoped DB handle is bound.
database.GetTenantDB(schemaName)returns a*gorm.DBwhose session hassearch_pathpre-set to the tenant’s schema. It is stored inc.Locals("tenant_db"). -
AuthMiddleware verifies the JWT. The JWT’s
tenant_idclaim is compared againstc.Locals("tenant_id"). Mismatch → 403. -
The handler runs. Every database call is
middleware.GetTenantDBFromCtx(c).Where(...)etc. The query is generated by GORM asSELECT ... FROM subscribers ...— unqualified — and Postgres resolves it against the connection’ssearch_path. -
The connection returns to the pool. Its
search_pathis preserved. Next request that uses this connection issuesSET search_pathagain (idempotent — usually a no-op for warm connections to the same tenant).
The whole tenant-resolution path adds about 0.3 ms to a warm request and a cache-miss path of around 2 ms — negligible at SaaS-host scale.
Why schema-per-tenant, not database-per-tenant
Section titled “Why schema-per-tenant, not database-per-tenant”The two common approaches to multi-tenancy in Postgres are database-per-tenant (each tenant gets its own CREATE DATABASE) and schema-per-tenant (each tenant gets a schema inside one shared database). ProxPanel uses the latter. The reasoning:
| Concern | Database-per-tenant | Schema-per-tenant |
|---|---|---|
| Connection pool | One pool per database — 50 tenants × 25 conns = 1,250 connections | One pool shared, ~250 connections for the whole cluster |
| Cross-tenant queries (e.g. super-admin reports) | Multiple connections, manual UNIONs | A single query against pg_namespace enumerates schemas |
| Backup / restore granularity | pg_dump -d tenantX — clean | pg_dump --schema=tenant_x — equally clean |
| Schema migrations | Run migrations 50× — slow and error-prone | Loop over schemas inside a transaction, atomic |
| WAL / replication | Each database has its own WAL — many small streams | One WAL stream, simpler replication topology |
| Postgres overhead per tenant | About 8 MB of shared buffers + a connection’s worth of memory | Negligible — schemas are cheap |
At the scale of 50–500 small tenants per host, schema-per-tenant is markedly cheaper to run. Database-per-tenant becomes appealing only when individual tenants are large enough that you’d give them dedicated hardware anyway — in which case you’d just deploy a standalone ProxPanel install for them.
Security boundaries
Section titled “Security boundaries”The isolation is enforced at four points. Every one of them has to be intact for a cross-tenant leak to be impossible.
1. Hostname → tenant resolution
Section titled “1. Hostname → tenant resolution”The tenant is resolved from the request hostname, not from any user-controlled input inside the request. An attacker cannot pass ?tenant=acme to read Acme’s data. The hostname comes from TLS SNI, which the attacker controls — but if they SNI for acme.saas.proxrad.com they hit Acme’s tenant context, and their own JWT (issued for tenant evilcorp) is rejected at step 4 above.
2. JWT tenant claim
Section titled “2. JWT tenant claim”Every JWT issued by a tenant’s /auth/login carries the tenant ID. AuthMiddleware compares the JWT’s tenant_id to the hostname-derived tenant ID. A token issued by tenant A cannot authenticate against tenant B even if the attacker controls A’s admin password.
3. search_path is set per request
Section titled “3. search_path is set per request”The search_path is set explicitly at the start of every connection use, before any query runs. There is no code path where a query runs without an explicit search_path set first. If the resolution fails (no tenant found), the request is rejected with a 404 before any database call.
4. No raw database.DB writes in tenant handlers
Section titled “4. No raw database.DB writes in tenant handlers”The handler-level rule: never call database.DB.Create, database.DB.Where, etc. directly from a tenant-scoped handler. Always go through middleware.GetTenantDBFromCtx(c). The exceptions are:
- Super-admin handlers under
/api/saas/*, which deliberately operate on thepublicschema (the tenant registry). - Background services that don’t have a request context (backup scheduler, daily quota reset) — they loop over tenants explicitly and create per-tenant DB handles inside the loop.
What is NOT in tenant_<slug>
Section titled “What is NOT in tenant_<slug>”Some data must live outside the per-tenant schemas:
| Table | Lives in | Why |
|---|---|---|
tenants | public | The registry. The tenant resolver itself queries this; it can’t be tenant-scoped. |
saas_billing | public | Billing across tenants is a super-admin concern. |
saas_audit_logs | public | Super-admin actions (suspend tenant, delete tenant) are logged centrally so a deleted tenant doesn’t take its own audit trail with it. |
| Redis keys | Redis | Namespaced by tenant_<id>:... prefix. The Postgres search_path mechanism doesn’t apply. |
GORM middleware fix — the long story
Section titled “GORM middleware fix — the long story”The original SaaS implementation tried to set search_path via a GORM Session() callback on every query. It failed in two ways:
-
Preloadchains skipped the callback — GORM issued the preload query on a fresh session that didn’t inherit thesearch_pathoverride, so adb.Preload("Service").Find(&subs)would fetch subscribers from the right schema but services frompublic. The services were always empty for tenants and the bug went undetected for two weeks. -
Connection reuse confused the cache — a pooled connection that had been used by tenant Acme would, if returned to the pool with
search_pathstill set, sometimes get picked up by a tenant BigISP request that didn’t re-issueSET search_path. The “rare” data leak was real but only happened when two near-simultaneous requests landed on the same connection.
The fix was to abandon per-query callbacks and instead maintain a separate *gorm.DB per tenant, keyed by schema name. database.GetTenantDB(schemaName) returns the cached handle for that tenant; its connection pool is configured so that every new connection starts with SET search_path TO tenant_<slug>, public baked in via a BeforeQuery callback that always runs.
The frontend has no SaaS-specific code at all. The API is the same for tenant admins and standalone admins; React only sees the data the API returns.
Verifying isolation manually
Section titled “Verifying isolation manually”If you suspect a leak, the verification path is:
-
Identify two tenants with disjoint subscriber sets — e.g. tenant Acme has subscriber
alice@acme, tenant BigISP has subscriberbob@bigisp. -
Log in to Acme and confirm only
alice@acmeis visible. Capture the JWT. -
Replay the JWT against BigISP’s hostname:
curl -H "Authorization: Bearer <acme-jwt>" https://bigisp.saas.proxrad.com/api/subscribers. Expected: HTTP 403,tenant mismatch. Anything else is a bug. -
From Postgres directly:
SET search_path TO tenant_acme, public; SELECT username FROM subscribers;should return only Acme’s subscribers;SET search_path TO tenant_bigisp, public; SELECT username FROM subscribers;should return only BigISP’s. -
Inspect the API container under load:
docker exec proxpanel-api curl -s localhost:8080/healthshould always succeed regardless of the active tenant — the health endpoint is tenant-agnostic and runs onpublic.
If step 3 returns BigISP’s data instead of a 403, stop the API container immediately and contact info@proxrad.com with the JWT, the URL, and a docker logs snapshot. That is the single failure mode that justifies an emergency response.
Costs and limits
Section titled “Costs and limits”- Schemas per database: Postgres handles thousands of schemas without trouble. ProxPanel SaaS caps at 500 per host as a soft limit; beyond that, split the cluster.
- Tables per schema: about 60 tables per tenant — at 500 tenants that’s 30,000 tables in one database. Postgres handles this fine;
pg_stat_all_tablesqueries become slow but no production query path touches them. - Backup time:
pg_dump --schema=tenant_<slug>of a 1,000-subscriber tenant takes ~2 seconds. Full-clusterpg_dumpallof 50 tenants takes ~2 minutes. - Migration time: running a DDL change (
ALTER TABLE subscribers ADD COLUMN ...) across 50 schemas takes ~30 seconds in a transaction. Plan migrations as cluster-wide maintenance windows.
Migrations across the cluster
Section titled “Migrations across the cluster”When a database migration ships (new column, new table, new index), it must be applied to every tenant schema, not just to public. The migration framework iterates:
for _, tenant := range listActiveTenants() { tx := db.Begin() tx.Exec(fmt.Sprintf("SET search_path TO %s", tenant.SchemaName)) runMigration(tx) tx.Commit()}Each tenant gets its own transaction. If a migration fails for tenant Acme, BigISP’s migration still commits — the failure is isolated and surfaced to the operator. The retry path lets the operator fix Acme’s problem (usually a data-conflict) and re-run for just that one tenant.
Schema migrations on a 50-tenant cluster take 20–60 seconds total. The cluster stays available during the migration; queries against a schema that hasn’t been migrated yet still use the old DDL (Postgres handles concurrent DDL + DML correctly via MVCC). Brief query blocking is possible during the migration of a heavily-used table, which is why migrations are scheduled outside business hours.
Cross-tenant queries — the super-admin path
Section titled “Cross-tenant queries — the super-admin path”Super-admin reports need to aggregate across tenants — total subscriber count across the cluster, total revenue, etc. The pattern is explicit enumeration:
totalSubs := 0for _, tenant := range listActiveTenants() { tenantDB := database.GetTenantDB(tenant.SchemaName) var count int64 tenantDB.Model(&Subscriber{}).Count(&count) totalSubs += int(count)}This is N queries for N tenants. For 50 tenants and a simple COUNT(*), the whole loop takes about 200 ms. For larger queries (revenue aggregation, complex joins), the loop dominates request time — these endpoints are cached aggressively on the super-admin side (5-minute Redis cache).
We deliberately do not use Postgres pg_namespace enumeration to write cross-schema queries dynamically. The explicit loop is more verbose, but every tenant access is logged through the same path and the loop’s existence is grep-able for audit purposes.
Permissions
Section titled “Permissions”Schema isolation is enforced by the platform, not by per-user permissions. Inside a tenant the normal permission system applies (see Users & Permissions). Super-admins have a separate authentication system documented in Super Admin Console.
Related pages
Section titled “Related pages”- SaaS Overview — when and why to choose SaaS mode.
- Tenant Onboarding — what
CREATE SCHEMA tenant_<slug>actually creates. - Wildcard Subdomain Routing — how hostnames map to tenants.
- Super Admin Console — managing the
public.tenantsregistry.