When 'logged in' means 'admin': a cluster of role-confusion bugs in self-hosted dashboards

The shape that matters. Several self-hosted dashboards shipped routes that check authentication ("are you logged in?") where they needed authorization ("are you an admin?" or "do you own this?"). The impact ranges from cross-tenant telemetry leaks to a RoleMember running shell commands on every connected server. The root cause is a one-word routing mistake: commonHandler where adminHandler belonged.

This week produced a cluster of advisories in the kind of software that runs quietly on a VPS in every small team: self-hosted server-monitoring and container-management dashboards. The products differ, but the bugs rhyme, and the rhyme is one of the most common, most under-tested vulnerability classes there is: broken function-level authorization, sometimes called missing authorization or BFLA. The endpoint confirms you are someone, and then forgets to confirm you are the right someone.

Nezha: one routing mistake, escalating impact

Nezha is a popular self-hosted monitoring dashboard with two roles: RoleAdmin and RoleMember. A run of GitHub Security Advisories (GHSAs) shows the same wiring error at different severities, which is what makes it such a clean teaching case:

Cross-tenant telemetry disclosure (CVSS 5.0). Any authenticated non-admin member can connect to the server-status WebSocket and receive telemetry for all servers, including those owned by other users. The normal REST list filters objects by HasPermission; the WebSocket stream treats “any authenticated user” as authorization for the full, unfiltered server list. The check existed in one path and was simply absent in the other.
RoleMember-reachable SSRF with full response reflection (CVSS 7.5). The notification routes were wired through the any-authenticated handler rather than the admin handler. A RoleMember can make the server send an HTTP request to a user-controlled URL and reflect the entire response body back: a server-side request forgery with a built-in exfiltration channel.
Cross-tenant remote code execution (CVSS 9.5). The worst of the set: the cron routes were also wired through the any-authenticated handler, and the per-server permission check on cron creation could be bypassed through vacuous truth, meaning the check passed trivially when it had nothing to evaluate. A RoleMember can create a scheduled task that runs a shell command on every connected server: cross-tenant RCE from a low-privilege account.
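
The vacuous-truth failure mode mentioned for the cron permission check can be illustrated in a few lines. This is a minimal Python sketch under stated assumptions, not Nezha's actual code: the `Server` type, the `owner_id` field, and both function names are hypothetical. It shows how a check of the form "every listed server belongs to the caller" passes when the list is empty, and one way to fail closed instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    id: int
    owner_id: int  # hypothetical ownership field


def may_schedule_naive(caller_id: int, targets: list[Server]) -> bool:
    # all() over an empty iterable is True, so an empty target list
    # passes even though nothing was actually checked.
    return all(s.owner_id == caller_id for s in targets)


def may_schedule_strict(caller_id: int, targets: list[Server]) -> bool:
    # Fail closed: require at least one target, then ownership of each one.
    return bool(targets) and all(s.owner_id == caller_id for s in targets)


mine = Server(id=1, owner_id=7)
theirs = Server(id=2, owner_id=9)

print(may_schedule_naive(7, []))               # True: the empty list slips through
print(may_schedule_strict(7, []))              # False
print(may_schedule_strict(7, [mine]))          # True
print(may_schedule_strict(7, [mine, theirs]))  # False
```

If an empty target list is later interpreted as "all servers", the naive check ends up authorizing the broadest possible action.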

Read those three together and the pattern is unmistakable. The same mistake (route an admin-or-owner action through the “are you logged in” handler) produces a 5.0 in one place, a 7.5 in another, and a 9.5 in the third. The severity is set by what the unauthorized action does, but the bug is identical everywhere it appears.
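
The wiring mistake can be sketched as follows. This is an illustrative Python model, not the project's source: `common_handler` and `admin_handler` are hypothetical stand-ins for the any-authenticated and admin-only wrappers, and the request dictionary, role strings, and route key are assumptions made for the example.

```python
from typing import Callable

ROLE_ADMIN = "admin"
ROLE_MEMBER = "member"

Handler = Callable[[dict], tuple[int, str]]


def common_handler(action: Handler) -> Handler:
    """Authentication only: any logged-in caller gets through."""
    def wrapped(request: dict) -> tuple[int, str]:
        if request.get("user") is None:
            return 401, "login required"
        return action(request)
    return wrapped


def admin_handler(action: Handler) -> Handler:
    """Authentication plus authorization: the caller must hold the admin role."""
    def wrapped(request: dict) -> tuple[int, str]:
        user = request.get("user")
        if user is None:
            return 401, "login required"
        if user.get("role") != ROLE_ADMIN:
            return 403, "admin role required"
        return action(request)
    return wrapped


def create_cron(request: dict) -> tuple[int, str]:
    return 200, "task scheduled"


# The whole bug is which wrapper the route is registered with.
routes_buggy = {"POST /cron": common_handler(create_cron)}
routes_fixed = {"POST /cron": admin_handler(create_cron)}

member = {"user": {"name": "m", "role": ROLE_MEMBER}}
print(routes_buggy["POST /cron"](member))  # (200, 'task scheduled')
print(routes_fixed["POST /cron"](member))  # (403, 'admin role required')
```

The action function is identical in both tables; only the registration differs, which is why the defect is easy to miss in review.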

Arcane: the same error in container management

Arcane, a container-management dashboard, shows the same failure in a different product. Its PUT /api/environments/{id}/templates/variables endpoint, which writes the system-wide .env.global file that is merged into every project's compose configuration, was missing an admin authorization check. Any authenticated non-admin user could call it with a bearer token or API key and overwrite global environment variables that flow into every project. From there, poisoning a global variable that feeds a container's configuration is a short hop to influencing what those containers do. Same class, different blast radius: a write endpoint that should have been admin-only accepted any logged-in caller.
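
To make the blast radius concrete, here is a small Python sketch of a global environment merged into per-project environments. It is not Arcane's implementation: the variable names, the project names, and the precedence rule (project values override global ones) are all assumptions chosen for illustration.

```python
def effective_env(global_env: dict[str, str], project_env: dict[str, str]) -> dict[str, str]:
    # Assumed precedence: project values override global ones.
    return {**global_env, **project_env}


# Hypothetical global file contents and projects.
global_env = {"REGISTRY": "registry.internal", "LOG_LEVEL": "info"}
projects = {
    "web": {"PORT": "8080"},
    "worker": {"LOG_LEVEL": "debug"},
    "db": {},
}

# A single write to the global values, as any authenticated caller could make.
global_env["REGISTRY"] = "registry.attacker.example"

affected = [
    name
    for name, env in projects.items()
    if effective_env(global_env, env)["REGISTRY"] == "registry.attacker.example"
]
print(affected)  # ['web', 'worker', 'db']
```

Under these assumptions, every project that does not override the variable inherits the poisoned value from one unauthorized write.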

Why this class hides from scanners

Broken function-level authorization is hard for automated tools precisely because the endpoint works. It returns 200. It is not crashing, not erroring, not obviously injecting anything. The only way to catch it is to be authenticated as a low-privilege user and confirm you can reach a high-privilege action, which requires understanding the role model, holding two sets of credentials, and comparing what each can do. A scanner that only knows “is this URL reachable” sees a normal, healthy endpoint. A test that knows “a RoleMember reached an admin-only route” sees the vulnerability.

# BFLA test: needs a LOW-privilege account and the admin route map. Run it against your own instance only.
# Authenticate as a non-admin, then attempt an action that should be admin/owner-only.
MEMBER_TOKEN="...token for a RoleMember / non-admin..."
# Should be FORBIDDEN (403) for a member; a 2xx is the finding.
curl -s -o /dev/null -w 'member->admin-route: %{http_code}\n' \
  -H "Authorization: Bearer $MEMBER_TOKEN" \
  -H 'Content-Type: application/json' \
  -X POST "https://dash.example/api/v1/cron" -d '{"command":"id","servers":[1]}'
# And confirm a cross-tenant read is filtered. The status stream is a WebSocket, so this uses
# a WebSocket client (websocat is one option) and listens for 5 seconds. Assumes numeric IDs.
timeout 5 websocat -H="Authorization: Bearer $MEMBER_TOKEN" "wss://dash.example/api/ws/server" \
  | grep -oE '"server_id": ?[0-9]+' | sort -u   # should list only servers the member owns
# FINDING = a non-admin reaches an admin/owner action OR sees another tenant's data.
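
The same comparison scales to a whole route map. The Python sketch below only evaluates results that have already been collected; the `Probe` type, the role labels, and every route other than the cron path used above are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    route: str          # e.g. "POST /api/v1/cron"
    required_role: str  # role the route map says is needed
    member_status: int  # HTTP status observed with the low-privilege token


def findings(probes: list[Probe]) -> list[str]:
    """Return routes where a member got a 2xx on a route that needs more than member."""
    return [
        p.route
        for p in probes
        if p.required_role != "member" and 200 <= p.member_status < 300
    ]


observed = [
    Probe("GET /api/v1/profile", "member", 200),       # expected success
    Probe("POST /api/v1/cron", "admin", 200),          # finding
    Probe("POST /api/v1/notification", "admin", 403),  # correctly denied
]
print(findings(observed))  # ['POST /api/v1/cron']
```

The route map is the essential input: without a statement of which role each route requires, a 200 response carries no signal.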

The fix

Upgrade Nezha and Arcane to the patched releases that move these routes onto the admin handler and add the missing ownership checks.
Authorize at the function, every time. Every state-changing or cross-tenant route must check the caller's role and object ownership, not just that a session exists. “Authenticated” is not “authorized.”
Filter in every code path, not just the obvious one. The Nezha WebSocket leak happened because the REST path filtered and the stream did not. The same authorization predicate must guard every way to reach the data.
Default-deny. New routes should require explicit authorization to be reachable, so forgetting the check fails closed, not open.
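
Two of the points above, one predicate for every code path and default-deny registration, can be sketched together. This is a minimal Python illustration under stated assumptions, not code from either project: `has_permission` is a hypothetical stand-in for an ownership check, and the `Router` class, its methods, and the `/cron` route are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    role: str  # "admin" or "member"


@dataclass(frozen=True)
class Server:
    id: int
    owner_id: int


def has_permission(user: User, server: Server) -> bool:
    # Single predicate: admins see everything, members see what they own.
    return user.role == "admin" or server.owner_id == user.id


def visible(user: User, servers: list[Server]) -> list[Server]:
    return [s for s in servers if has_permission(user, s)]


def rest_list(user: User, servers: list[Server]) -> list[int]:
    return [s.id for s in visible(user, servers)]


def stream_snapshot(user: User, servers: list[Server]) -> list[int]:
    # The stream path calls the same filter instead of sending everything.
    return [s.id for s in visible(user, servers)]


class Router:
    """Routes must declare a policy; a route without one is refused at registration."""

    def __init__(self) -> None:
        self._routes = {}

    def add(self, path, action, policy=None):
        if policy is None:
            raise ValueError(f"{path}: no authorization policy declared")
        self._routes[path] = (policy, action)

    def call(self, path, user):
        policy, action = self._routes[path]
        return action(user) if policy(user) else "403"


fleet = [Server(1, owner_id=7), Server(2, owner_id=9)]
member = User(id=7, role="member")
print(rest_list(member, fleet), stream_snapshot(member, fleet))  # [1] [1]

router = Router()
router.add("/cron", lambda u: "scheduled", policy=lambda u: u.role == "admin")
print(router.call("/cron", member))  # 403
```

Because registration fails without a policy, forgetting the check becomes a startup error rather than an open route.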

How Celvex Sentry tests for this

Our continuous-monitoring suite carries a function-level authorization probe that authenticates as a low-privilege principal and attempts the high-privilege and cross-tenant actions a role model says it should be denied. This is the only reliable way to surface this class. When a non-admin provably reaches an admin action, or one tenant provably reads another's data, we mint a Proof Capsule with the request, the unexpected success, and the authorize-at-the-function fix attached. When every privileged route fails closed, we record a PASS.
