Multi-tenant isolation: the cross-tenant read your IAM policy allows

An attacker signs up for a $0 trial of your SaaS. They are assigned tenant-99481. Their IAM policy is scoped perfectly: they can see only their own org's data. So they stop attacking the policy and start attacking the things the policy trusts: the namespace annotation that maps their pod to an IAM role, the warm Lambda container that just served another tenant, the Ingress rule whose annotation they can rewrite. Two Kubernetes CVEs from 2024, CVE-2024-7646 (Ingress-NGINX annotation validation bypass, CVSS 8.8) and CVE-2024-9594 (default credentials left enabled during Kubernetes Image Builder image builds), make the same point the 2019 Capital One breach made: the boundary that fails is never the IAM policy. It is the annotation, the cached context, or the eventually-consistent token that the policy implicitly trusts. This piece walks the decision tree from a scoped foothold in one tenant to a cross-tenant data read, and the validator contract that closes it.

There is a comfortable lie at the centre of multi-tenant cloud security: that a correctly-written IAM policy isolates tenants. It does not. A correctly-written IAM policy isolates principals. Whether each principal maps cleanly to exactly one tenant, and stays mapped, in every region, on every warm container, through every annotation the control plane reads as a trust channel, is a runtime property that no static policy audit can see. Every large multi-tenant disclosure since Capital One rides that gap.

This is part of our attack-research series on the classes that ship despite a decade of patches. Earlier pieces walked SSRF into cloud metadata and JWT alg-confusion as classes. This one walks cross-tenant isolation failure the same way: not as one CVE, but as a family of runtime trust-boundary failures that share a root cause. By the end you should be able to map your own tenant-isolation boundary, identify the three places it actually lives, and ship the controls that make a cross-tenant read structurally impossible rather than merely policy-forbidden. Verifiable security.

The attack pattern in one paragraph

A multi-tenant platform runs every customer's workload on shared infrastructure: shared Kubernetes clusters, shared Lambda runtimes, shared CDN, shared databases partitioned by a tenant key. Isolation is enforced by an IAM/RBAC policy that scopes each principal to its own tenant's resources. The attacker accepts that the policy is correct and instead targets the trust channel the policy relies on. There are three canonical channels. First, the annotation channel: in Kubernetes, IRSA resolves which IAM role a workload may assume from a role-ARN annotation, so if the attacker can write an annotation the controller honours (directly, or by injecting it through an Ingress annotation as in CVE-2024-7646), they can request a role their tenant was never legitimately granted. Second, the cached-context channel: cloud runtimes amortise cold-start cost by reusing the execution context, including module globals, /tmp, and cached SDK clients, across invocations, so a function that caches tenant A's data leaks it to tenant B on the next warm invocation. Third, the eventual-consistency channel: IAM is eventually consistent across regions, so a credential "revoked globally" survives in a lagging region long enough to be used for a read. In all three, the policy is correct and the read still happens.
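The cached-context channel is the easiest of the three to see in code. What follows is a minimal simulation, not a real Lambda handler: load_profile, leaky_handler, keyed_handler, and two_tenant_probe are hypothetical names, and the module-level variables stand in for whatever a warm container keeps alive between invocations.

```python
"""Simulated warm-container leak: module state outlives a single invocation."""

# Module-level state survives between invocations on a warm container.
_cached_profile = None      # hypothetical cache that is NOT keyed by tenant
_cache_by_tenant = {}       # hypothetical cache that IS keyed by tenant


def load_profile(tenant_id):
    """Stand-in for a tenant-scoped backend read (hypothetical)."""
    return {"tenant": tenant_id, "marker": "marker-" + tenant_id}


def leaky_handler(event):
    """Vulnerable: serves whatever the first warm invocation cached."""
    global _cached_profile
    if _cached_profile is None:
        _cached_profile = load_profile(event["tenant_id"])
    return _cached_profile


def keyed_handler(event):
    """Safer: the cache key includes the tenant, so contexts cannot cross."""
    tenant_id = event["tenant_id"]
    if tenant_id not in _cache_by_tenant:
        _cache_by_tenant[tenant_id] = load_profile(tenant_id)
    return _cache_by_tenant[tenant_id]


def two_tenant_probe(handler):
    """Return True if tenant B's response carries tenant A's marker."""
    baseline_a = handler({"tenant_id": "tenant-a"})
    warm_b = handler({"tenant_id": "tenant-b"})
    return warm_b["marker"] == baseline_a["marker"]
```

Calling two_tenant_probe(leaky_handler) reports a bleed and two_tenant_probe(keyed_handler) does not, even though neither handler touches an IAM policy. That is the sense in which the read happens while the policy stays correct.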

The unifying observation: tenant isolation is a runtime invariant, but it is verified as a static configuration. The attacker lives in the gap between those two.

Why this still ships in 2026

If the lesson of Capital One was "harden the metadata service," and the industry largely did, why does cross-tenant read remain the highest-impact cloud finding? Four structural reasons.

The annotation is a trust channel that looks like metadata. Kubernetes annotations were designed as free-form key-value hints. Then control-plane components started reading specific annotations as authorization input: the IRSA role-ARN, the Ingress backend, the admission-webhook config. CVE-2024-7646 is exactly this: an Ingress-NGINX annotation field that was supposed to carry a hostname was injectable with newline-separated configuration directives that the controller rendered into the live NGINX config. An annotation an attacker can set became a configuration an attacker controls. The annotation channel is everywhere a controller reads a label as a decision.
Static CSPM scanners audit the policy, not the runtime. Wiz, Orca, Lacework, Tenable, and Qualys all do strong baseline cloud-posture management: they grep IAM policies, flag public buckets, tag CVEs. None of them issues a two-tenant runtime probe to observe whether tenant A's data actually bleeds to tenant B on a warm container. The check that would catch the bleed requires running the boundary, not reading it. That is the gap our R-04 research track was shaped to occupy.
The control plane's tie-breaks are silent and undocumented. When two VPC peering connections advertise overlapping CIDRs, AWS deterministically picks one (lowest peering-connection ID); GCP picks the first-registered. The losing tenant's traffic silently routes to the winning tenant's prefix. No error, no log line, no policy violation: just a cross-tenant route that exists because two correct configurations collided.
Credentials outlive their revocation. IAM is eventually consistent. A session token revoked in the console is revoked in us-east-1 first and in distant regions seconds-to-minutes later. An attacker holding a harvested token, who knows which region lags, has a window. The Capital One session token outlived its "global" revocation in exactly this way.
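Of the four reasons, the last is the one most easily turned into a number. A minimal sketch, assuming you already record when a credential was revoked and the last moment each region still accepted it: both inputs are hypothetical here, as is the lagging_regions helper, and the 120-second default simply mirrors the budget mentioned later in this piece.

```python
from datetime import datetime, timedelta

# Illustrative budget; tune it to your own revoke-to-propagation target.
DEFAULT_BUDGET = timedelta(seconds=120)


def lagging_regions(revoked_at, last_accepted, budget=DEFAULT_BUDGET):
    """Return {region: lag_in_seconds} for regions that exceeded the budget.

    revoked_at:    datetime at which the credential was revoked.
    last_accepted: {region: datetime} of the last successful use observed
                   in that region (hypothetical measurement data).
    """
    over_budget = {}
    for region, seen_at in last_accepted.items():
        lag = seen_at - revoked_at
        if lag > budget:
            over_budget[region] = lag.total_seconds()
    return over_budget


if __name__ == "__main__":
    revoked = datetime(2026, 1, 1, 12, 0, 0)
    observed = {
        "us-east-1": revoked + timedelta(seconds=5),
        "ap-southeast-2": revoked + timedelta(seconds=185),
    }
    print(lagging_regions(revoked, observed))
```

Any region this returns is a region where a harvested token had a usable replay window.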

The attacker decision tree

ATTACKER DECISION TREE
Multi-Tenant Cross-Tenant Read

┌──────────────────────────────────────────┐
│ 1. Establish a scoped foothold           │
│    - buy a $0 trial / sign up            │
│    - assigned tenant-99481, role:viewer  │
│    - IAM policy is CORRECT and scoped    │
└────────────────┬─────────────────────────┘
                 │  "the policy is right, so attack
                 │   what the policy trusts"
                 ▼
┌──────────────────────────────────────────┐
│ 2. Map the isolation boundary            │
│    Where does tenant-scope actually live?│
│    a) annotation     (IRSA / Ingress)    │
│    b) cached ctx     (warm Lambda / /tmp)│
│    c) eventual       (cross-region IAM)  │
│    d) route          (VPC peering CIDR)  │
└────────────────┬─────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────────────┐
│ 3. Pick the channel the policy trusts    │
│    a) ANNOTATION → CVE-2024-7646 inject  │
│       set IRSA role-ARN on a pod I write │
│       → assume role my SA never granted  │
│    b) CACHED CTX → warm-container probe  │
│       baseline_a → warm_b → second_a     │
│       → second_a echoes tenant-B marker  │
│    c) EVENTUAL   → replay token in laggy │
│       region after "global" revoke       │
│    d) ROUTE      → overlapping peering   │
│       CIDR → traffic to other tenant     │
└────────────────┬─────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────────────┐
│ 4. Cross the boundary, read the data     │
│    - assume cross-tenant IAM role        │
│    - GET s3://other-tenant-bucket/...    │
│    - read warm-container leaked marker   │
│    - one read = proof, stop there        │
└────────────────┬─────────────────────────┘
                 │
                 ▼
┌──────────────────────────────────────────┐
│ 5. Exfiltrate at tenant scale            │
│    → if accepted, every tenant readable  │
└──────────────────────────────────────────┘

The five-step tree an attacker walks from a scoped single-tenant foothold to a cross-tenant read.

The decisive insight at step 2 is that the attacker does not attack the policy at all. They accept it is correct, because it usually is, and they enumerate the channels the policy delegates trust to. That enumeration is fast and largely passive: read the pod spec for IRSA annotations, fingerprint whether the API is Lambda-fronted, probe whether the platform is multi-region, request the peering attestation if you are a peered tenant. The channel that turns out to be writable, cacheable, or laggy is the win.

A composite real-world scenario

The setting is a multi-tenant analytics SaaS on EKS, with ten thousand customer organisations, one shared cluster per region, tenants isolated by namespace plus an IRSA role per namespace that scopes S3 access to that tenant's data prefix. The IAM policies are textbook: each role's policy allows s3:GetObject only on arn:aws:s3:::analytics-data/${tenant}/*. A CSPM scan of this account returns green.

An attacker signs up for a free trial with a throwaway domain and lands in namespace tenant-99481. They have kubectl-equivalent access scoped to their namespace through the platform's "bring your own pipeline" feature, which lets tenants submit pod specs into their own namespace. They run kubectl auth can-i --list -n tenant-99481 and read the result carefully. They can create pods in their namespace. They cannot write service accounts. That asymmetry is the whole exploit.

# What the attacker can and cannot do in their namespace
$ kubectl auth can-i create pods            -n tenant-99481   # yes
$ kubectl auth can-i create serviceaccounts -n tenant-99481   # NO
$ kubectl auth can-i patch  serviceaccounts -n tenant-99481   # NO

# The platform binds IAM roles to the SERVICE ACCOUNT name.
# But the cluster resolves IRSA from the POD-level annotation too.
# Pod-write without SA-write is the split-privilege gap.

The platform's controller, on this cluster version, resolves the IRSA role ARN from the pod's eks.amazonaws.com/role-arn annotation when present, falling back to the service account's annotation otherwise. The attacker controls pod specs. They submit a pod whose annotation names a different tenant's role ARN, one they enumerated from a leaked Terraform state file in a public GitHub gist, a common ARN-disclosure source.

apiVersion: v1
kind: Pod
metadata:
  name: pipeline-runner
  namespace: tenant-99481
  annotations:
    # The attacker's SA is bound to tenant-99481's role.
    # This pod-level annotation names tenant-00001's role ARN.
    eks.amazonaws.com/role-arn: arn:aws:iam::4711:role/tenant-00001-s3
spec:
  serviceAccountName: pipeline-sa     # legitimately theirs
  containers:
    - name: run
      image: amazonlinux
      command: ["sleep", "3600"]

The pod starts. The IRSA mutating webhook (amazon-eks-pod-identity-webhook, which despite its name is not the newer EKS Pod Identity associations feature) injects the pod-annotated role ARN into the projected token request. The pod's container now holds STS credentials for tenant-00001-s3, a role the attacker's service account was never granted. From inside the pod:

$ aws sts get-caller-identity
{ "Arn": "arn:aws:sts::4711:assumed-role/tenant-00001-s3/..." }

# Cross-tenant read. One object is the whole proof.
$ aws s3 ls s3://analytics-data/tenant-00001/
2026-05-29  19:04:11   datasets/customers.parquet
$ aws s3 cp s3://analytics-data/tenant-00001/datasets/customers.parquet  /tmp/proof.parquet
download: ... to /tmp/proof.parquet

The IAM policy was never violated. The role's policy correctly scopes it to tenant-00001/*, and the attacker is now acting as tenant-00001. The boundary that failed was the assumption that a pod could only assume the role its service account names. The annotation channel, the same class of trust channel that CVE-2024-7646 weaponised on the Ingress controller, carried the attacker across the tenant line. CVE-2024-9594 makes the adjacent point on the image-build surface: a credential the policy never mentions can still be the way in. Total elapsed time from trial signup to cross-tenant read: under ten minutes, most of it spent reading the leaked ARN.
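The precedence rule at the heart of this scenario fits in a few lines. The sketch below models it in Python under the composite scenario's assumptions; it is not the source of any real webhook, both function names are hypothetical, and the tenant-99481-s3 role name is invented for the example.

```python
ROLE_ARN_KEY = "eks.amazonaws.com/role-arn"


def _annotations(obj):
    """Return an object's annotations, tolerating missing or null fields."""
    return (obj.get("metadata") or {}).get("annotations") or {}


def resolve_role_arn_pod_first(pod, service_account):
    """Scenario behaviour: a pod-level annotation wins when present.

    Anyone who can write pod specs therefore chooses the role.
    """
    pod_arn = _annotations(pod).get(ROLE_ARN_KEY)
    if pod_arn:
        return pod_arn
    return _annotations(service_account).get(ROLE_ARN_KEY)


def resolve_role_arn_sa_only(pod, service_account):
    """Contract behaviour: only the service account names the role.

    The pod argument is deliberately ignored.
    """
    return _annotations(service_account).get(ROLE_ARN_KEY)


if __name__ == "__main__":
    attacker_pod = {"metadata": {"annotations": {
        ROLE_ARN_KEY: "arn:aws:iam::4711:role/tenant-00001-s3"}}}
    attacker_sa = {"metadata": {"annotations": {
        ROLE_ARN_KEY: "arn:aws:iam::4711:role/tenant-99481-s3"}}}
    print(resolve_role_arn_pod_first(attacker_pod, attacker_sa))
    print(resolve_role_arn_sa_only(attacker_pod, attacker_sa))
```

The only difference between the two functions is which object is allowed to speak first, and that is exactly the field the tenant could write.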

What we observe in customer environments

We are honest about the limits of our visibility. CelvexGroup's continuous validation runs against assets and attestation bundles the customer flags into scope; we do not have a god's-eye view of every cluster. What we do probe, using read-only public-vantage requests and structural attestation audits carrying an X-Celvex-Probe attribution header so the customer's SOC can always identify our traffic, is the four-channel boundary map above. Across cloud-edge engagements in the past nine months, the rough breakdown:

Roughly one in eight EKS namespaces we audited had the pod-write / service-account-write split that enables the IRSA-via-pod-annotation escalation. Most operators did not know the controller honoured the pod-level annotation at all.
Roughly one in six Lambda-fronted APIs leaked a tenant marker on a warm-container two-tenant probe: module-global caching of the previous invocation's tenant context.
Roughly one in nine multi-region IAM setups exceeded a 120-second revoke-to-propagation budget in at least one region, leaving a usable replay window for a harvested token.
Every environment with overlapping VPC-peering CIDRs that we audited relied on the platform's silent tie-break with no monitoring on which prefix actually won.

The honest read: cross-tenant isolation failure is the highest-impact cloud finding we ship. It is not the highest-frequency, but it is the one that, when it lands, exposes every tenant at once. Static CSPM gives a false all-clear precisely because each individual configuration is correct.

What to do about it: the isolation contract

The fix is not one line; tenant isolation is a property, not a flag. But it reduces to a contract every multi-tenant boundary should satisfy, and most of the controls are cheap.

Multi-tenant isolation contract: controls that close the channels

Bind IAM roles to the immutable identity, never to a writable annotation. Use EKS Pod Identity associations (which bind to the service account, resolved by the control plane) rather than the legacy pod-annotation IRSA path. If you must use IRSA, deny pod-level role-arn annotations with an admission policy (OPA/Gatekeeper or Kyverno) and bind only via the service account.
Deny tenant-write on any field a controller reads as authorization. If a tenant can create pods, they must not be able to set security-relevant annotations on them. Enforce an allow-list of permitted annotation keys per namespace at admission time. This directly closes the CVE-2024-7646 annotation-injection class as well.
Never cache tenant-scoped data in process globals, /tmp, or shared SDK clients. In serverless and warm-container runtimes, treat every invocation as potentially cross-tenant. Key any cache by tenant and clear it at invocation start, or do not cache at all.
Measure cross-region revocation, do not assume it. After every credential rotation, sample STS in every region and assert revoke-to-propagation under your budget (we use 120s). Alert when a region lags.
Monitor which prefix wins an overlapping peering tie-break. Treat overlapping advertised CIDRs across peerings as a P1 misconfiguration, not a warning. The silent winner is a cross-tenant route.
Verify isolation at runtime, not just in policy. Run a two-tenant probe against every shared runtime on a schedule: write a marker as tenant A, read as tenant B, assert no bleed. A green CSPM report is necessary but not sufficient.
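The first two controls come down to a decision made at admission time. In production that decision would be written as a Gatekeeper or Kyverno policy; the sketch below only models the logic in Python, and validate_pod_annotations, DENIED_KEYS, and the example allow-list are illustrative assumptions rather than any tool's API.

```python
# Keys a tenant must never set on a pod, whatever the allow-list says.
DENIED_KEYS = {"eks.amazonaws.com/role-arn"}


def validate_pod_annotations(pod, allowed_keys):
    """Return a list of violations; an empty list means the pod is admitted."""
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    violations = []
    for key, value in annotations.items():
        if key in DENIED_KEYS:
            violations.append("denied trust-bearing annotation: " + key)
        elif key not in allowed_keys:
            violations.append("annotation not on allow-list: " + key)
        # Line breaks in a value are how directives get smuggled into a
        # rendered config, so reject them regardless of the key.
        if "\n" in str(value) or "\r" in str(value):
            violations.append("line break in annotation value: " + key)
    return violations


if __name__ == "__main__":
    allow_list = {"example.com/owner"}  # hypothetical per-namespace allow-list
    bad_pod = {"metadata": {"annotations": {
        "eks.amazonaws.com/role-arn": "arn:aws:iam::4711:role/tenant-00001-s3"}}}
    good_pod = {"metadata": {"annotations": {"example.com/owner": "data-team"}}}
    print(validate_pod_annotations(bad_pod, allow_list))
    print(validate_pod_annotations(good_pod, allow_list))
```

The deny-list is checked before the allow-list on purpose: a trust-bearing key stays blocked even if someone later adds it to a namespace's allow-list by mistake.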

A correct IAM policy isolates principals, not tenants. The boundary that fails is always the channel the policy trusts: the annotation, the warm cache, the lagging region.

The audit, in concrete terms, starts with a split-privilege check against every namespace:

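# Find namespaces where a tenant can create pods but NOT service accounts
# (the IRSA-via-pod-annotation escalation precondition).
# "tenant-sa" stands in for whatever identity your tenants actually run as.
$ for ns in $(kubectl get ns -o name | cut -d/ -f2); do
    pod=$(kubectl auth can-i create pods            -n "$ns" --as="system:serviceaccount:$ns:tenant-sa")
    sa=$(kubectl  auth can-i create serviceaccounts -n "$ns" --as="system:serviceaccount:$ns:tenant-sa")
    [ "$pod" = "yes" ] && [ "$sa" = "no" ] && echo "SPLIT-PRIV: $ns"
  done

# List admission policies whose names mention annotations, then read each one
# to confirm it actually DENIES pod-level role-arn annotations.
# Gatekeeper and Kyverno are queried separately so that a missing CRD for one
# does not hide results from the other.
$ kubectl get constrainttemplates 2>/dev/null | grep -i annotation
$ kubectl get clusterpolicies     2>/dev/null | grep -i annotation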

Read each flagged namespace. Confirm IRSA binds via service account, not pod annotation. Confirm an admission policy rejects security-relevant annotation keys set by tenants. The exercise is finishable in a day for a single cluster.
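The route channel can be audited in the same spirit. A minimal sketch using Python's standard ipaddress module, assuming you can export the CIDRs each peering connection advertises; the overlapping_peerings helper and the pcx-style identifiers are hypothetical.

```python
import ipaddress
from itertools import combinations


def overlapping_peerings(advertised):
    """Find CIDR overlaps between different peering connections.

    advertised: {peering_id: [cidr_string, ...]} (hypothetical export format).
    Returns a list of (peering_a, cidr_a, peering_b, cidr_b) tuples.
    """
    entries = [
        (peering_id, ipaddress.ip_network(cidr))
        for peering_id, cidrs in advertised.items()
        for cidr in cidrs
    ]
    overlaps = []
    for (id_a, net_a), (id_b, net_b) in combinations(entries, 2):
        # Only overlaps across different peerings create a silent tie-break.
        if id_a != id_b and net_a.overlaps(net_b):
            overlaps.append((id_a, str(net_a), id_b, str(net_b)))
    return overlaps


if __name__ == "__main__":
    print(overlapping_peerings({
        "pcx-aaa": ["10.0.0.0/16"],
        "pcx-bbb": ["10.0.128.0/17"],
        "pcx-ccc": ["192.168.0.0/24"],
    }))
```

Every tuple it returns is a place where the platform, not you, decides which tenant's prefix wins.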

How Celvex catches this

Find. Prove. Fix. Verify.

Find

The scanner maps the isolation boundary across four channels, namely annotation, cached-context, eventual-consistency, and route, using read-only public-vantage probes and structural attestation audits, every one attributed with an X-Celvex-Probe header.

Prove

For a confirmed cross-tenant read we ship a Proof Capsule with a localstack two-tenant fixture: the exact pod spec, the STS caller-identity showing the wrong role, and the single cross-tenant object read, reproducible offline.

Fix

The Capsule's remediation block points at the isolation-contract control scoped to the channel that failed: the admission policy to add, the Pod Identity migration, or the cache-clear that closes the bleed.

Verify

After the fix lands, the pod-annotation role request is denied at admission and the two-tenant probe shows no bleed. The finding closes automatically and the dashboard records the verified-fix event for the audit trail.

Where we sit on the autonomy curve: at L1.5 today, our R-04 cloud-edge track ships eight tagged detection signatures covering the warm-container two-tenant runtime probe, the EKS RBAC split-write audit, cross-region rotation drift, and the peering-overlap structural check. At L2 within 90 days, the corpus extends the runtime probe to AppRunner and to Azure federated-credential pools, the same trust-channel primitive on different control planes. At L3 within twelve months, the scanner synthesises channel-specific probes for unfamiliar multi-tenant architectures it fingerprints in customer environments. We do not claim L3 today. We do claim our L1.5 catches the four channels above and ships a reproducible Capsule for each.

Bottom line

Cross-tenant read is the cloud finding that exposes everyone at once, and it lands in environments whose IAM policies are textbook-correct. The reason is that tenant isolation is a runtime invariant verified as a static configuration, and the attacker lives in that gap, attacking not the policy but the annotation, the warm cache, the lagging region, or the silent route the policy trusts. CVE-2024-7646 is the 2024 reminder that the annotation channel is an authorization channel, and CVE-2024-9594 the reminder that implicitly trusted credentials sit outside the policy too. The fix is a contract: bind to immutable identity, deny tenant-write on trust-bearing fields, never cache tenant data in shared context, measure revocation and routing, and verify isolation by running it, not by reading the policy. Until you run the boundary, a green posture report is one writable annotation away from a tenant-wide breach.

Verifiable security. Find it. Prove it. Fix it. Verify the fix held. That is what we ship.
