Microsoft Storm-0558: How One Stolen Key Reached 25 Tenants

One key. Twenty-five tenants. Zero passwords stolen. In July 2023 Microsoft disclosed that a China-based threat actor tracked as Storm-0558 had forged authentication tokens for 25 enterprise Microsoft 365 tenants, including the U.S. State Department and Commerce Department, using a single Microsoft Account (MSA) consumer signing key from 2016. The post-incident review revealed that every architectural assumption meant to limit the blast radius of that key was wrong.

The Cyber Safety Review Board's 2024 report on this incident is one of the most uncomfortable documents Microsoft has ever had to acknowledge in public. Not because of the immediate damage — although exfiltrating Outlook Web Access mail from a sitting U.S. Secretary of Commerce is bad enough — but because of the chain of failures that allowed it. None of those failures were exotic. Each link in the chain was the kind of thing every engineering organization on Earth ships every quarter and assumes is fine.

This post walks the chain. We are not interested in shaming Microsoft. We are interested in the specific assumptions about cryptographic key separation, debug-environment hygiene, and token validation logic that turned out to be fictional — because every multi-tenant SaaS platform we audit has equivalent assumptions, and most of them are equally fictional.

What happened

The incident timeline as reconstructed by the Cyber Safety Review Board (CSRB) and Microsoft's own public statements:

The forged tokens permitted full read access to OWA mailboxes for any user in the impacted tenants. The attacker did not need to compromise a single password, did not need to defeat a single MFA prompt, and did not need to exploit a software vulnerability in the customer's environment. The keys to the building had already been issued; Storm-0558 just had to walk in.

Why it kept working

The failures are layered, and each layer is more uncomfortable than the last.

Layer 1: Crash dumps were trusted to redact themselves

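The first assumption was that production crash-dump redaction would always remove sensitive cryptographic material before the dump left the signing environment. In Microsoft's September 2023 account, a race condition broke this for at least one dump; the company later said it had not located a dump containing the key, so this remains its leading hypothesis rather than a confirmed path. Worse, by that account the redaction system reported success in its telemetry, and no signal anywhere indicated that an unredacted dump had crossed the boundary.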

The lesson: any control that runs on the data path and reports its own success is not a security boundary. It is a hope. The crash dump should have been forbidden from leaving the signing environment regardless of what the redaction service claimed about its contents.
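A minimal sketch of what an egress gate that does not trust the redactor might look like. The zone label `prod-signing`, the `Dump` type, and the `may_leave_origin` function are all hypothetical names for illustration, and the PEM pattern is only a coarse heuristic: key material in process memory is usually not PEM-encoded, which is exactly why the origin check comes first.

```python
import re
from dataclasses import dataclass

# Hypothetical zone labels; a real deployment would take these from its own inventory.
SIGNING_ZONES = {"prod-signing"}

# Coarse heuristic for PEM-armored private keys (RSA, EC, PKCS#8, OpenSSH, ...).
PEM_PRIVATE_KEY = re.compile(rb"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


@dataclass(frozen=True)
class Dump:
    origin_zone: str
    payload: bytes
    redaction_reported_ok: bool  # recorded, but deliberately never consulted below


def may_leave_origin(dump: Dump) -> bool:
    """Decide egress without relying on the redactor's own success report."""
    # Rule 1: dumps from signing zones never leave, whatever the redactor claims.
    if dump.origin_zone in SIGNING_ZONES:
        return False
    # Rule 2: an independent scan, run by a different component than the redactor.
    if PEM_PRIVATE_KEY.search(dump.payload):
        return False
    return True
```

The design choice worth noting is that `redaction_reported_ok` plays no part in the decision. The redactor can still run as defense in depth, but the boundary holds even when it fails silently.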

Layer 2: The corporate debug environment was implicitly trusted

Once the dump crossed into the corporate debug environment, it inherited that environment's threat model — namely "engineers can poke at this." That threat model assumes the contents are debugging artifacts, not live production cryptographic secrets. There was no monitoring that scanned debug-environment artifacts for cryptographic material. There was no expiry on dumps. The 2021 crash dump sat in a place an attacker could reach for two years.

The lesson: trust boundaries must be enforced by the data, not the location. A live signing key remains a live signing key whether it sits in a crash dump on the corporate file share or in the HSM that generated it. Treating data as harmless because it landed in a folder labeled "debug" is the cryptographic equivalent of leaving a master key on the receptionist's desk because the front door is locked.
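One way to read "enforced by the data, not the location" is that retention rules follow an artifact's origin rather than the folder it sits in. The sketch below assumes each artifact carries an origin label and a creation time; the zone names, retention values, and the `Artifact` and `overdue` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# Hypothetical retention policy keyed by where the data came from,
# not by where it is currently stored.
RETENTION_BY_ORIGIN = {
    "prod-signing": timedelta(0),       # should never be present at all
    "prod-general": timedelta(days=14),
}
DEFAULT_RETENTION = timedelta(days=30)


@dataclass(frozen=True)
class Artifact:
    path: str
    origin_zone: str
    created_at: datetime


def overdue(artifacts: Iterable[Artifact], now: datetime) -> List[str]:
    """Return paths of artifacts that have outlived the retention for their origin."""
    result = []
    for artifact in artifacts:
        limit = RETENTION_BY_ORIGIN.get(artifact.origin_zone, DEFAULT_RETENTION)
        if now - artifact.created_at >= limit:
            result.append(artifact.path)
    return result
```

Under this policy a dump labeled `prod-signing` is overdue the moment it is seen anywhere outside its origin, so a two-year-old dump on a debug share would be flagged on the first sweep.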

Layer 3: The "consumer vs enterprise key separation" was theoretical

This is the architectural failure that turned a corporate-network compromise into a 25-tenant breach. Microsoft's authentication architecture maintains separate signing keys for consumer (MSA) tokens and enterprise (Azure AD) tokens. The intent is exactly what you'd expect: a compromise of consumer infrastructure should never allow forgery against enterprise tenants.

The token validation logic on the OWA backend, however, did not enforce this separation. A library that was supposed to validate enterprise tokens against the enterprise key set instead accepted any token signed by a key from either the enterprise or the consumer key set. The cause was an unexpected interaction between common token-validation code paths and the way the JWKS (JSON Web Key Set) endpoints were merged at one layer of the validation stack. A token forged with the consumer key, claiming an enterprise tenant in the tid claim, and presented to OWA should have failed. It succeeded.

The lesson: the security of "key separation" depends entirely on the correctness of the validation code that enforces it. Generating two keys is the easy part. Ensuring that the relying-party logic rejects cross-realm tokens is the hard part. And in any sufficiently large codebase, "the relying-party logic" turns out to be five different libraries owned by three different teams, two of which were merged from acquisitions and never harmonized.
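The difference between a merged key lookup and a realm-scoped one can be shown in a few lines. This is an illustrative sketch, not a reconstruction of Microsoft's code: HMAC secrets stand in for the asymmetric signing keys a real identity provider would use, and the key IDs, `KEYSETS`, `verify_merged`, and `verify_scoped` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key sets. HMAC secrets stand in for asymmetric signing keys;
# only the key-lookup logic matters for this illustration.
KEYSETS = {
    "consumer": {"msa-2016": b"consumer-secret"},
    "enterprise": {"aad-2023": b"enterprise-secret"},
}


def sign(payload: bytes, key: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_merged(kid: str, payload: bytes, signature: bytes) -> bool:
    """Flawed pattern: look the key up in one merged set covering all realms."""
    merged = {**KEYSETS["consumer"], **KEYSETS["enterprise"]}
    key = merged.get(kid)
    return key is not None and hmac.compare_digest(sign(payload, key), signature)


def verify_scoped(expected_realm: str, kid: str, payload: bytes, signature: bytes) -> bool:
    """Scoped pattern: only keys that belong to the expected realm are candidates."""
    key = KEYSETS.get(expected_realm, {}).get(kid)
    return key is not None and hmac.compare_digest(sign(payload, key), signature)


# A token body claiming an enterprise tenant, signed with the consumer key.
forged_payload = b'{"tid": "enterprise-tenant-1", "sub": "victim"}'
forged_sig = sign(forged_payload, KEYSETS["consumer"]["msa-2016"])
```

With these definitions, `verify_merged("msa-2016", forged_payload, forged_sig)` accepts the forged token, while `verify_scoped("enterprise", "msa-2016", forged_payload, forged_sig)` rejects it because the consumer key is never a candidate for an enterprise relying party.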

Layer 4: A 2016 key was still trusted in 2023

The MSA signing key that Storm-0558 used had been generated in April 2016. It had been retired from active issuance years before the incident. But "retired from active issuance" is not the same as "removed from the JWKS endpoint." The validation logic accepting the forged tokens did so because the key was still listed as a valid signer at the time of forgery. There was no mandatory rotation of keys out of the trust set after retirement, and no automated process culling old keys from JWKS.

The lesson: key rotation is not complete until the old key has left the trust set. A retired key that no longer signs anything new is still trusted to validate anything claiming to have been signed by it. If your JWKS endpoint contains keys older than your customer onboarding cycle, you are accepting forgeries you don't know about yet.
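Treating retirement as a two-step operation can be sketched as follows: a key stops signing at `retired_at`, and it stops being published for verification once the longest-lived token it could have signed has expired. The JWK format itself carries no retirement date, so this assumes the issuer keeps its own key inventory; `KeyRecord`, `still_trusted`, and `publishable_kids` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class KeyRecord:
    kid: str
    retired_at: Optional[datetime]  # None while the key still signs new tokens


def still_trusted(record: KeyRecord, now: datetime, max_token_lifetime: timedelta) -> bool:
    """A retired key stays verifiable only until its last possible token has expired."""
    if record.retired_at is None:
        return True
    return now < record.retired_at + max_token_lifetime


def publishable_kids(
    records: Iterable[KeyRecord], now: datetime, max_token_lifetime: timedelta
) -> List[str]:
    """Key IDs that should still appear in the published key set."""
    return [r.kid for r in records if still_trusted(r, now, max_token_lifetime)]
```

Regenerating the published key set from a function like this on every deploy makes culling the default behavior, so a retired key drops out on schedule without anyone having to remember it.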

What to check today

If you operate any multi-tenant authentication system — and if you ship a SaaS, you do — Storm-0558 is a checklist, not a curiosity.

  1. Audit your JWKS endpoint for key age. List every key advertised. For each, identify the date of last issuance. Any key older than your longest-lived issued token (typically refresh-token TTL plus a safety margin) should be culled. If you cannot answer "when was the last token signed by this key issued," you have already lost.
  2. Test cross-realm token acceptance. Forge, in a test harness and with your own keys, a token claiming a tenant ID from realm A but signed by realm B's key. Submit it to your validation endpoint. If the response is anything other than an immediate hard rejection, your "key separation" architecture is decorative.
  3. Audit crash-dump and core-dump destinations. What systems write dumps? Where do those dumps land? Who has read access to that location? Is there a control that scans dumps for cryptographic material before they are opened? If your dumps from a signing process can land on an engineer's laptop, plan for the day they do.
  4. Enforce the trust boundary on the data, not the location. Production cryptographic material — keys, key fragments, in-memory token-signing state — should be cryptographically incapable of crossing into corporate environments. Use HSMs that physically cannot export key material. Use envelope encryption such that even a memory dump is opaque without an HSM-mediated unwrap.
  5. Test your validation library, not your key-generation library. Most token-related security testing concentrates on issuance: "is the key generated with sufficient entropy, in an HSM, with proper rotation." That is the easy half. The hard half is "does every relying party correctly reject every token it should reject." Storm-0558 was a relying-party validation bug, not an issuance bug.

How CELVEX Group tests for this

Cross-realm token acceptance and cross-tenant key validation gaps are exactly the kind of bug that does not appear in any single tenant's view. You can audit your own tenant exhaustively and never see it, because the bug only manifests when a token signed for a different realm is presented to your validation logic. We added a dedicated test for this class of failure in our March 2026 incident-lessons supplement.

The test, INCIDENT-XTENANT-KEY-VALIDATION-001, lives in core/test_catalog/_supplement_incident_lessons_2026-03.py. It enumerates every JWKS endpoint exposed by a target's authentication surface, catalogs the issuer claim each key advertises, and then attempts to use each key to sign tokens claiming issuance from other issuers in the same trust constellation. Any token accepted by a validating endpoint despite a mismatch between the signing key's intended realm and the token's claimed tenant is logged as a critical finding. The test also flags any key in JWKS that is older than the customer's stated key-rotation policy, which catches the "2016 key still trusted in 2023" case directly.
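The core loop of a test like this can be sketched generically. What follows is not the implementation in the catalog file named above; it is a simplified illustration in which HMAC secrets stand in for real signing keys, realms are held in memory instead of being discovered from JWKS endpoints, and every name (`cross_realm_findings`, `make_validator`, `REALMS`) is hypothetical.

```python
import hashlib
import hmac
import json
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Realms = Dict[str, Dict[str, bytes]]          # realm -> {kid: key}
Validator = Callable[[str, str, bytes, bytes], bool]


def sign(payload: bytes, key: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).digest()


def cross_realm_findings(realms: Realms, validate: Validator) -> List[Tuple[str, str, str]]:
    """Sign a token for every (signing realm, claimed realm) mismatch and
    record each case the validator accepts."""
    findings = []
    for signing_realm, claimed_realm in permutations(realms, 2):
        payload = json.dumps({"iss": claimed_realm}).encode()
        for kid, key in realms[signing_realm].items():
            if validate(claimed_realm, kid, payload, sign(payload, key)):
                findings.append((signing_realm, kid, claimed_realm))
    return findings


def make_validator(realms: Realms, scoped: bool) -> Validator:
    """Build a toy validator: scoped to the claimed realm, or using a merged key set."""
    def validate(claimed_realm: str, kid: str, payload: bytes, signature: bytes) -> bool:
        if scoped:
            candidates = realms.get(claimed_realm, {})
        else:
            candidates = {k: v for keys in realms.values() for k, v in keys.items()}
        key = candidates.get(kid)
        return key is not None and hmac.compare_digest(sign(payload, key), signature)
    return validate


REALMS = {
    "consumer": {"c1": b"consumer-secret"},
    "enterprise": {"e1": b"enterprise-secret"},
}
```

Running `cross_realm_findings(REALMS, make_validator(REALMS, scoped=False))` reports both cross-realm acceptances, while the scoped validator yields an empty list. Against a real target, the `validate` callable would submit the token to the validating endpoint and interpret the response.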

For tenants on Microsoft 365, Google Workspace, or any other major IdP, we additionally validate that retired key IDs are no longer present in the relying-party trust set, that token validation enforces the iss claim against the expected issuer for the customer's tenancy, and that the JWKS endpoint does not advertise keys whose last-known issuance predates the customer's onboarding date with that IdP.

Bottom line

Storm-0558 is the canonical example of how cryptographic separation is only as strong as the code that enforces it. Microsoft generated separate keys for consumer and enterprise realms. They retired the 2016 key from active issuance. They had a redaction system to keep keys out of crash dumps. They had a corporate-debug-environment trust boundary. Every one of those controls existed. Every one of those controls failed silently, in a way that produced no telemetry that anyone was looking at, for at least two years.

The reason the breach reached 25 enterprise tenants is not that any single one of those controls was naive. It is that no one was running an end-to-end test that asked "if a 2016 consumer key were exfiltrated tomorrow, would my 2023 enterprise tenant accept tokens signed by it?" If anyone had run that test, the answer would have been "yes," and the architecture would have been fixed in 2017. Nobody ran that test, because the architecture diagram said it was impossible.

Architecture diagrams are not security controls. Tests are.

