Continuous Validation, Not Quarterly Hope: Inside Our CTEM Platform

Continuous Threat Exposure Management (CTEM) is a Gartner-coined operating model, not a product category. The framework prescribes five stages (scoping, discovery, prioritization, validation, and mobilization) run on a loop rather than as an annual project. Most tools that wear the CTEM label implement the first three stages well and then hand you a CSV. The two stages that actually change attacker economics, validation (does the exposure really lead to compromise?) and mobilization (does the loop teach itself?), are where almost everyone stops.

This post is a tour of how our platform implements all five, with particular attention to the parts that are hard: making validation verifiable, making prioritization predictive, and making the whole thing learn.

1. Continuous validation: the difference between a finding and a fact

A scanner that says "TLS 1.0 enabled" or "Spring Boot actuator exposed" has produced a hypothesis. It has not produced a fact. The gap between those two words is the entire reason vulnerability backlogs rot: a remediation team that cannot distinguish a reachable, exploitable exposure from a theoretical one will, rationally, defer everything.

Our discovery stage runs against the live attack surface continuously, not on a quarterly cadence. Deep DNS enumeration is mandatory pre-work on every new engagement and re-runs weekly thereafter, because the asset you forgot you owned (a stale subdomain, an unmonitored admin panel, an orphaned cert) is precisely the door an attacker walks through. Discovery feeds a catalog of thousands of parameterized tests, and the validation stage executes them rather than pattern-matching banners.

The operating rule is blunt: every test that can run, runs, on every continuous-monitoring scan. There is no "manual, on-demand" bucket where coverage quietly goes to die. The only tests that skip are the ones whose targets are internal and genuinely inaccessible to the scanner, and each carries an explicit skip_reason. Coverage gaps are themselves findings: a door you are not watching is still a door, whether or not you have a test pointed at it.
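A minimal sketch of that rule, assuming a catalog entry is a small record with a callable and an optional skip_reason. The names CatalogTest and run_scan, the test IDs, and the host are hypothetical illustrations, not the platform's API. The point is only that a skipped test lands in the report as a coverage gap instead of disappearing.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CatalogTest:
    test_id: str
    run: Callable[[str], bool]          # returns True when the exposure is confirmed
    skip_reason: Optional[str] = None   # set only when the target cannot be reached


def run_scan(target: str, catalog: list[CatalogTest]) -> dict[str, list[str]]:
    """Run every runnable test; report skipped tests as coverage gaps."""
    report: dict[str, list[str]] = {"confirmed": [], "clean": [], "coverage_gaps": []}
    for test in catalog:
        if test.skip_reason is not None:
            # A skip is never silent: it is recorded with its reason.
            report["coverage_gaps"].append(f"{test.test_id}: {test.skip_reason}")
            continue
        bucket = "confirmed" if test.run(target) else "clean"
        report[bucket].append(test.test_id)
    return report


# Stand-in tests with hard-coded results, purely for illustration.
catalog = [
    CatalogTest("tls-legacy-protocol", run=lambda host: True),
    CatalogTest("actuator-exposed", run=lambda host: False),
    CatalogTest("internal-ldap-bind", run=lambda host: False,
                skip_reason="target reachable only from the internal network"),
]
report = run_scan("app.example.com", catalog)
```

There is deliberately no code path that drops a test without either running it or recording why it was skipped.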

2. Proof Capsules: verifiable security as an artifact

This is the differentiator, so it gets the most words.

When validation confirms an exposure is real, the platform mints a Proof Capsule: a signed, self-contained, replayable artifact that proves the finding without requiring the reader to trust us. A capsule bundles:

the request/response evidence that demonstrates the condition,
the CWE classification and the test identity that produced it,
a disposition (confirmed exploit; checks that conclude "not a finding" are recorded but, and this matters, never minted as capsules),
a detection signature the customer's own SOC can deploy, and
a remediation with a concrete fix.

Two design decisions make capsules trustworthy rather than decorative.

First, they are cryptographically signed (Ed25519). The signature binds the evidence to the disposition so a capsule cannot be silently edited after the fact. A customer (or a bug-bounty triager, or an auditor) can verify the signature offline and replay the evidence.
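To illustrate what "binds the evidence to the disposition" means in practice, here is a sketch of the canonicalization step only, using the Python standard library. The field names and values are hypothetical, and the capsule's real schema is not shown in this post. The Ed25519 signature itself would be computed over the payload bytes with a vetted cryptography library; the SHA-256 digest below is just a convenient way to compare byte strings.

```python
import hashlib
import json


def signing_payload(capsule: dict) -> bytes:
    """Serialize deterministically so signer and verifier see identical bytes."""
    return json.dumps(capsule, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Hypothetical capsule contents, for illustration only.
capsule = {
    "test_id": "actuator-exposed",
    "cwe": "CWE-200",
    "disposition": "confirmed_exploit",
    "evidence": {"request": "GET /actuator/env", "response_status": 200},
}

original = hashlib.sha256(signing_payload(capsule)).hexdigest()

# Key order does not matter: the canonical form is the same.
reordered = dict(reversed(list(capsule.items())))
same = hashlib.sha256(signing_payload(reordered)).hexdigest()

# Editing the disposition changes the signed bytes, so a signature made
# over the original payload would no longer verify.
tampered = {**capsule, "disposition": "not_a_finding"}
changed = hashlib.sha256(signing_payload(tampered)).hexdigest()
```

Because evidence and disposition live in one serialized payload, neither can be altered independently without invalidating a signature over the whole.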

Second, and this is a hard-won product principle: we do not mint capsules for false positives. A Proof Capsule with a "we checked and found nothing" disposition has zero customer value; it is a certificate proving we added no value. Only confirmed, real exploits mint a capsule. Validation logic that distinguishes a true exploit from a body-only echo or a substring false-match is built into the scan pipeline, not bolted on afterward. When a past finding is reclassified as FALSE_POSITIVE, its capsule is retired, not archived as noise.
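The gating rule can be sketched as follows. Everything here is an assumption made for illustration: the Finding record, the function names, and the echo check, which uses a template-style arithmetic payload as one example of telling "the server evaluated my input" apart from "the server reflected my input". A production check would need stronger confirmation than a single substring test.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disposition(Enum):
    CONFIRMED_EXPLOIT = "confirmed_exploit"
    FALSE_POSITIVE = "false_positive"


@dataclass
class Finding:
    finding_id: str
    disposition: Disposition
    capsule_id: Optional[str] = None
    capsule_retired: bool = False


def classify_probe(response_body: str, payload: str, expected_result: str) -> Disposition:
    """Confirm only when the evaluated result appears outside plain echoes of the payload."""
    remainder = response_body.replace(payload, "")  # discard reflections of the input
    if expected_result in remainder:
        return Disposition.CONFIRMED_EXPLOIT
    return Disposition.FALSE_POSITIVE


def mint_capsule(finding: Finding) -> Optional[str]:
    """Mint a capsule only for confirmed exploits; otherwise mint nothing."""
    if finding.disposition is not Disposition.CONFIRMED_EXPLOIT:
        return None
    finding.capsule_id = f"capsule-{finding.finding_id}"
    return finding.capsule_id


def reclassify(finding: Finding, new_disposition: Disposition) -> None:
    """Retire an existing capsule when a finding becomes a false positive."""
    finding.disposition = new_disposition
    if new_disposition is Disposition.FALSE_POSITIVE and finding.capsule_id is not None:
        finding.capsule_retired = True


payload, expected = "{{1234*5678}}", "7006652"
echoed = classify_probe(f"Hello {payload}", payload, expected)     # reflection only
evaluated = classify_probe("Hello 7006652", payload, expected)     # server did the math

finding = Finding("f-1", evaluated)
capsule_id = mint_capsule(finding)
reclassify(finding, Disposition.FALSE_POSITIVE)
```

The echoed response never reaches mint_capsule as a confirmed exploit, and the later reclassification flips capsule_retired rather than leaving a stale capsule in circulation.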

The slogan we put on this is the literal claim the artifact makes: Verifiable security. A Proof Capsule is the receipt.

3. Predictive exploit-prioritization: likelihood and time-to-exploit

Confirming that 400 exposures are real does not tell a remediation team what to fix on Monday morning. Severity (CVSS) answers "how bad if exploited" and is intentionally context-free; it is not a risk score, and sorting a backlog by it wastes your best attention on the wrong CVE.

Our prioritization engine fuses three distinct signals into two operator-facing numbers:

EPSS (the FIRST Exploit Prediction Scoring System), a daily-updated probability that a CVE is exploited in the wild within 30 days.
CISA KEV, the ground-truth catalog of vulnerabilities known to be exploited. KEV membership is not a probability; it is a step function. A KEV hit dominates the score.
Adversary interest, our own threat-intel signal: dark-web chatter, exploit-broker pricing, patch-diff forecasting, and observed scanning. This is the leading indicator that fires before EPSS catches up and often before a public PoC exists.

The engine emits a likelihood (the probability this exposure is weaponized against this asset) and a time-to-exploit estimate (how long the window is likely to stay open). The second number is the one executives actually act on: "patch within 9 days" is a decision; "CVSS 9.1" is a feeling. Where the model is uncertain (for example, a brand-new CVE with no EPSS history and no KEV entry but loud broker interest), the engine widens the time-to-exploit band rather than fabricating precision, and surfaces the uncertainty explicitly.
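The shape of that fusion can be sketched in a few lines. To be clear about what is assumed: the noisy-OR combination, the KEV floor of 0.95, the 30-day base window, and the spread values are all invented for illustration and are not the engine's actual model or weights. The sketch only shows the three behaviors described above: KEV acting as a step function, two signals combining into one likelihood, and missing evidence widening the band.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    epss: Optional[float]       # FIRST EPSS probability; None when no score exists yet
    in_kev: bool                # listed in the CISA KEV catalog
    adversary_interest: float   # hypothetical 0..1 threat-intel score


def likelihood(s: Signals, kev_floor: float = 0.95) -> float:
    """Noisy-OR of EPSS and adversary interest, with KEV as a step function."""
    epss = s.epss if s.epss is not None else 0.0
    combined = 1.0 - (1.0 - epss) * (1.0 - s.adversary_interest)
    return max(combined, kev_floor) if s.in_kev else combined


def time_to_exploit_days(s: Signals, base_days: float = 30.0) -> tuple[float, float]:
    """Return a (low, high) band in days; thin evidence widens the band."""
    estimate = base_days * (1.0 - likelihood(s))
    # Assumed rule: no EPSS history and no KEV entry means more uncertainty.
    spread = 0.75 if (s.epss is None and not s.in_kev) else 0.25
    return (round(estimate * (1.0 - spread), 1), round(estimate * (1.0 + spread), 1))


known = Signals(epss=0.4, in_kev=False, adversary_interest=0.5)
fresh = Signals(epss=None, in_kev=False, adversary_interest=0.7)   # new CVE, loud chatter
kev = Signals(epss=0.05, in_kev=True, adversary_interest=0.0)

known_band = time_to_exploit_days(known)
fresh_band = time_to_exploit_days(fresh)
```

In this toy setup, known and fresh land on roughly the same likelihood, but fresh gets a much wider band because the model has less to go on. The kev example shows a low EPSS score being overridden by catalog membership.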

4. Attack-path blast-radius: from a single door to the room behind it

A confirmed exposure on one host is a finding. The same exposure as the first hop in a chain that reaches your customer database is an incident waiting to happen. The platform models discovered assets and their trust relationships as a graph and computes blast radius: given this exposure, what does an attacker reach next?

This reframes prioritization. An exposure with a modest standalone likelihood but a blast radius that includes an identity provider or a secrets store outranks a higher-likelihood exposure that dead-ends on an isolated marketing box. Chained tests, where the output of one validated finding becomes the precondition for the next, are first-class citizens of the catalog, because real adversaries do not stop at the first open door, and neither should validation.
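A minimal sketch of the blast-radius computation, assuming the trust graph is a plain adjacency list where an edge means "a foothold here grants access there". The asset names and the CROWN_JEWELS set are hypothetical. Breadth-first search is one straightforward way to compute reachability; the post does not say which traversal the platform uses.

```python
from collections import deque


def blast_radius(graph: dict[str, list[str]], entry: str) -> set[str]:
    """Assets reachable from the entry asset by following trust edges (BFS)."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {entry}


# Hypothetical trust relationships between discovered assets.
trust_graph = {
    "vpn-gateway": ["jump-host"],
    "jump-host": ["identity-provider", "build-server"],
    "build-server": ["secrets-store"],
    "marketing-site": [],
}
CROWN_JEWELS = {"identity-provider", "secrets-store"}


def reaches_crown_jewels(entry: str) -> bool:
    """True when the blast radius includes a high-value asset."""
    return bool(blast_radius(trust_graph, entry) & CROWN_JEWELS)


gateway_reach = blast_radius(trust_graph, "vpn-gateway")
marketing_reach = blast_radius(trust_graph, "marketing-site")
```

Here the VPN gateway exposure reaches both the identity provider and the secrets store, while the marketing site dead-ends, which is exactly the ordering argument made above.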

5. The intel-to-test-to-chain learning loop

The four subsystems above would be a strong static product. What makes it a continuous one is that they feed each other.

threat intel (CVE feeds, KEV, EPSS, dark-web, patch-diffs)
        │
        ▼
   new tests authored + catalogued (15-20/day, grounded in real intel)
        │
        ▼
   continuous validation runs them across the surface
        │
        ▼
   confirmed exploits mint Proof Capsules ──► detection + fix to customer
        │
        ▼
   outcomes (accepted / chained / false-positive) feed back
        │
        ▼
   prioritization weights + chain graph + test catalog retrain
        └───────────────────────────────────────────────► (loop)

Every weekday, threat intelligence becomes new attack-vector tests grounded in real-world signals, never speculative padding. A patch-diff that scores high on our 0-day forecaster becomes a test before the CVE is public. Validation outcomes, including the false positives we deliberately decline to capsule, retrain the prioritization weights and prune the test catalog. The chain graph grows as new single-hop findings reveal new multi-hop paths.

This is the mobilization stage of CTEM, implemented as machinery rather than a meeting. The loop's job is to make sure the gap between "an attacker learned a new technique" and "our platform tests for it across every customer" is measured in days, not quarters.

Why this composition matters

Each subsystem is useful alone. Composed, they change the question a security team gets to ask. Instead of "here are 4,000 findings, good luck," the platform answers:

Is it real? A signed Proof Capsule you can verify yourself.
How urgent? A likelihood and a time-to-exploit, not just a severity.
How bad if ignored? A blast-radius path, not an isolated host.
Are we keeping up? A loop that turns this week's threat intel into this week's tests.

Quarterly pentests answer none of those continuously. A scanner that stops at hypotheses answers only the first, and badly. The CTEM operating model exists because exposure is continuous; the platform exists to make the response continuous too.

Verifiable security.