The sandbox you trust to run untrusted code: micro-VM escape as an attack surface

You built a sandbox so you could run code you do not trust. CI runs a pull request from a stranger. Your AI agent executes a tool the model just wrote. A multi-tenant build platform compiles whatever a customer pushes. Every one of those designs rests on a single assumption: the box holds. The moment it does not, the host runs the attacker's code, and the attacker's code was always hostile. On 10 June 2026, CVE-2026-46695 landed against Boxlite, a lightweight sandbox that launches OCI containers inside micro-VMs precisely to run untrusted code, with a CVSS of 10.0. The flaw was not exotic: Boxlite did not restrict the kernel capabilities inside the container, so hostile code could simply remount a read-only mount as read-write and write where it must not. That is the whole class in one sentence. The boundary that fails is never the policy you wrote at the edge; it is the capability you left enabled on the inside. This piece walks the decision tree from untrusted code in, to escape primitive, to host code execution, to pivot, and the defense-in-depth contract that closes it.

There is a comfortable lie at the center of every execution sandbox: that a boundary you declared is a boundary that holds. You marked a mount read-only, so it is read-only. You dropped into a container, so you are contained. You launched a micro-VM, so the host is safe. Each of those is a static declaration. Whether it survives an attacker who controls the code running inside is a runtime property, and runtime is exactly where a single missing capability drop, a writable kernel interface, or an over-trusted device node turns a declared boundary into a doorway.

This is part of our attack-research series on the classes that ship despite a decade of patches. Earlier pieces walked cross-tenant isolation failure on shared clusters and the business-logic chain into cloud-admin. This one walks sandbox escape the same way: not as one CVE, but as a family of boundary collapses that share a root cause. By the end you should be able to map where your own untrusted-code boundary actually lives, see why a static scanner cannot tell you whether it holds, and ship the layered controls that make an escape structurally hard rather than merely forbidden by a config flag. Verifiable security.

The attack pattern in one paragraph

A platform accepts code it does not trust and runs it inside an isolation primitive: a container, a micro-VM, a language-level sandbox, a seccomp-confined process. The platform declares a boundary at the edge: read-only mounts, a non-root user, a network policy, a resource limit. The attacker leaves the edge policy alone and goes after the capability the runtime left reachable on the inside. There are three canonical primitives. First, the capability primitive: if the container keeps a powerful Linux capability such as CAP_SYS_ADMIN, hostile code can mount, remount, manipulate namespaces, or write kernel interfaces, and the read-only declaration turns out never to have been enforced at the kernel level. That is exactly CVE-2026-46695: read-only was a label, not a kernel guarantee, because the capability to remount was never dropped. Second, the handle primitive: a file descriptor, device node, or symlink that points at a host object the sandbox forgot to sever. This is the shape of the classic runc escape CVE-2019-5736, in which a container overwrote the host runc binary through /proc/self/exe. Third, the kernel-interface primitive: a writable kernel surface reachable from inside, such as the cgroups v1 release_agent in CVE-2022-0492, that runs an attacker-controlled program in the host context. In all three, the edge policy was correct and the host ran the attacker's code anyway.

The unifying observation: isolation is a runtime invariant, but it is configured as a static declaration. The attacker lives in the gap between what you declared and what the kernel actually enforces.

Why this still ships in 2026

If the lesson of 2019's runc escape was "do not let a container reach a host handle," and the industry largely hardened that path, why is sandbox escape still a CVSS-10 finding in 2026? Four structural reasons, each verifiable against your own runtime in an afternoon.

The boundary is a stack of independent layers, and each is configured separately. A real sandbox is namespaces plus capabilities plus seccomp plus the mount table plus the device cgroup plus, sometimes, a hypervisor. Each layer is set in a different place and ships with a different default. A read-only mount marked at the volume layer means nothing if the capability layer still grants the remount, which is precisely the Boxlite failure. The declaration and the enforcement live in different layers, and nothing checks that they agree.
Static scanners audit the manifest, not the running kernel. A scanner reads your Dockerfile, your pod spec, your sandbox config, and tells you the mount is declared ro. It does not exec into the running container and attempt mount -o remount,rw to see whether the kernel actually refuses. The check that catches the escape requires running the boundary from the inside, not reading the file that describes it. That is the gap our cloud-edge research track was shaped to occupy. This is a CWE-693 protection-mechanism failure wearing a green configuration report.
"Sandbox" is treated as a binary, when it is a spectrum. A language eval-sandbox, a seccomp-filtered process, an OCI container, and a hardware-virtualized micro-VM all get called "the sandbox," and teams reason about all of them with the same trust. They are not the same. A shared-kernel container hands the attacker the entire kernel attack surface; a micro-VM with a thin device model does not. Treating them as interchangeable is how untrusted code ends up one capability away from the host.
The runtime keeps powerful capabilities for convenience. CAP_SYS_ADMIN is the new root: dozens of features need it, so runtimes grant it broadly, and once it is present the read-only label, the namespace, and the seccomp gaps all become negotiable from the inside. CVE-2022-0492 turned on exactly this: the kernel did not check that a process writing the cgroup v1 release_agent held CAP_SYS_ADMIN in the initial user namespace, so a container able to mount a cgroup hierarchy could point release_agent at its own program and have the host run it. The privilege was kept for convenience and spent for escape. This is the CWE-269 improper-privilege-management core of the class.

The attacker decision tree

The five-step tree an attacker walks from untrusted code inside the sandbox to host code execution and pivot. Step 3 is the boundary crossing.

The decisive insight at step 2 is that the attacker does not fight the edge policy. They accept that the mount is declared read-only and the user is dropped, because those declarations are usually correct, and they enumerate which enforcement layer is actually thin. That enumeration is fast and runs entirely from inside the box: read /proc/self/status for the effective capability set, probe whether /proc/sys or /sys/fs/cgroup is writable, test whether a host device node survived into the container, attempt a no-op remount. The layer that turns out to be writable, capable, or reachable is the escape. The Boxlite case is the cleanest possible illustration: the read-only mount was real at the volume layer and meaningless at the capability layer, so a single remount,rw from inside collapsed it.
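The step-2 enumeration is small enough to sketch. Below is a minimal, read-only POSIX shell version, assuming a Linux guest with /proc mounted. The function names, the short table of escape-relevant capabilities, and the list of probed paths are our own illustration, not Boxlite's tooling or any published probe. Defenders can run the same map inside their own runtimes.

```shell
#!/bin/sh
# Hypothetical in-box boundary map (illustrative names, not a real tool).
# Read-only: it inspects the sandbox and modifies nothing.

# Non-exhaustive table of capability bits often relevant to escapes,
# as "bit:name" pairs (bit numbers from <linux/capability.h>).
ESCAPE_CAPS="2:cap_dac_read_search 12:cap_net_admin 16:cap_sys_module 17:cap_sys_rawio 19:cap_sys_ptrace 21:cap_sys_admin 39:cap_bpf"

# Print the escape-relevant capability names set in a hex CapEff mask.
dangerous_caps() {
  mask=$((0x$1))
  for pair in $ESCAPE_CAPS; do
    bit=${pair%%:*}
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      echo "${pair#*:}"
    fi
  done
}

# Step-2 enumeration: kept capabilities, writable kernel interfaces, device nodes.
boundary_map() {
  eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
  echo "CapEff: $eff"
  for cap in $(dangerous_caps "${eff:-0}"); do
    echo "  capability kept: $cap"
  done
  # [ -w ] is only a coarse hint; an actual write or remount attempt is decisive.
  for path in /proc/sys /proc/sysrq-trigger /sys/fs/cgroup; do
    if [ -w "$path" ]; then
      echo "  writable kernel interface: $path"
    fi
  done
  # Block device nodes visible inside the box are candidate host handles.
  find /dev -type b 2>/dev/null | sed 's/^/  block device: /'
}
```

Run boundary_map inside the box. On Docker's default capability set (CapEff 00000000a80425fb), dangerous_caps prints nothing. Add CAP_SYS_ADMIN (00000000a82425fb) and it prints cap_sys_admin.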

A composite real-world scenario

The setting is a CI and AI-tooling platform that runs untrusted code on behalf of others. It accepts pull requests from forks and runs each one's test suite, and it hosts an AI agent that executes shell commands the model generates. Both workloads run inside a lightweight sandbox that launches an OCI container inside a micro-VM, marketed and configured to run untrusted code safely. The platform's config is textbook: the source tree is mounted read-only into the box, the network is restricted, a non-root user is set. A configuration scanner of this setup returns green.

An attacker opens a pull request, or steers the agent into running a command, that contains the test payload. The first thing the payload does is read its own capability set from inside the box, the fastest possible boundary map.

# Inside the sandbox: what did the runtime actually leave me?
$ grep Cap /proc/self/status
CapEff:  00000000a82425fb
$ capsh --decode=00000000a82425fb | tr ',' '\n' | grep -i sys_admin
cap_sys_admin            # the runtime kept CAP_SYS_ADMIN inside the box

# The "read-only" source mount the platform advertises:
$ mount | grep ' /work '
/dev/vdb on /work type ext4 (ro,relatime)   # declared ro at the volume layer

The mount is genuinely read-only at the volume layer. But the capability to remount was never dropped, and read-only on a mount is not a kernel guarantee when the holder can remount. This is the Boxlite failure shape exactly: the protection was a label, not an enforced invariant.

# read-only is a label, not a guarantee, when I hold CAP_SYS_ADMIN
$ echo test > /work/marker
bash: /work/marker: Read-only file system     # as declared... for now

$ mount -o remount,rw /work                   # the capability was never dropped
$ echo test > /work/marker && cat /work/marker
test                                          # the "read-only" boundary is gone
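The remount above is the decisive test, but it also changes the mount. A read-only way to predict the same outcome is to compare the declaration in the mount table with the capability set. The sketch below assumes a Linux /proc layout, and the helper names are hypothetical. It deliberately ignores seccomp and LSM policy, which can also refuse a remount, so treat either verdict as a hint to confirm with the real attempt.

```shell
#!/bin/sh
# Hypothetical helpers contrasting the declaration (the ro flag in the mount
# table) with what the capability set lets the holder do to it.

# Print "ro" or "rw" for MOUNTPOINT as declared in a /proc/mounts-style table
# (fields: device mountpoint fstype options ...). The last matching line wins,
# because a later mount on the same path shadows earlier ones.
declared_ro() {
  awk -v mp="$1" '$2 == mp { split($4, o, ","); st = o[1] }
    END { if (st != "") print st }' "${2:-/proc/self/mounts}"
}

# Capability-only verdict: a declared-ro mount is only a label when the holder
# keeps CAP_SYS_ADMIN (bit 21), which mount -o remount,rw requires.
# seccomp or LSM policy may still refuse the remount, so this is a hint.
ro_verdict() {
  if [ "$(declared_ro "$1" "${2:-/proc/self/mounts}")" != ro ]; then
    echo "not declared read-only"
    return
  fi
  eff=$(awk '/^CapEff:/ {print $2}' "${3:-/proc/self/status}")
  if [ $(( (0x${eff:-0} >> 21) & 1 )) -eq 1 ]; then
    echo "label: declared ro, but CAP_SYS_ADMIN can remount it rw"
  else
    echo "declared ro, no CAP_SYS_ADMIN: verify with a remount attempt"
  fi
}
```

In the scenario's box, ro_verdict /work would return the "label" verdict before any write is attempted, because /work is declared ro and CapEff still carries bit 21.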

Write access to a directory the host trusts is the foothold. From here the escape is whichever host-reachable handle or interface the runtime left thin. If a host path is bind-mounted, the attacker writes a unit file or a hook script the host will execute. If a writable kernel interface survives, the cgroup release_agent path of CVE-2022-0492 runs an attacker program in the host namespace. If the container can reach the runtime binary, the /proc/self/exe overwrite of CVE-2019-5736 replaces it. The capability that should have been dropped is the master key that makes each of these reachable.

# With CAP_SYS_ADMIN, the cgroup v1 release_agent escape runs my code on the host
$ mkdir /tmp/c && mount -t cgroup -o rdma cgroup /tmp/c   # needs the capability
$ mkdir /tmp/c/x && echo 1 > /tmp/c/x/notify_on_release   # release fires for a child cgroup
$ host=$(sed -n 's/.*upperdir=\([^,]*\).*/\1/p' /proc/mounts | head -1)   # box root as the host sees it
$ echo "$host/payload" > /tmp/c/release_agent
# drop /payload; when the last task leaves /tmp/c/x, the HOST kernel runs it as root
# -> host code execution. One escape = the whole node.

No edge policy was violated. The mount was declared read-only, the user was non-root inside the box, the network was restricted. The boundary that failed was the assumption that a declared protection is an enforced one. MITRE ATT&CK T1611, Escape to Host, is the technique; the kept capability is the cause. Total elapsed time from payload start to host code execution: seconds, because every check runs locally inside the box. And because this is a shared runner, the host the attacker now controls is the host that ran every other tenant's job.

What we observe in untrusted-execution environments

We are honest about the limits of our visibility. CelvexGroup's continuous validation runs against assets and attestation bundles the customer flags into scope; we do not have a god's-eye view of every runner fleet. What we do probe, with read-only attestation audits and benign inside-the-box capability reads carrying an X-Celvex-Probe attribution header so the customer's SOC can always identify our activity, is the layered boundary map above. Across cloud-edge engagements this year, the rough breakdown:

Roughly one in five untrusted-code runners we audited kept a powerful capability such as CAP_SYS_ADMIN inside the execution container. Most operators believed their read-only and non-root declarations were sufficient and did not know the capability made them negotiable.
Roughly one in seven had at least one host-trusted path bind-mounted writable, or a host device node surviving into the box, the handle primitive that turns a write into host execution.
Roughly one in nine exposed a writable kernel interface (/proc/sys, /sys/fs/cgroup, or a cgroup mount) reachable from inside the sandbox.
Most AI tool-execution sandboxes we reviewed ran on a shared kernel with no hypervisor boundary, treating a language or container sandbox as equivalent to a micro-VM. The model writes the code; the code runs one capability away from the node.

The honest read: sandbox escape is among the highest-impact findings we ship in untrusted-execution environments. It is not the highest-frequency, but when it lands it converts a single hostile job into control of a host that served many. Static configuration review gives a false all-clear precisely because each declaration is correct on its own.

What to do about it: the isolation contract

The fix is not one line, because isolation is a property of a stack of layers, not a single flag. But it reduces to a defense-in-depth contract every untrusted-code boundary should satisfy, and most controls are cheap. The governing principle: never trust the sandbox alone, and never let one layer's declaration substitute for another layer's enforcement.

Untrusted-code isolation contract: defense in depth that closes the escape

Drop every capability you cannot name a reason to keep, especially CAP_SYS_ADMIN. Start from an empty capability set and add back only what the workload provably needs. A dropped remount capability makes a read-only mount actually read-only, which is the direct fix for the CVE-2026-46695 class.
Use a real isolation boundary for genuinely untrusted code, not a shared kernel. Run CI from forks, AI tool-execution, and multi-tenant exec inside a gVisor-style user-space kernel or a Kata-style or Firecracker-style micro-VM, so an in-box kernel exploit hits a thin device model rather than the host kernel. Match the boundary strength to the trust level of the code.
Enforce read-only at the kernel, then verify it from inside. A volume marked read-only is necessary but not sufficient. After launch, attempt mount -o remount,rw from inside the box on a schedule and assert the kernel refuses. If the remount succeeds, the boundary is a label.
Sever host handles and interfaces. No host device nodes in the box, no host-trusted path bind-mounted writable, masked and read-only /proc and /sys kernel interfaces, and a seccomp profile that blocks the syscalls an escape needs (mount, pivot_root, keyctl, and friends).
Run the host as if the box will be breached. Non-root runner identity on the host, minimal node IAM and cloud-token scope, per-job ephemeral hosts so one escape does not inherit another tenant's residue, and patch the runtime: the Boxlite class was fixed in 0.9.0, runc and the kernel cgroup path were fixed years ago.
Verify isolation by running the boundary, not by reading the manifest. Execute an in-box escape-probe battery (capability read, remount attempt, kernel-interface write test, handle reachability) against every untrusted-code runtime on a schedule. A green configuration is necessary but not sufficient.
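Several of these controls map directly onto docker run flags. The sketch below is a hypothetical launcher, assuming Docker and, optionally, an alternative OCI runtime registered under the name held in RUNTIME (gVisor registers as runsc). The function names, the nobody UID, and the limits are illustrative choices, not a recommended baseline. It covers the capability, read-only, and host-handle items; the host-side items (node IAM, ephemeral hosts, patching) live outside the container command.

```shell
#!/bin/sh
# Hypothetical launcher mapping part of the isolation contract onto docker run.
# Docker's default seccomp profile stays active: nothing here passes seccomp=unconfined.

# Print the hardening flags, one per line.
hardened_flags() {
  cat <<'EOF'
--cap-drop=ALL
--security-opt=no-new-privileges
--read-only
--tmpfs=/tmp:rw,noexec,nosuid,size=65536k
--network=none
--pids-limit=256
--user=65534:65534
EOF
}

# run_untrusted SRC_DIR IMAGE CMD...: bind SRC_DIR read-only at /work and run CMD.
run_untrusted() {
  src=$1 image=$2
  shift 2
  # Word-splitting the flag list is safe: no flag contains whitespace.
  docker run --rm $(hardened_flags) \
    ${RUNTIME:+--runtime="$RUNTIME"} \
    --mount "type=bind,src=$src,dst=/work,readonly" \
    "$image" "$@"
}
```

For example, RUNTIME=runsc run_untrusted /srv/checkout alpine:3 ls /work starts with an empty capability set and a read-only root and source tree under gVisor. Per the contract, the in-box remount probe should still be run against the result rather than trusting the flags.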

A declared boundary is not an enforced boundary. The sandbox that fails is always the layer whose protection was a label the kernel never had to honor: the kept capability, the writable interface, the host handle left attached.

The audit, in concrete terms, starts with reading the capability set and probing the boundary from inside every untrusted-code runtime:

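# For each running untrusted-code container: dump CapEff, flag cap_sys_admin (bit 21),
# then prove read-only is enforced, not just declared
$ for c in $(docker ps -q); do
    echo "== $c =="
    eff=$(docker exec "$c" grep CapEff /proc/self/status 2>/dev/null | awk '{print $2}')
    echo "CapEff: ${eff:-unknown}"
    [ -n "$eff" ] && (( (0x$eff >> 21) & 1 )) && echo "FLAG: cap_sys_admin still held"
    # A successful remount changes the mount, so restore ro before reporting
    docker exec "$c" sh -c 'if mount -o remount,rw /work 2>/dev/null; then mount -o remount,ro /work; echo LABEL; else echo ENFORCED; fi'
  done
# "ENFORCED" = the kernel refused the remount; "LABEL" = read-only was only declared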

Read each flagged runtime. Confirm the capability set is minimal, that read-only mounts refuse remount from inside, that no host handle or writable kernel interface survives, and that genuinely untrusted code runs behind a real isolation boundary rather than a shared kernel. The exercise is finishable in a day for a single runner fleet.
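For the host-handle check, the container's own mount table is the fastest evidence. The sketch below parses /proc/self/mountinfo (field layout as documented in proc(5)) and lists writable mounts backed by a real filesystem. The function name and the pseudo-filesystem skip list are our assumptions, and the output flags candidates rather than proving that a host path is reachable.

```shell
#!/bin/sh
# Hypothetical handle-primitive check: list writable mounts inside the box that
# are backed by a real filesystem rather than a pseudo-fs or the overlay root.
# mountinfo fields: id parent major:minor root mountpoint options [optional...] - fstype source superopts
writable_host_mounts() {
  awk '{
    # Optional fields start at 7; the " - " separator ends them
    sep = 0
    for (i = 7; i <= NF; i++) if ($i == "-") { sep = i; break }
    if (sep == 0) next
    fstype = $(sep + 1)
    # Skip pseudo filesystems and the container overlay root
    if (fstype ~ /^(proc|sysfs|tmpfs|devpts|mqueue|cgroup|cgroup2|overlay)$/) next
    # Field 6 holds the per-mount options; keep only rw mounts
    if (("," $6 ",") !~ /,rw,/) next
    # Field 4 is the path inside the source fs; anything but "/" is a bound subtree
    print $5, fstype, "root=" $4
  }' "${1:-/proc/self/mountinfo}"
}
```

On a stock Docker host, the per-container /etc/hosts, /etc/hostname and /etc/resolv.conf bind mounts will likely appear in the output; anything else, such as a hooks or unit-file directory, deserves a closer look. A read-only mount that was remounted rw from inside would also show up here.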

How Celvex catches this

Find. Prove. Fix. Verify.

Find

The scanner maps the isolation boundary across four layers (capability, handle, kernel interface, and hypervisor) using read-only attestation audits and benign inside-the-box capability reads, each attributed with an X-Celvex-Probe header.

Prove

For a confirmed escape primitive we ship a signed Proof Capsule with the captured capability set, the exact remount or interface-write that succeeded, and the host-side effect, reproducible offline against a local fixture, never against the live host.

Fix

The Capsule's remediation block points at the defense-in-depth control scoped to the layer that failed: the capability to drop, the seccomp rule to add, the micro-VM boundary to adopt, or the runtime version to patch.

Verify

After the fix lands, the in-box remount is refused and the escape-probe battery shows no reachable primitive. The finding closes automatically and the dashboard records the verified-fix event for the audit trail.

Where we sit on the autonomy curve: at L1.5 today, our cloud-edge track ships tagged detection signatures covering the inside-the-box capability read, the read-only remount-enforcement probe, the writable kernel-interface check, and the shared-kernel-versus-micro-VM boundary classification. At L2 within 90 days, the corpus extends the probe battery to language-level eval-sandboxes and AI tool-execution runtimes, the same trust-channel primitive on a different boundary. At L3 within twelve months, the scanner synthesizes runtime-specific escape probes for unfamiliar sandbox architectures it fingerprints in customer environments. We do not claim L3 today. We do claim our L1.5 catches the layers above and ships a reproducible Capsule for each. The deeper play sits inside our cloud security validation track.

Bottom line

Sandbox escape is the finding that converts one hostile job into control of a host that served many, and it lands in environments whose edge configuration is textbook-correct. The reason is that isolation is a runtime invariant verified as a static declaration, and the attacker lives in that gap, attacking not the policy but the kept capability, the writable kernel interface, or the host handle the runtime left attached. CVE-2026-46695 is the 2026 reminder that a read-only label is not a kernel guarantee, and CVE-2019-5736 and CVE-2022-0492 are the older reminders that one reachable handle or interface is a host takeover. The fix is a defense-in-depth contract: drop every capability you cannot justify, run untrusted code behind a real isolation boundary, sever host handles, harden the host as if the box will be breached, and verify isolation by running it, not by reading the manifest. Until you run the boundary, a green configuration report is one writable capability away from a host-wide breach. Do not trust the sandbox alone.

Verifiable security. Find it. Prove it. Fix it. Verify the fix held. That is what we ship.


Map your own untrusted-code boundary.

Free Exposure Check, no signup required. We map the four isolation layers your sandbox relies on and ship a Proof Capsule for the highest-confidence escape primitive.
