Template injection: when a framework renders attacker input as code

A template engine exists to turn a template plus data into a string. The contract is that the template is trusted code written by your developers, and the data is untrusted text rendered safely inside it. Server-side template injection is what happens when those two roles blur: attacker-influenced text reaches the part of the engine that compiles or evaluates a template, and the data becomes the program. The marketing copy field, the profile bio, the email subject, the report-name pattern: any of them, fed to a template compiler instead of escaped as data, turns into code execution on the application server. The class is old and it keeps shipping. CVE-2016-10745 was a Jinja2 sandbox escape; CVE-2021-26084 was OGNL injection in Confluence that gave unauthenticated remote code execution and was mass-exploited in the wild. The same shape now appears in modern HTML-aware templating: Phoenix and Elixir build pages with HEEx, a templating layer the ecosystem trusts as safe-by-construction, and the moment template source is assembled from user input at runtime (for example, a string handed to EEx.eval_string) rather than written by a developer, the trust is misplaced. This piece walks the decision tree from a template-rendered sink to RCE, and the contract that ends the class.

Every web framework ships a templating engine, and every templating engine is, underneath, a small programming language. Jinja2 has expressions and attribute access; Twig has filters and method calls; Struts and Spring reach into OGNL and SpEL, full expression languages with access to the Java runtime; Phoenix renders HEEx, an HTML-aware dialect of embedded Elixir, written in .heex files or inline with the ~H sigil, whose <%= ... %> tags interpolate Elixir expressions into markup. Each is trusted because the template is supposed to be authored by the application's own engineers. The vulnerability is not the engine. It is the day a developer treats the engine as a string formatter and feeds it a string an attacker controls.

This piece walks server-side template injection the way our SSRF and JWT pieces walked their classes, and it is a companion to our broader web application testing work. Verifiable security.

The attack pattern in one paragraph

A template engine has two inputs, the template (trusted, developer-authored, compiled or evaluated as code) and the context (untrusted data, rendered into the output and escaped). The safe pattern is render(fixed_template, {bio: user_input}): the user's text only ever lands in a data slot and the engine escapes it. The vulnerable pattern is render(build_template_from(user_input)) or, more subtly, render("Hello " + user_input) where the concatenation is then compiled: the user's text becomes part of the template source, so the engine compiles and evaluates it. Once attacker input reaches the compiler, the attacker is writing in the template language. In Jinja2 they walk Python's object graph from a benign expression to os.system; in OGNL or SpEL they reach Runtime.exec directly; in HEEx-style embedded Elixir they evaluate arbitrary Elixir. The detection probe is the same everywhere: send a polyglot arithmetic payload such as ${{7*7}} or <%= 7*7 %> into a field and look for 49 in the response. If the engine computed the multiplication, it compiled your input, and the gap between "edit your bio" and "run code on the server" is now just the length of the exploit chain.
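The two roles are easier to see in code. What follows is a minimal sketch of a deliberately tiny, hypothetical engine written only for this illustration (it is not Jinja2 or any real library): a {{ ... }} block is either a context name or integer arithmetic, evaluated through a restricted ast walker so the sketch itself cannot run arbitrary code, and every result is HTML-escaped on output. The only difference between the two calls at the bottom is which slot the user's text lands in.

```python
import ast
import html
import operator
import re

# Hypothetical toy engine for illustration only: "{{ expr }}" where expr is a
# context name or integer arithmetic. Real engines evaluate far richer expressions.
_BLOCK = re.compile(r"\{\{(.*?)\}\}")
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _evaluate(node, context):
    """Evaluate one parsed expression against the context (no calls, no attributes)."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, context)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        return context[node.id]  # data slot: the value is returned, never parsed
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left, context), _evaluate(node.right, context))
    raise ValueError("unsupported template expression")


def render(template, context=None):
    """Compile `template` (trusted source) and fill it from `context` (untrusted data)."""
    context = context or {}

    def fill(match):
        tree = ast.parse(match.group(1).strip(), mode="eval")
        return html.escape(str(_evaluate(tree, context)))  # escape on output

    # A single re.sub pass: substituted values are never re-scanned for "{{".
    return _BLOCK.sub(fill, template)


user_input = "{{7*7}}"  # attacker-controlled profile bio

# Safe: fixed template, user text in the data slot -> rendered literally.
print(render("Hello {{ bio }}", {"bio": user_input}))  # Hello {{7*7}}

# Vulnerable: user text concatenated into the template source -> compiled.
print(render("Hello " + user_input))  # Hello 49
```

The output escaping in fill is the kind of guarantee HTML-aware engines provide, and it is irrelevant to the second call: nothing was escaped wrongly, the input was simply compiled.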

The reason this is code execution and not merely cross-site scripting is the side the evaluation runs on. XSS runs attacker code in a victim's browser; SSTI runs attacker code in the template evaluator, which lives on the application server, with its filesystem, secrets, database credentials, and outbound network. The same character sequence that would be a stored-XSS payload in an unescaped HTML context becomes a remote-code-execution payload the moment it reaches a template compiler instead of an HTML sink.

Why this still ships in 2026

Template injection was named and weaponized publicly in 2015. Why does a 2026 framework still produce fresh instances? Four structural reasons:

The engine is trusted, so the input feeding it is not audited. Developers know not to concatenate user input into SQL or shell, but they do not extend that instinct to the templating layer, because the template is "just the view." A framework that markets its templating as safe-by-construction, as HEEx markets HTML-aware escaping, makes this worse: the guarantee is about escaping data in the context and says nothing about a developer who builds the template itself from user input. The guarantee holds; the assumption that it covers everything does not.
Dynamic templates are a real product feature. User-customizable email templates, white-label report layouts, {{first_name}} personalization tokens, in-app theming: these are features customers ask for, and the lazy implementation lets the user supply template source and renders it. The requirement pulls developers straight into the vulnerable pattern.
The expression languages dwarf the use case. OGNL, SpEL, and the object graph behind Jinja2 exist for rich view logic, but a profile bio needs none of it. When user input reaches one of these evaluators, the attacker inherits the whole language, reflection and runtime access included. That capability gap is the blast radius.
It is a class, not a product, so per-CVE patching never finishes. CVE-2016-10745 patched one Jinja2 sandbox escape (via str.format, fixed in 2.8.1); CVE-2019-10906 patched its sibling (via str.format_map, fixed in 2.10.1), because the class kept escaping. CVE-2021-26084 patched one OGNL path in Confluence after years of Struts OGNL CVEs, and CVE-2022-26134 brought unauthenticated OGNL injection back to Confluence the following year. Patching the instance never patches the pattern, and the pattern reappears in every framework that ships an expression-capable templating layer.

The attacker decision tree

ATTACKER DECISION TREE
Server-Side Template Injection → RCE

1. Find a template-rendered sink
   - any field echoed back into a page, email, PDF, report name, error string
   - custom email / report / theme template the app lets the user supply
        │
        ▼
2. Inject template syntax (detection)
   - plaintext: ${7*7}   {{7*7}}   <%= 7*7 %>
   - polyglot:  ${{<%[7*7]%>}}
   - does the response contain 49?
        │
        ├── 49 echoed (evaluated)            ──► 3
        ├── literal kept (just a string)
        └── output errors (engine parsed it) ──► 3

3. Fingerprint the engine
   - which syntax fired? error strings?
   - Jinja2 / Twig / FreeMarker / OGNL / SpEL / EEx-HEEx / ERB
        │
        ▼
4. Escalate from expression to RCE
   - walk the object graph / reflection to a process or eval primitive
   - Runtime.exec / os.system / System.cmd
   → read files, secrets, shell, pivot

One detection probe, many engines, one outcome: input that reaches the compiler is code. Our probe stops at step 2, the confirmed evaluation.

The branch point is step 2. A field that echoes ${{7*7}} back literally is a data sink, possibly an XSS concern but not SSTI. A field that echoes 49 compiled the input, and the engine is now an attacker-controlled interpreter. Step 4 is where the class becomes an incident: the attacker inherits the full template language, and in every expression-capable engine that language reaches the host runtime. An honest, authorized test stops at step 2: the evaluated arithmetic is the whole proof that the boundary is gone. You do not need to run a shell to know the door is open.
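The step 2 branch is mechanical enough to write down. A minimal sketch in Python, assuming you already hold two response bodies for the same field: one for the probe and one baseline for a harmless value, so a 49 that was already on the page is not mistaken for evaluation. The function name, result labels, and error-string list are illustrative assumptions; the markers are real exception class names from Jinja2, FreeMarker, OGNL, and EEx, but whether they ever reach a response depends on the application's error handling.

```python
from enum import Enum


class ProbeResult(Enum):
    EVALUATED = "evaluated"  # arithmetic computed: input reached the compiler (SSTI)
    LITERAL = "literal"      # payload echoed back as text: data sink, not SSTI
    ERROR = "error"          # engine choked on the syntax: it parsed the input, fingerprint next
    NOT_REFLECTED = "none"   # payload did not come back at all


# Illustrative error fragments; real ones depend on the engine and on error handling.
ERROR_MARKERS = ("TemplateSyntaxError", "ParseException", "OgnlException", "EEx.SyntaxError")


def _has_error(text: str) -> bool:
    return any(marker in text for marker in ERROR_MARKERS)


def classify(payload: str, expected: str, body: str, baseline: str) -> ProbeResult:
    """Classify one probe response, using a baseline response to avoid false positives."""
    if payload in body:
        return ProbeResult.LITERAL
    if expected in body and expected not in baseline:
        return ProbeResult.EVALUATED
    if _has_error(body) and not _has_error(baseline):
        return ProbeResult.ERROR
    return ProbeResult.NOT_REFLECTED


baseline = "Welcome Sam, your account is ready"
print(classify("${{7*7}}", "49", "Welcome 49, your account is ready", baseline))
print(classify("${{7*7}}", "49", "Welcome ${{7*7}}, your account is ready", baseline))
```

The same check is what a re-test needs after a fix: the field should move from EVALUATED to LITERAL.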

A composite real-world scenario

The setting is a B2B SaaS platform with a white-label feature: tenants customize the transactional emails their end-users receive, with personalization tokens like {{customer.first_name}}. To support those tokens, an engineer wired the user-supplied subject and body straight into the server-side template engine, the same engine used for the application's own trusted views. The design note reads "tenants can only edit their own email copy," which is true and entirely beside the point.

Step one: find the sink. A tenant admin sets the welcome-email subject to a detection probe rather than a name token, and the platform renders the email server-side to preview it.

# Step 1-2: detection probe in a user-controlled template field
Subject:  Welcome ${{7*7}}, your account is ready

Rendered preview:
Subject:  Welcome 49, your account is ready

The 49 is the entire finding. The field did not escape the input as data; it compiled it as template source. The personalization feature is a template compiler exposed to tenant input. Step three would fingerprint the engine from which delimiter fired and from any error strings, and step four would walk from a benign expression to a runtime primitive. We stop at the 49; an unauthenticated or low-privilege attacker would not.
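Fingerprinting usually starts from which delimiter evaluated and then sends a second, still purely arithmetic probe whose answer differs between engines. A minimal sketch in Python; the candidate lists are heuristics (several engines share each delimiter, and the lists are not exhaustive), and the {{7*'7'}} follow-up relies on a widely used public distinction: Jinja2 repeats the string (7777777) while Twig coerces it to a number (49).

```python
from __future__ import annotations

# Heuristic map from the probe that evaluated to engines that use that delimiter.
# Not exhaustive: treat the output as a short list to confirm, not a verdict.
DELIMITER_CANDIDATES = {
    "{{7*7}}": ["Jinja2", "Twig"],
    "${7*7}": ["FreeMarker", "Mako", "Java EL-style evaluators"],
    "<%= 7*7 %>": ["ERB", "EEx / HEEx", "EJS"],
}


def fingerprint(evaluated_probes: list[str], follow_up: str | None = None) -> list[str]:
    """Narrow engine candidates.

    evaluated_probes: probe strings whose arithmetic came back as 49.
    follow_up: rendered output of a {{7*'7'}} probe, if one was sent.
    """
    candidates: list[str] = []
    for probe in evaluated_probes:
        candidates.extend(DELIMITER_CANDIDATES.get(probe, []))
    if follow_up == "7777777":  # Python-style string repetition
        candidates = [c for c in candidates if c != "Twig"]
    elif follow_up == "49":  # PHP-style numeric coercion
        candidates = [c for c in candidates if c != "Jinja2"]
    return candidates


print(fingerprint(["{{7*7}}"], follow_up="7777777"))  # ['Jinja2']
print(fingerprint(["<%= 7*7 %>"]))  # ['ERB', 'EEx / HEEx', 'EJS']
```

Both probes remain arithmetic and never reach a runtime primitive.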

<!-- Why the next step is RCE, illustrated, NOT executed by us -->
<!-- Jinja2: walk Python's object graph to os.system -->
{{ ''.__class__.__mro__[1].__subclasses__() ... popen('id') }}

<!-- OGNL (the CVE-2021-26084 family): straight to the runtime -->
${ @java.lang.Runtime@getRuntime().exec('id') }

<!-- Embedded-Elixir style: evaluate arbitrary code in the view -->
<%= System.cmd("id", []) %>

This is the exact shape of CVE-2021-26084, where an OGNL expression in an unauthenticated Confluence request reached Runtime.exec and was exploited in the wild to plant web shells, and of the Jinja2 sandbox-escape class that CVE-2016-10745 belongs to. The modern HEEx-style case is the same root cause in a framework trusted as HTML-safe: the escaping guarantee protects data rendered into the template and cannot protect a template assembled from user input, because at that point the user input is the program. The classification under CWE-94 (code injection) and its template-specific child CWE-1336 is what ties the Jinja2, OGNL, and HEEx instances into one class.

Why a CVE-by-CVE scanner lags, and a class test does not

A scanner that works from a CVE list asks "is this the patched version of Confluence, of Jinja2, of Struts?" That question is necessary and always late. It cannot see the white-label email feature above, because that vulnerability has no CVE: it is bespoke code that fed user input to a perfectly patched, perfectly current engine. It cannot see the next framework's first SSTI before a CVE is assigned. And it produces false comfort: every dependency is green, and the app is still one profile bio away from remote code execution.

A class test asks a different question: does any user-influenced value reach a template compiler? That is engine-agnostic and CVE-independent. The polyglot arithmetic probe fires the same on a five-year-old Struts app, a current Flask app, and a brand-new Phoenix app, because they share the one behavior that defines the class: the data slot and the code slot are the same slot. The CVE scanner tells you which known doors are unlocked; the class test tells you that you built a door where there should have been a wall. We run both, because the bespoke instance is the one no vendor will ever patch for you.

What to do about it: the template-trust contract

The fix that ends the class is a single principle stated several ways: never compile untrusted input as a template; only ever render it as data. Keep the template static and developer-authored, push every user value through the context and the escaper, and the gap where data becomes code never opens.

Template-trust contract: controls that end the class

Never build a template from user input. The template must be a fixed, developer-authored constant or a shipped file. User values go into the data context only: render(TEMPLATE, {bio: user_input}), never render("..." + user_input) and never render(user_supplied_template). This single rule kills the class.
If users must customize templates, use a logic-less engine. Render user-supplied layouts with a strictly logic-less language (the Mustache family): no expression evaluation, no object access, no method calls, so the worst case is wrong output, not code execution. Never hand users an expression-capable engine.
Sandbox is a mitigation, not a fix. Engine sandboxes (the Jinja2 SandboxedEnvironment, OGNL allow-lists) have a long history of escapes. Treat a sandbox as defence in depth behind the static-template rule, never as the primary control.
Keep contextual escaping on, and know what it protects. HTML-aware escaping (HEEx and modern engines) stops data rendered into a template from breaking out into markup. It cannot protect a template assembled from user input. Know which boundary your escaper guards.
Separate the personalization grammar from the engine. Implement {{first_name}}-style tokens with an explicit allow-list substitution over named values, not by handing the user's string to the full engine. The user picks from known tokens; they never supply template source.
Patch and pin the engine anyway. Track CVE-2021-26084-class advisories for every templating and expression-language dependency. The static-template rule defends your code; patching defends the engine.
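The allow-list substitution for personalization tokens can be small enough to review in one sitting. A minimal sketch in Python, assuming {{token}}-style placeholders and a hypothetical ALLOWED_TOKENS set: only dotted names on the allow-list are replaced, each value is HTML-escaped as data, unknown or malformed tokens stay as literal text, and no expression-capable engine is ever involved.

```python
from __future__ import annotations

import html
import re

# Hypothetical allow-list: the only names a tenant's copy may reference.
ALLOWED_TOKENS = {"customer.first_name", "customer.company", "account.url"}

# {{ name }} where name is a dotted identifier; arithmetic, calls, filters never match.
TOKEN = re.compile(r"\{\{\s*([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s*\}\}")


def personalize(tenant_text: str, values: dict[str, str]) -> str:
    """Replace allow-listed tokens with escaped values. Nothing is ever evaluated."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in ALLOWED_TOKENS:
            return match.group(0)  # unknown token: leave the literal text in place
        return html.escape(values.get(name, ""))

    # One pass: a value containing "{{...}}" is inserted as text, not re-expanded.
    # The result is final output; never pass it to a template engine afterwards.
    return TOKEN.sub(substitute, tenant_text)


values = {"customer.first_name": "Ada"}
print(personalize("Welcome {{customer.first_name}}, your account is ready", values))
print(personalize("Welcome ${{7*7}}, your account is ready", values))
```

With this design, the probe from the scenario above simply renders as the text the tenant typed, because the grammar has no way to express arithmetic at all.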


The audit, in concrete terms, is a source grep plus a behavioural probe:

# Find templates built from variables instead of constants
$ grep -rnE "render(_to_string)?\(|render_template_string\(|Template\(|from_string\(|eval_string\(|~H|<%=|new SpelExpression|Ognl\." \
      src/ app/ lib/ 2>/dev/null

# Flag any call where the template ARGUMENT is a variable or a concatenation,
# not a string literal or a shipped template file. Those are the candidates.

# Then probe behaviourally against staging: send a polyglot arithmetic payload
# into every reflective field and look for the evaluated result.
#   payload:  ${{7*7}}    expect literal ${{7*7}}   (data sink, safe)
#   if response shows 49: the field compiled the input  (SSTI, P1)

Read each rendering call. Confirm the template argument is a constant, not a value the request can influence. Confirm any user-customizable template feature runs on a logic-less engine or an allow-list substitution. The work is finishable per service in well under a day, and it converts a remote-code-execution class into a closed door.
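For Python services, the "is the template argument a constant?" reading can be partly automated with the standard ast module. A minimal sketch, assuming the sink list below (from_string, Template, and render_template_string are real Jinja2 and Flask entry points that take template source; extend the set for your codebase): it flags any call whose template argument is not a string literal. It performs no data-flow analysis, so a variable that happens to hold a constant is still flagged for a human to confirm, and a sink reached under another name is missed.

```python
from __future__ import annotations

import ast

# Callables whose first argument is template SOURCE (Jinja2 / Flask); extend per codebase.
TEMPLATE_SINKS = {"from_string", "Template", "render_template_string"}


def _sink_name(func: ast.expr) -> str | None:
    """Return the called name for foo(...) or obj.foo(...), else None."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def non_literal_template_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, sink) for template-compiling calls whose source is not a str literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or _sink_name(node.func) not in TEMPLATE_SINKS:
            continue
        # Template source is the first positional argument, or the "source" keyword.
        arg = node.args[0] if node.args else next(
            (kw.value for kw in node.keywords if kw.arg == "source"), None
        )
        if arg is None:
            continue
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            continue  # developer-authored literal: not a candidate
        findings.append((node.lineno, _sink_name(node.func)))
    return sorted(findings)


SAMPLE = '''
from flask import render_template_string
from jinja2 import Environment

env = Environment()
GREETING = env.from_string("Hello {{ name }}")

def preview(subject):
    return env.from_string("Subject: " + subject).render()

def bio(user_bio):
    return render_template_string(user_bio)
'''

for line, sink in non_literal_template_calls(SAMPLE):
    print(f"line {line}: {sink}() receives non-literal template source")
```

The literal GREETING template passes; the concatenated preview subject and the raw bio are the two candidates a reviewer would read first. Non-Python sinks (OGNL, SpEL, EEx) still need the grep and a manual read.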

How Celvex catches this

Find. Prove. Fix. Verify.

Find

Our web-app test family sends engine-agnostic polyglot arithmetic probes into every reflective field, form, header, and rendered preview, fingerprints the engine from which delimiter fires, and flags any value that reaches a template compiler, all non-destructive and stopping at the evaluated result.

Prove

For a confirmed sink we ship a signed Proof Capsule with the exact request, the field, and the evaluated response (the 49), Ed25519-signed for air-gapped verification. One evaluated arithmetic probe is the proof; we never run a shell or chain to a destructive payload.

Fix

The Capsule's remediation block points at the template-trust contract scoped to the finding: which rendering call compiled user input, the static-template or logic-less-engine change, and the allow-list substitution for personalization tokens.

Verify

After the fix lands, the re-test confirms the same field now returns the payload as literal text, not an evaluated result. The finding closes automatically and the verified-fix event is recorded for the auditor.

Where we sit on the autonomy curve: at L1.5 today, our web-application corpus fires the polyglot probe across reflective sinks, fingerprints the common engines, and ships a Proof Capsule for each confirmed evaluation, stopping at the arithmetic result per our no-false-positive rule. At L2 within 90 days, it adds engine-specific second-stage probes that confirm the expression context without escalating to a runtime primitive, plus source-side detection of templates built from non-constant arguments. At L3 within twelve months, the scanner synthesises engine-tailored probes for bespoke templating layers it fingerprints in customer environments, under a strict guard that never runs a destructive or shell payload. We do not claim L3 today. We do claim that L1.5 catches the data-becomes-code boundary that turns a profile bio into server-side execution, and ships a reproducible Capsule for each. See our web application testing capability for the full surface.

Bottom line

A template engine is a small programming language, and server-side template injection is what happens when attacker input crosses from the data role into the code role and reaches the compiler. The class does not belong to one framework: it produced Jinja2 sandbox escapes (CVE-2016-10745), OGNL remote code execution exploited in the wild (CVE-2021-26084), and it reappears in every framework with an expression-capable templating layer, including the HTML-aware HEEx templating Phoenix and Elixir trust as safe-by-construction. A CVE-by-CVE scanner is always late and blind to bespoke code; the one polyglot arithmetic probe fires the same on every engine and finds the door you built where a wall belonged. The fix is one principle: never compile untrusted input as a template, render it only as data, use a logic-less engine for user-customizable layouts, and keep contextual escaping on for what it actually protects. That work is finishable in under a day per service, and it converts the highest-blast-radius web class we hunt into input that simply renders as text.

Verifiable security. Find it. Prove it. Fix it. Verify the fix held. That is what we ship.


Probe your own template-rendered sinks.

Free Exposure Check, no signup required. We fire the engine-agnostic SSTI probe across your reflective fields and rendered previews, then ship a Proof Capsule for the highest-confidence evaluated-input finding.
