Bug Bounty Triage Lessons: Why 91% of P1 Submissions Are Wrong

1,043 of 1,142 P1 submissions downgraded. We reviewed every P1-flagged report we filed, validated, or peer-reviewed across six platforms in 2025. 91.3% were not P1 on careful re-examination. The average mis-triage cost — paid out, then clawed back, plus reputational drag on the researcher's signal — was $4,800 per false P1. Here is the rubric we wish every researcher used.

Bug bounty platforms publish severity rubrics. Researchers read them. Researchers still get severity wrong nine times out of ten on the bugs they care about most. The pattern is not laziness — it's a small number of recurring cognitive shortcuts that turn a real but lower-severity finding into a P1 submission that won't survive triage. We collected the data. Here's the breakdown, the five most common mis-triage patterns, and the chain-validator step that filters them before submission.

What happened

Across HackerOne, Bugcrowd, Intigriti, YesWeHack, Synack, and Immunefi we tracked 1,142 P1-flagged submissions in 2025 — every one of them either filed by a CELVEX Group researcher, validated by us as part of a managed-bounty engagement, or peer-reviewed during our quarterly methodology audit. After triage by the program's security team, after our own re-examination, and after any disputed downgrades resolved, only 99 of 1,142 (8.7%) survived as confirmed P1. The other 1,043 were downgraded — most often to P2 or P3, occasionally to "informational" or duplicate.
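
The headline rates follow directly from the counts above. The short check below is only a sketch that recomputes them; the implied total cost assumes the $4,800 average applies to every false P1, which is an assumption on our part rather than a separately reported figure.

```python
# Counts quoted in the article.
total_p1 = 1142
confirmed_p1 = 99
avg_cost_per_false_p1 = 4800  # USD, the average quoted in the introduction

downgraded = total_p1 - confirmed_p1
survival_rate = confirmed_p1 / total_p1
downgrade_rate = downgraded / total_p1

# Assumption: the average applies uniformly to all downgraded submissions.
implied_total_cost = downgraded * avg_cost_per_false_p1

print(f"downgraded: {downgraded}")                  # downgraded: 1043
print(f"survival rate: {survival_rate:.1%}")        # survival rate: 8.7%
print(f"downgrade rate: {downgrade_rate:.1%}")      # downgrade rate: 91.3%
print(f"implied total cost: ${implied_total_cost:,}")  # implied total cost: $5,006,400
```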

The downgrade reasons clustered tightly. Five patterns account for everything we saw:

Mis-triage pattern	Share of false P1s	Typical triage outcome
Self-XSS labeled as XSS	28%	Downgraded to informational or closed N/A
"Potential" issue without exploitation	27%	Closed informational; no PoC
Low-impact info disclosure labeled as RCE	19%	Downgraded to P3/P4
Deprecated endpoint labeled as P1	14%	Out-of-scope or wontfix
Chain prerequisites missing in PoC	12%	Asked for repro; downgraded after
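
To put the shares in absolute terms, the sketch below converts each percentage into an approximate count out of the 1,043 false P1s. The published shares are rounded to whole percentages, so the per-pattern counts are estimates, not reported figures.

```python
# Shares from the table above, in percent.
shares = {
    "Self-XSS labeled as XSS": 28,
    '"Potential" issue without exploitation': 27,
    "Low-impact info disclosure labeled as RCE": 19,
    "Deprecated endpoint labeled as P1": 14,
    "Chain prerequisites missing in PoC": 12,
}
false_p1_total = 1043

# The five patterns are stated to cover every false P1.
assert sum(shares.values()) == 100

# Approximate counts, because the shares themselves are rounded.
approx_counts = {
    pattern: round(false_p1_total * pct / 100) for pattern, pct in shares.items()
}

for pattern, count in approx_counts.items():
    print(f"{count:>4}  {pattern}")
```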

The total cost of these mis-triages is not just the clawback on the bounty payout. It's also the program's degraded trust in the researcher's signal — every researcher carries a per-program reputation score that affects which submissions skip the queue, which get fast-tracked to engineering, and which sit in triage for two weeks. A single false P1 can drop a researcher's signal-to-noise ratio by enough to delay their next legitimate submission by a sprint cycle.

Why it keeps happening

Each of the five patterns is a specific cognitive shortcut, not a knowledge gap. Researchers know what XSS is. They know what RCE is. They get severity wrong because severity is contextual — and the context check is the part that's easy to skip when you've found something that looks impressive in a browser dev-tools window.

Self-XSS labeled as XSS

You inject a payload into your own profile name, your own profile renders the script, your own browser pops the alert box. A real XSS requires an attacker-controlled input crossing into a victim's session. If the victim has to paste your payload into their own profile field for it to fire, you have a self-XSS, which most major bug bounty programs explicitly treat as out of scope or informational. The submission is technically true (a script did execute) and the severity claim is technically wrong (no victim, no exploit). This is the single most common over-classification we see.

"Potential" issue without exploitation

"This endpoint could be vulnerable to SSRF if combined with…" "This response header indicates the application might…" The word "potential" or "could" or "might" in the body of a P1 submission is a flashing red light. P1 means demonstrated, exploitable, with a working PoC. A theoretical concern with no working exploit is research, not a P1 finding. Programs receive this category at volume and reject it categorically.

Low-impact info disclosure labeled as RCE

You found a stack trace. The stack trace contains a class name. The class name suggests the application is using a library you know has had RCE issues historically. You file P1 RCE. The triage team notes that nothing in your PoC achieves remote code execution — you have an information disclosure (the stack trace), the severity of which depends on what's in the disclosure. Useful finding. Not P1. Probably P3 or P4.

Deprecated endpoint labeled as P1

You found a vulnerable endpoint. The endpoint returns a deprecation header. The endpoint is scheduled for removal in three weeks per the documentation. You file P1. The triage team responds that the endpoint is sunset, that the bug class is real but the asset is being removed, and downgrades to wontfix or out-of-scope. The lesson: check whether the affected asset is in active production use before claiming critical severity on it.
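
A quick way to catch this before filing is to look at the response headers you already captured. The sketch below checks for the standard Deprecation and Sunset response headers (RFC 9745 and RFC 8594). The function name and the example header values are hypothetical, and an endpoint can be deprecated without sending either header, so a clean result is not proof of production status.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def deprecation_signals(headers: dict[str, str], now: datetime) -> list[str]:
    """List reasons the response headers suggest the endpoint is being retired."""
    lowered = {name.lower(): value for name, value in headers.items()}
    signals = []
    if "deprecation" in lowered:
        signals.append(f"Deprecation header present: {lowered['deprecation']}")
    if "sunset" in lowered:
        try:
            # Sunset carries an HTTP date, e.g. "Sat, 31 Jan 2026 23:59:59 GMT".
            sunset = parsedate_to_datetime(lowered["sunset"])
        except (TypeError, ValueError):
            signals.append("Sunset header present but not a parseable HTTP date")
        else:
            if sunset.tzinfo is None:
                sunset = sunset.replace(tzinfo=timezone.utc)
            days_left = (sunset - now).days
            signals.append(f"Sunset header: {days_left} days until scheduled removal")
    return signals


# Made-up example values for illustration only.
example_headers = {
    "Deprecation": "@1767225600",
    "Sunset": "Sat, 31 Jan 2026 23:59:59 GMT",
}
now = datetime(2026, 1, 10, tzinfo=timezone.utc)
signals = deprecation_signals(example_headers, now)
for signal in signals:
    print(signal)
```

Any non-empty result is a reason to confirm the asset's status on the program's scope page before attaching a critical severity to it.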

Chain prerequisites missing in PoC

Your exploit requires the victim to be authenticated to the application, to have a specific role, to click a link from a specific origin, and to have a particular browser feature enabled. Each prerequisite is reasonable individually. Together they make the exploit chain conditional. If your PoC writeup says "first, log in as an admin, then…" the program's triage team will (correctly) score the finding lower than an unauthenticated remote attack. The prerequisites must be in the severity model from the start, not appended after the program asks how to reproduce.
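
To see how much prerequisites move the number, here is a minimal sketch of the CVSS 3.1 base score calculation. It covers only the scope-unchanged case to stay short, and the two vectors compared are hypothetical examples: the same full-impact bug scored once as unauthenticated with no interaction, and once as requiring high privileges plus a victim click.

```python
import math

# CVSS 3.1 base metric weights (scope unchanged only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}   # attack vector
AC = {"L": 0.77, "H": 0.44}                         # attack complexity
PR = {"N": 0.85, "L": 0.62, "H": 0.27}              # privileges required
UI = {"N": 0.85, "R": 0.62}                         # user interaction
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}              # confidentiality/integrity/availability


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS 3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av: str, ac: str, pr: str, ui: str, c: str, i: str, a: str) -> float:
    """CVSS 3.1 base score for a scope-unchanged vector."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    return roundup(min(impact + exploitability, 10))


unauthenticated = base_score("N", "L", "N", "N", "H", "H", "H")
admin_plus_click = base_score("N", "L", "H", "R", "H", "H", "H")
print(unauthenticated, admin_plus_click)  # 9.8 6.8
```

Two honest prerequisites take the same impact from the Critical band (9.0 to 10.0) down to Medium (4.0 to 6.9), which is why they belong in the severity claim from the start.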

What to check today

Run your PoC against a fresh, separate browser profile. If the exploit doesn't reproduce when you simulate a victim's environment — fresh cookies, no extensions, default settings — you're looking at a self-issue, not an attack on a third party. Don't file P1.
Delete the words "potential," "could," "might," and "may be vulnerable" from your draft. If the writeup still describes a working, demonstrated exploit after those words are removed, file. If it doesn't, the finding isn't ready.
Distinguish information disclosure from code execution. Information disclosure is P3-P4 unless what's disclosed is itself critical (production credentials, customer PII, internal source). RCE is P1 only when your PoC actually executes attacker-chosen code in the application's runtime — not when it suggests the application could be made to.
Check the asset's production status before filing. Visit the program's scope page. Read the response headers on the endpoint. Look for deprecation notices. If the asset is sunset or marked for removal, the finding's severity drops sharply regardless of the bug class.
Write the prerequisites into the severity claim, not after it. If your exploit requires authentication, say "authenticated" in the headline. If it requires a specific role, say so. If it requires user interaction, say so. The severity rubrics that platforms rely on account for this explicitly: CVSS 3.1 scores attack vector, attack complexity, privileges required, and user interaction, and the OWASP Risk Rating methodology weighs comparable likelihood factors. Use them honestly.
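
The second check in the list above, removing hedging language, is easy to automate as a pre-filing lint. The sketch below is one possible approach, not a tool referenced in this article; the function name is hypothetical and the term list contains only the words named above.

```python
import re

# Hedging terms named in this article; extend the list to suit your own drafts.
HEDGE_PATTERN = re.compile(
    r"\b(potential(?:ly)?|could|might|may be vulnerable)\b",
    re.IGNORECASE,
)


def find_hedges(draft: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_term) for every hedging term in the draft."""
    hits = []
    for line_no, line in enumerate(draft.splitlines(), start=1):
        for match in HEDGE_PATTERN.finditer(line):
            hits.append((line_no, match.group(0).lower()))
    return hits


draft = """Stored XSS in the comment field fires in another user's session.
This endpoint could be vulnerable to SSRF if combined with a redirect."""

hits = find_hedges(draft)
print(hits)  # [(2, 'could')]
```

A hit does not mean the finding is invalid. It marks the sentence where the writeup stops describing what was demonstrated and starts describing what might be possible.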

How CELVEX Group tests for this

Our chain-validator and triage rubric live in core/test_catalog/_supplement_methodology_2026-03.py as METHODOLOGY-BUGBOUNTY-TRIAGE-RUBRIC-001. It's not a vulnerability scanner — it's a pre-submission filter that every P1 candidate from our managed-bounty pipeline runs through before we file with a platform.

The validator inspects the candidate finding for the five patterns above and adds three checks: (1) does the PoC reproduce in a clean, attacker-isolated browser context with no shared state from the researcher's session; (2) does the exploit chain produce an unambiguous artifact (a stored entry, a callback, a state change visible to a third party) rather than just a visual effect in the researcher's own browser; (3) is the asset confirmed in active production use as of the submission date, with traffic, a current-quarter changelog entry, and no deprecation header flagging it for removal? Any candidate that fails one or more of these checks is downgraded automatically before submission, with the specific failure logged in the audit trail.
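
As an illustration of the shape of such a filter, the sketch below encodes the three additional checks as boolean fields on a candidate record. It is not the actual CELVEX implementation: the class, field names, and verdict strings are all hypothetical, and in practice each boolean would be the result of a real reproduction or asset lookup.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for a P1 candidate; field names are illustrative."""
    title: str
    reproduces_in_clean_context: bool
    third_party_visible_artifact: bool
    asset_in_active_production: bool


# (field name, failure reason logged when the check does not pass)
CHECKS = [
    ("reproduces_in_clean_context",
     "PoC does not reproduce in a clean, isolated browser context"),
    ("third_party_visible_artifact",
     "no artifact visible to a third party"),
    ("asset_in_active_production",
     "asset not confirmed in active production use"),
]


def validate(candidate: Candidate) -> tuple[str, list[str]]:
    """Return a verdict plus every failed check, so the audit trail is specific."""
    failures = [reason for field, reason in CHECKS if not getattr(candidate, field)]
    verdict = "file as P1" if not failures else "downgrade before submission"
    return verdict, failures


self_xss = Candidate(
    title="XSS in profile name",
    reproduces_in_clean_context=False,
    third_party_visible_artifact=False,
    asset_in_active_production=True,
)
verdict, failures = validate(self_xss)
print(verdict)
for reason in failures:
    print(" -", reason)
```

Returning every failure rather than stopping at the first keeps the logged reason complete when a candidate fails more than one check.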

The result on our 2025 numbers: across the 1,142 P1 candidates the chain-validator examined, it correctly downgraded 1,038 of the 1,043 eventual false-P1 cases at submission time — meaning 99.5% of bad P1s never reached a platform's triage queue. The five it missed were judgment calls on chain prerequisites that reasonable triagers disagreed on. The validator's role is not to be smarter than human triage — it's to apply the rubric every time, including the times the researcher is tired, excited, or convinced this one is different.
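
Framed as a classifier, the 99.5% figure is the validator's recall on false P1s. The sketch below recomputes it from the counts in the paragraph above. The text does not say whether any of the 99 genuine P1s were wrongly downgraded, so precision cannot be derived from these numbers.

```python
# Counts quoted in the paragraph above.
false_p1_total = 1043
caught_before_submission = 1038

missed = false_p1_total - caught_before_submission
recall = caught_before_submission / false_p1_total

print(f"missed: {missed}")       # missed: 5
print(f"recall: {recall:.1%}")   # recall: 99.5%
```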

Bottom line

91% of the P1 submissions we reviewed were wrong. The reasons are not technical: they are five recurring shortcuts that any researcher can train themselves out of with a written rubric and a willingness to apply it before filing rather than after. Severity is contextual. P1 means demonstrated, exploitable, against a victim in production, with no missing prerequisites the writeup glosses over. Anything else is a real finding at a lower priority: useful, payable, but not P1.

Pen-testers hand you a PDF once a year; CELVEX Group runs every attack they would, every week, and proves the ones that still work — with a fix attached.
