How Modern Bots Walk Past CrowdSec, Fail2Ban, and Your WAF (Without Touching the Threshold)
1. The threshold trap
Every Layer-7 defender on the open-source market - CrowdSec, Fail2Ban, ModSecurity with the OWASP CRS, and most cloud WAF rule packs - is built around the same primitive: count requests from a single source IP, and when the count crosses a threshold inside a window, take a decision (ban, challenge, tarpit). The CrowdSec hub's crowdsecurity/http-bf-wordpress scenario is a textbook example: leakspeed: 10s, capacity: 5, anchored on evt.Meta.source_ip. Fail2Ban's sshd.conf filter does the same thing: it extracts the source address with a <HOST> capture against auth.log, and the jail counts matches per address against maxretry within findtime. The OWASP CRS rate-limit rules in paranoia level 2 likewise key on REMOTE_ADDR.
This was the right design in 2014. It is the wrong design in 2026.
The threshold model assumes the attacker is a single source. The modern attacker is a distributed primitive: a residential-proxy network of 10,000 endpoints across 1,200 ASNs, billed at $0.50 per gigabyte. At one request per IP per minute, an attacker can sustain 10,000 login attempts per minute against your /login endpoint, and every individual source stays an order of magnitude under any sane per-IP threshold. CrowdSec sees nothing. Fail2Ban sees nothing. Your WAF sees nothing. Your auth service sees a slow trickle of failed logins from many IPs, none of which look anomalous in isolation.
The 2023 23andMe breach made this concrete: roughly 14,000 accounts were compromised through credential stuffing, and the rotational botnet held under per-IP rate limits the entire time. The follow-on data exposure for relatives covered 6.9 million additional people via the DNA Relatives feature. None of the per-source rate limits failed; they did exactly what they were configured to do. They were the wrong control.
CWE-307 ("Improper Restriction of Excessive Authentication Attempts") covers this directly. In the OWASP API Security Top 10, credential stuffing itself falls under API2:2023 Broken Authentication, and the missing rate limit under API4:2023 Unrestricted Resource Consumption. What these standards make clear, but most defender deployments miss, is that "improper restriction" includes restrictions that work fine on paper and fail under realistic attack distributions.
The thirteen tests below are the evasion patterns we now run against perimeter stacks. Each one passes silently against a default CrowdSec or Fail2Ban deployment.
2. Thirteen evasion patterns we test for
This is the ENDPOINT-L7-EVASION-001..013 family. Every probe is a read-only fingerprint - we never fire the literal attack against customer infrastructure - but each one identifies a specific gap that an actual attacker can drive a botnet through.
ENDPOINT-L7-EVASION-001: Distributed credential stuffing across multiple ASNs
A residential-proxy attacker rotates source IPs across 200 unique addresses spanning 30+ ASNs (AS7922 Comcast, AS20115 Charter, AS22773 Cox, AS7018 AT&T Mobility, et al.), at fewer than one request per IP per minute against /login. CrowdSec's hub-default scenarios are per-IP only - leakspeed: 60s, capacity: 25, anchored on evt.Meta.source_ip. No bucket fills. No decision lands. The attack runs for hours.
What should fire is a behavioural bucket like:
type: leaky  # a trigger bucket overflows on the first event and ignores capacity
name: custom/distributed-credential-stuffing
description: "Multi-source stuffing detection"
filter: "evt.Meta.http_path == '/login' && evt.Meta.http_status == '401'"
groupby: "evt.Meta.http_user_agent_family"
distinct: "evt.Meta.source_ip"
leakspeed: 60s
capacity: 50
labels:
  service: http
  type: bruteforce
That distinct: evt.Meta.source_ip clause - count distinct IPs, not requests - is what catches distributed attacks. It almost never appears in production deployments because the hub doesn't ship it as a default.
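To make the distinct-count idea concrete, here is a minimal Python sketch of a leaky bucket that only counts a value the first time it is seen per group. It is an illustration of the technique, not CrowdSec's implementation; the class name, the reset-when-empty behaviour, and the sample addresses are assumptions.

```python
class DistinctLeakyBucket:
    """Leaky bucket that counts distinct values per group, not raw events."""

    def __init__(self, capacity: int, leakspeed: float):
        self.capacity = capacity    # distinct values tolerated before overflow
        self.leakspeed = leakspeed  # seconds for one counted value to leak out
        self._state = {}            # group -> (level, last_timestamp, seen_values)

    def pour(self, group: str, value: str, now: float) -> bool:
        """Record one event; return True when the group's bucket overflows."""
        level, last, seen = self._state.get(group, (0.0, now, set()))
        # Leak proportionally to the time elapsed since the previous event.
        level = max(0.0, level - (now - last) / self.leakspeed)
        if level == 0.0:
            seen = set()  # an empty bucket forgets what it has seen
        if value not in seen:
            seen.add(value)
            level += 1.0  # only a new distinct value raises the level
        if level > self.capacity:
            self._state.pop(group, None)  # overflow: emit a decision and reset
            return True
        self._state[group] = (level, now, seen)
        return False


# 200 distinct IPs sharing one User-Agent family, one failed login each.
bucket = DistinctLeakyBucket(capacity=50, leakspeed=60.0)
overflowed = any(
    bucket.pour("ua-family", f"203.0.113.{i}", now=i * 0.3) for i in range(200)
)
```

A single IP repeating the same request 200 times never overflows this bucket, which is why it complements rather than replaces the per-IP scenarios.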
ENDPOINT-L7-EVASION-002: Per-account threshold absent (the 23andMe pattern)
This is the critical one. Fifty failed logins against the same username, sourced from fifty different IPs, will pass every CrowdSec, Fail2Ban, and CRS rule cleanly because no single IP exceeds the per-IP threshold. But the auth service should still lock the account - and in most deployments it doesn't.
Per-IP rate limit + missing per-account counter = the precise stuffing primitive that drove the 23andMe (October 2023), Norton LifeLock (December 2022), and Akamai-reported (multiple) credential-stuffing incidents into compromise. NIST SP 800-63B section 5.2.2 explicitly requires throttling at the account identifier ("verifiers SHALL implement a rate-limiting mechanism that effectively limits the number of failed authentication attempts that can be made on the subscriber's account"). PCI-DSS 4.0 Requirement 8.3.4 requires lockout after no more than 10 attempts. CWE-307 identifies this as the canonical "improper restriction" failure mode.
The fix is at the auth layer, not the WAF: after 10 fails in 15 minutes for username = X, return HTTP 423 Locked with Retry-After: 900, regardless of source IP. This is the single highest-leverage control on this list.
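A minimal sketch of that per-account counter, assuming an in-memory store and a single process (a real deployment would more likely keep the counters in a shared store such as Redis). The class and method names are hypothetical; the thresholds are the ones stated above.

```python
from collections import defaultdict, deque

MAX_FAILS = 10
WINDOW_SECONDS = 15 * 60
RETRY_AFTER_SECONDS = 900


class AccountLockout:
    """Counts failed logins per username, ignoring the source IP entirely."""

    def __init__(self):
        self._fails = defaultdict(deque)  # username -> timestamps of failures

    def _recent(self, username: str, now: float) -> deque:
        fails = self._fails[username.lower()]
        # Drop failures that have aged out of the sliding window.
        while fails and now - fails[0] >= WINDOW_SECONDS:
            fails.popleft()
        return fails

    def record_failure(self, username: str, now: float) -> None:
        self._recent(username, now).append(now)

    def gate(self, username: str, now: float):
        """Return (status, headers) to short-circuit the login, or None."""
        if len(self._recent(username, now)) >= MAX_FAILS:
            return 423, {"Retry-After": str(RETRY_AFTER_SECONDS)}
        return None
```

The login handler would call gate() before checking the password and record_failure() after each bad attempt, so fifty failures from fifty IPs count against one key.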
ENDPOINT-L7-EVASION-003: Anonymous GraphQL introspection open
Send one POST to /graphql (also /api/graphql, /v1/graphql, /query) with body {"query": "{__schema{queryType{name}}}"}, no Authorization header. If you get the schema back, the attacker just got the full mutation map, the full field map, and every nested resolver. This is the precondition for IDOR enumeration and mass-assignment attacks.
Apollo Server disables introspection by default when NODE_ENV is set to production, but configurations that explicitly set introspection: true (often copy-pasted from dev environments) are common in the field. Hasura exposes introspection on /v1/graphql by default if the admin secret is unset. Hot Chocolate enables it unless descriptor.AllowIntrospection(false) is wired. OWASP's GraphQL Cheat Sheet has covered this since 2020.
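The probe itself is small. The sketch below shows only the request body and the response classification, with the HTTP call left to whatever client you already use; the function name and the decision to require a 200 status are assumptions.

```python
import json

# Candidate paths and probe body, as described above.
CANDIDATE_PATHS = ("/graphql", "/api/graphql", "/v1/graphql", "/query")
INTROSPECTION_PROBE = {"query": "{__schema{queryType{name}}}"}


def introspection_is_open(status: int, body: str) -> bool:
    """True if an unauthenticated response actually contains schema data."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # HTML error pages, empty bodies, and so on
    if not isinstance(payload, dict):
        return False
    data = payload.get("data")
    if not isinstance(data, dict):
        return False  # "data": null alongside "errors" means the probe was refused
    schema = data.get("__schema")
    return isinstance(schema, dict) and isinstance(schema.get("queryType"), dict)
```

Checking for the data.__schema object rather than the status code alone avoids counting servers that answer 200 with an errors array.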
ENDPOINT-L7-EVASION-004: GraphQL alias-bomb amplification
query {
  a: __typename
  b: __typename
  c: __typename
  ...
  zzzz: __typename
}
A thousand aliases on a single query expand to a thousand resolver invocations and a thousand-fold response inflation. One residential-proxy request bills the customer a megabyte of egress and N database round-trips. The defender sees one HTTP request and applies one rate-limit check.
We probe with five aliases - well under any sane production cap - and check whether the server responds with all five. If yes, and introspection is open (test 003), the 1000-alias DoS is realistic. The standard mitigations are graphql-armor's maxAliases limit (15 by default) for Apollo and similar Node servers, Hasura's query-cost analysis, and Hot Chocolate's MaxAllowedAliasCount(15).
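A sketch of the five-alias probe, assuming the aliases are named a0 through a4 (the naming is arbitrary) and that the response is standard GraphQL JSON. The helper names are hypothetical.

```python
import json


def build_alias_probe(n: int = 5) -> dict:
    """Build a query with n aliases of __typename, e.g. a0..a4."""
    fields = " ".join(f"a{i}: __typename" for i in range(n))
    return {"query": "{ " + fields + " }"}


def all_aliases_resolved(body: str, n: int = 5) -> bool:
    """True if the server returned a value for every alias in the probe."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, dict):
        return False
    return all(f"a{i}" in data for i in range(n))
```

A server with an alias cap below five would return an errors array instead of data, so the check fails closed.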
ENDPOINT-L7-EVASION-005: GraphQL mutation IDOR via introspection-derived schema
The composite. With introspection open, the attacker queries:
{__schema{mutationType{fields{name args{name type{name}}}}}}
Multiple mutations sharing a numeric id argument plus open introspection plus no per-mutation auth check is the chain that produced HackerOne report #489146 (updateUser(id:42)), the 2024 Shopify IDOR class, and the GitLab GraphQL access-control reports. One authenticated account rewrites another tenant's data.
OWASP API1:2023 Broken Object Level Authorization is the Top 10 entry that maps directly here. The remediation chain is: disable introspection, replace global numeric IDs with relay-style global IDs (Base64 of User:42), and add resolver-level scoping where every mutation's first line is if ctx.user.id != target.user_id: raise Forbidden(). Base64 is trivially reversible, so the global IDs only raise the cost of enumeration; the resolver-level check is the actual control.
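The last two steps of that chain can be sketched as follows. The Forbidden exception, the User class, and the update_user resolver are hypothetical stand-ins for whatever your GraphQL framework provides.

```python
import base64
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when the caller does not own the target object."""


@dataclass
class User:
    id: int


def to_global_id(type_name: str, db_id: int) -> str:
    """Relay-style global ID: Base64 of 'Type:id'."""
    return base64.b64encode(f"{type_name}:{db_id}".encode()).decode()


def from_global_id(global_id: str):
    type_name, _, raw_id = base64.b64decode(global_id).decode().partition(":")
    return type_name, int(raw_id)


def update_user(ctx_user: User, global_id: str, changes: dict) -> dict:
    """Hypothetical mutation resolver: scope check first, work second."""
    type_name, target_id = from_global_id(global_id)
    if type_name != "User" or ctx_user.id != target_id:
        raise Forbidden()
    return {"id": target_id, **changes}
```

Because anyone can decode VXNlcjo0Mg== back to User:42, the ownership check has to run on every mutation regardless of how the ID is encoded.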
ENDPOINT-L7-EVASION-006: CRLF log-line forgery via header reflection
This is the one that turns your own defenders into attack tools.
GET / HTTP/1.1
Host: target.example
User-Agent: Mozilla/5.0\r\n198.51.100.1 - - [01/May/2026:12:00:00 +0000] "POST /login HTTP/1.1" 401 ...
If the upstream logs the User-Agent value verbatim into a line-oriented sink, the attacker has just forged a complete additional log line. Fail2Ban parses that line, sees an authentication failure from 198.51.100.1, and bans the victim IP. CrowdSec's parsers behave the same way. The attacker has weaponized the defender into a denial-of-service primitive against arbitrary third parties.
CWE-93 (CRLF Injection) and CWE-117 (Improper Output Neutralization for Logs) are the relevant entries. RFC 9110 Section 5.5 explicitly forbids CR and LF in field values. The fix is to strip and normalize CRLF (\r, \n, %0d, %0a, and the Unicode variant %E5%98%8A%E5%98%8D, which encodes U+560A and U+560D and is truncated to LF and CR by some decoders) at ingress, plus moving to a structured logger (JSON, syslog) so embedded CRLFs cannot terminate a record.
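Both layers of that fix fit in a few lines. This is a sketch under the assumption that a single pass of percent-decoding is enough for your ingress path; the function names and the JSON field names are hypothetical.

```python
import json
import re
from urllib.parse import unquote

# U+560A and U+560D are the characters behind %E5%98%8A and %E5%98%8D.
_CRLF_LOOKALIKES = str.maketrans({"\u560a": "", "\u560d": ""})
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_header_value(value: str) -> str:
    """Percent-decode once, then drop CR, LF, other controls, and lookalikes."""
    decoded = unquote(value)
    decoded = decoded.translate(_CRLF_LOOKALIKES)
    return _CONTROL_CHARS.sub("", decoded)


def log_record(source_ip: str, user_agent: str) -> str:
    """One JSON object per record; json.dumps escapes any embedded newline."""
    return json.dumps({"source_ip": source_ip, "user_agent": user_agent})


forged = "Mozilla/5.0\r\n198.51.100.1 - - [01/May/2026:12:00:00 +0000]"
record = log_record("192.0.2.1", forged)  # still a single line
```

Even without the sanitizer, the JSON record cannot be split into two lines, which is why the structured logger is the more robust of the two layers.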
ENDPOINT-L7-EVASION-007: Fail2Ban <F-ID> / <F-IP> named-capture poisoning
Fail2Ban filters use named captures to extract the offending IP and a deduplication ID from log lines. A typical filter:
# /etc/fail2ban/filter.d/nginx-auth.conf
[Definition]
failregex = ^<HOST> -.*"(GET|POST).*/login.*" 401
            ^.*X-Forwarded-For: <F-IP>.*"(GET|POST).*/login.*" 401
ignoreregex =
If <F-IP> is anchored on X-Forwarded-For and the load balancer appends rather than overwrites, the attacker can inject arbitrary IPs into the field and ban victims. If <F-ID> is anchored on a User-Agent-derived field, the attacker controls the deduplication key - which forces false negatives by making each brute-force attempt look like a "different" event.
RFC 7239 Section 5.2 documents the trust model for Forwarded (and by extension X-Forwarded-For): only the immediately upstream proxy's contribution can be trusted. Anchoring <F-IP> on the connection-source <HOST> rather than a parsed header value is the correct defense; if you must consume X-Forwarded-For, the load balancer must overwrite, not append.
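If X-Forwarded-For has to be consumed, one common approach is to walk it from right to left and stop at the first hop that is not one of your own proxies. The sketch below assumes you know your proxy ranges; the function name and sample addresses are hypothetical.

```python
import ipaddress


def client_ip(peer_ip: str, xff_header: str, trusted_proxies: list) -> str:
    """Return the rightmost X-Forwarded-For hop that is not a trusted proxy."""
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def parse(value: str):
        try:
            return ipaddress.ip_address(value.strip())
        except ValueError:
            return None

    def is_trusted(value: str) -> bool:
        addr = parse(value)
        return addr is not None and any(addr in net for net in trusted)

    # A direct connection from an untrusted peer: ignore the header entirely.
    if not is_trusted(peer_ip):
        return peer_ip
    hops = [hop.strip() for hop in xff_header.split(",") if hop.strip()]
    for hop in reversed(hops):
        if is_trusted(hop):
            continue  # one of our own proxies, keep walking left
        # First untrusted hop: written by our proxy, so it is the real peer.
        return hop if parse(hop) is not None else peer_ip
    return peer_ip
```

Everything to the left of the returned hop is attacker-controlled and is never read, so an injected victim address cannot reach the ban logic.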
ENDPOINT-L7-EVASION-008: HTTP/3 login brute force (Alt-Svc parity gap)
Two hundred POST requests to /login over HTTP/3 (QUIC) land no decision in most CrowdSec deployments. Why? Because the hub-default parsers were authored pre-RFC 9114 and match log lines like:
192.0.2.1 - - [01/May/2026:12:00:00 +0000] "POST /login HTTP/1.1" 401 1234
with regex anchors like HTTP/(1\.[01]|2\.0). HTTP/3 logs as HTTP/3.0 and trivially evades. The server advertises HTTP/3 support via Alt-Svc, the attacker's client hops transports, and the entire H1/H2 detection rule pack stops applying.
The fix is to update parser regexes to HTTP/[1-9]\.[0-9]+ (covers H1/H2/H3 and any later major version) and audit any custom WAF rule that lacks an explicit http.protocol clause - those frequently apply only to H1/H2 traffic.
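A quick way to check a parser for this gap is to run the same log line through it once per protocol version. The patterns below are simplified stand-ins for a real parser, written for the access-log format shown above.

```python
import re

# Simplified stand-ins for a legacy parser and a protocol-agnostic one.
LEGACY = re.compile(r'"(?:GET|POST) \S+ HTTP/(?:1\.[01]|2\.0)" 401 ')
PARITY = re.compile(r'"(?:GET|POST) \S+ HTTP/[1-9]\.[0-9]+" 401 ')

PREFIX = '192.0.2.1 - - [01/May/2026:12:00:00 +0000] "POST /login '
SAMPLE_LINES = {
    proto: f'{PREFIX}{proto}" 401 1234'
    for proto in ("HTTP/1.1", "HTTP/2.0", "HTTP/3.0")
}


def coverage(pattern: re.Pattern) -> dict:
    """Map each protocol version to whether the pattern matches its log line."""
    return {proto: bool(pattern.search(line)) for proto, line in SAMPLE_LINES.items()}


legacy_coverage = coverage(LEGACY)  # HTTP/3.0 is the only miss
parity_coverage = coverage(PARITY)  # all three match
```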
ENDPOINT-L7-EVASION-009: HTTP/3 endpoint enumeration
Adjacent to 008. The customer publishes Alt-Svc and DNS HTTPS resource records (RFC 9460) advertising HTTP/3 on a parallel endpoint surface. The defender's rule pack enumerates /api/v1/login for H1/H2 but doesn't list the H3 endpoint on the same path. Defender's panic playbook says "block /api/v1/login," and that block applies to TCP 443 only - QUIC 443 stays open.
A canonical endpoint inventory plus a quarterly diff against Alt-Svc and HTTPS RR advertisements catches this. Rules should always apply by Host plus path, never by transport.
ENDPOINT-L7-EVASION-010: WebSocket post-handshake brute force
After the WebSocket Upgrade handshake, most CrowdSec scenarios stop seeing per-frame events because logs only record the initial Upgrade. An attacker can stuff 10,000 frames carrying {op: "login", user, pass} payloads through a single handshake and never trip a per-IP HTTP rate limit.
Common WebSocket libraries default to no frame-level rate limit: ws for Node.js, socket.io (unless you wire rate-limiter-flexible middleware keyed on socket.handshake.address), Spring WebFlux, and gorilla/websocket for Go. RFC 6455 doesn't specify rate limiting at the protocol layer - that's an application responsibility.
The fix is a frame-level token bucket: 10 frames per second per connection, 100 per minute per connection, with the connection torn down on breach. This is straightforward to wire and almost no one wires it.
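A library-agnostic sketch of that limiter, using two token buckets per connection with the limits named above. The class names are hypothetical, and the caller is assumed to pass a monotonic timestamp and to close the socket when on_frame returns False.

```python
class TokenBucket:
    """Classic token bucket: refills continuously, spends one token per frame."""

    def __init__(self, rate_per_second: float, burst: int, now: float):
        self.rate = rate_per_second
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ConnectionFrameLimiter:
    """One instance per WebSocket connection: 10 frames/s and 100 frames/min."""

    def __init__(self, now: float):
        self.per_second = TokenBucket(10.0, 10, now)
        self.per_minute = TokenBucket(100.0 / 60.0, 100, now)

    def on_frame(self, now: float) -> bool:
        """Return False when either limit is breached; the caller closes the socket."""
        ok_second = self.per_second.allow(now)
        ok_minute = self.per_minute.allow(now)
        return ok_second and ok_minute
```

In a read loop this is one call per received frame, for example on_frame(time.monotonic()), with the connection torn down on the first False.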
ENDPOINT-L7-EVASION-011: WebSocket sub-protocol confusion
GET /ws HTTP/1.1
Upgrade: websocket
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: graphql-ws, json-rpc-2.0, mqtt
If the server echoes more than one protocol, or accepts the handshake without echoing any, frame-parser confusion is realistic. Frames after handshake get dispatched by content shape rather than negotiated protocol. The attacker smuggles a json-rpc method invocation through a graphql-ws handler that never expected to validate method-level authorization. Cross-protocol auth bypass.
RFC 6455 Section 4.2.2 lets the server select at most one of the offered subprotocols, echoed in a single Sec-WebSocket-Protocol header. The fix is to always select exactly one, reject the handshake if no offered protocol matches your supported set, and lock the frame decoder by sub-protocol.
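The negotiation rule is simple enough to express directly. In this sketch the function names are hypothetical, and answering a failed negotiation with 400 is one possible choice rather than something the RFC mandates.

```python
def negotiate_subprotocol(offered_header: str, supported: tuple):
    """Pick exactly one subprotocol, in server preference order, or None."""
    offered = [p.strip() for p in offered_header.split(",") if p.strip()]
    for candidate in supported:
        if candidate in offered:
            return candidate
    return None


def handshake_response(offered_header: str, supported: tuple):
    """Return (status, headers): 101 with one echoed protocol, else a rejection."""
    chosen = negotiate_subprotocol(offered_header, supported)
    if chosen is None:
        return 400, {}  # assumption: refuse rather than upgrade without a protocol
    return 101, {"Sec-WebSocket-Protocol": chosen}
```

The chosen value should then be stored on the connection and used as the only key for selecting a frame decoder, so a json-rpc payload on a graphql-ws connection is rejected rather than dispatched.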
ENDPOINT-L7-EVASION-012: CGNAT (100.64.0.0/10) source-IP whitelist abuse
This one is brutal because it is pure misconfiguration.
RFC 6598 reserved 100.64.0.0/10 for carrier-grade NAT. ISPs assign these as customer source IPs all the time. If a customer's WAF rule, reverse-proxy trust list, or admin-panel allowlist trusts the range as "internal" - which is a common copy-paste error from corporate-LAN templates that mix RFC 1918 with the CGNAT range - any residential-ISP customer can hit endpoints the customer believes are internal-only.
Trusted internal ranges are RFC 1918 only: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. The CGNAT space is public attack surface. Audit every IP allowlist in nginx, Envoy, Cloudflare rules, and AWS WAF for 100.64.0.0/10 overlap. If you genuinely need to whitelist a CGNAT-served VPN egress, anchor on mTLS cert SAN, never on source IP.
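The allowlist audit can be automated with the standard library. The sketch assumes the allowlist entries have already been extracted from the nginx, Envoy, or WAF configuration as CIDR strings; the function name and sample entries are hypothetical.

```python
import ipaddress

CGNAT = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space


def cgnat_overlaps(allowlist: list) -> list:
    """Return every allowlist entry that covers any part of the CGNAT range."""
    flagged = []
    for entry in allowlist:
        network = ipaddress.ip_network(entry, strict=False)
        if network.version == 4 and network.overlaps(CGNAT):
            flagged.append(entry)
    return flagged


findings = cgnat_overlaps(
    ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "100.64.0.0/10", "100.0.0.0/8"]
)
```

Note that a broader entry such as 100.0.0.0/8 is flagged too, since it silently includes the CGNAT block.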
ENDPOINT-L7-EVASION-013: CrowdSec LAPI Sybil amplification onto the CTI blocklist
This is the worst one on the list because it weaponizes the defender ecosystem itself.
CrowdSec's value proposition is the consensus blocklist: many LAPIs reporting decisions, the Central API (CAPI) aggregates them, the Cyber Threat Intelligence (CTI) feed publishes consensus IPs that bouncers around the world block. If the consensus algorithm doesn't weight by source diversity (distinct ASNs, decision age, reporter reputation), an attacker who operates 50 LAPI instances - cheap on any cloud - can report the same victim IP from all 50 and amplify it onto the consensus blocklist within hours.
Persistence is the CTI TTL, which defaults to 7 days. The victim IP gets blackholed by every CrowdSec deployment subscribed to the consensus feed. The defender ecosystem becomes the attack tool.
The remediation is configuration-side weighting:
# /etc/crowdsec/config.yaml
api:
  server:
    consensus:
      min_distinct_asns: 5
      min_decision_age: 30m
    online_client:
      credentials_path: /etc/crowdsec/online_api_credentials.yaml
    decisions:
      bouncer_allowlist:
        - 203.0.113.0/24    # owned address space
        - 198.51.100.0/24   # critical partners
Plus subscription to CrowdSec Premium CTI for source-reputation weighting. The community CAPI, by default, treats one LAPI as one vote.
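The weighting idea behind min_distinct_asns and min_decision_age can be illustrated in isolation. The sketch below is a hypothetical model of source-diversity weighting, not CrowdSec's consensus algorithm; the Report fields and the function name are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    reporter_id: str    # hypothetical: one ID per reporting instance
    reporter_asn: int   # ASN the reporting instance connects from
    reported_at: float  # Unix timestamp of the report


def reaches_consensus(
    reports: list,
    now: float,
    min_distinct_asns: int = 5,
    min_decision_age: float = 30 * 60,
) -> bool:
    """Accept an IP only if reporters are diverse and the signal has aged."""
    if not reports:
        return False
    distinct_asns = {report.reporter_asn for report in reports}
    oldest = min(report.reported_at for report in reports)
    return len(distinct_asns) >= min_distinct_asns and now - oldest >= min_decision_age


# Fifty instances in one cloud ASN still count as a single source.
sybil_reports = [Report(f"lapi-{i}", 64500, 0.0) for i in range(50)]
```

Counting distinct ASNs rather than distinct reporters is the same move as counting distinct IPs rather than requests: it makes the cheap axis for the attacker the one that does not add weight.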
3. What CrowdSec and Fail2Ban actually do well
A fair acknowledgment, because the rest of this post is critical: both tools are excellent at the design center they were built for.
CrowdSec's hub model is genuinely innovative. The community-curated scenario library (crowdsecurity/http-cve, crowdsecurity/http-bf-wordpress, crowdsecurity/ssh-bf) covers the common single-source attack patterns competently. The bouncer architecture - decoupling decision from enforcement so you can push decisions to nginx, HAProxy, Cloudflare, and OPNsense from a single LAPI - is operationally clean. The CrowdSec parser DSL is more expressive than Fail2Ban's regex-and-named-capture model.
Fail2Ban earned its place in the perimeter for fifteen years for the right reasons: it's small, it ships with most distros, and the filter.d directory is a usable interface for ops teams. SSH brute-force, postfix relay attempts, dovecot login storms, the exim variants - Fail2Ban handles all of them out of the box, and the operational cost is one config file per service.
The OWASP CRS plus ModSecurity stack, when tuned to paranoia level 2 or 3, catches the canonical SQL injection, XSS, and RCE payloads, plus some of the simpler credential-stuffing patterns through REQUEST-912-DOS-PROTECTION. As a baseline against unsophisticated attackers, it works.
What the entire category is bad at is what threshold-based defense was never designed to do: distributed, account-targeted, protocol-aware attacks. That gap is what we test for.
4. What you need beyond per-source thresholds
Layered controls, not stacked controls. Six categories:
Per-account thresholds at the auth service
Stop at the auth layer, not the WAF. After 10 fails in 15 minutes for username = X, return HTTP 423 Locked with Retry-After: 900, regardless of source IP. This is one to two days of engineering work and shuts down ENDPOINT-L7-EVASION-002 - the 23andMe pattern - completely. Combine with backoff (exponential up to a cap) and forced re-authentication after lockout to avoid creating a denial-of-service primitive against legitimate users via account enumeration.
Per-ASN bucketing at the WAF
For the distributed credential-stuffing case (ENDPOINT-L7-EVASION-001), per-IP buckets are useless. Resolve source IPs to ASN at request time (MaxMind GeoLite2, IPinfo, Cloudflare's ip.src.asnum field, formerly ip.geoip.asnum) and bucket on (asn, path, status). In Cloudflare, a rate limiting rule can match on ASN in its expression, with the threshold (for example, 100 requests per 60 seconds) set in the rule's rate parameters rather than in the expression itself:
(http.request.uri.path eq "/login" and ip.src.asnum in {7922 20115 22773})
For self-hosted CrowdSec, write a parser that emits evt.Meta.source_asn and a scenario that groups on it with distinct: evt.Meta.source_ip, so that a thousand IPs from one ASN fill a single bucket. This lets residential-proxy networks be detected without false-positiving on shared corporate egress.
Schema-aware GraphQL gates
Off-the-shelf WAFs see /graphql as a single endpoint. They cannot tell query { me { id } } from mutation { deleteAccount(id: 42) }. You need a GraphQL-aware layer:
- graphql-shield or graphql-armor in front of the resolver layer for query-cost limits, alias caps, depth limits, and introspection control
- Per-mutation rate limits keyed on (user_id, mutation_name), not source IP
- A CrowdSec scenario that fires on more than five errors[].extensions.code == "FORBIDDEN" responses per IP per minute (the IDOR brute-force signal)
- Disable introspection in production. Always.
HTTP/3 protocol parity
Audit every CrowdSec parser, Fail2Ban filter, and WAF custom rule for protocol-version anchoring. Replace HTTP/1\.1 with HTTP/[1-9]\.[0-9]+, and expand any explicit http.protocol == "HTTP/2" clause to a list or remove it. Run synthetic HTTP/3 traffic through a staging VPC quarterly to confirm parity. RFC 9114 has been a Proposed Standard since June 2022; "we don't have HTTP/3 in production" is no longer a defensible answer for any internet-facing service behind Cloudflare, Fastly, or AWS CloudFront.
WebSocket frame-level inspection
Frame-level token buckets on the WS server, applied per connection. For socket.io, wire rate-limiter-flexible into the message handlers, keyed on the socket ID for per-connection limits (or on socket.handshake.address for per-IP limits). For ws on Node, custom middleware with socket.on('message', ...) and a per-connection counter. For Spring WebFlux, a time-windowed counter on the inbound Flux<WebSocketMessage>; note that Flux.limitRate(10) only caps backpressure request batches and is not a time-based limit. For Go's gorilla/websocket, a time.Tick-based bucket in the read loop. Pair with sub-protocol locking per RFC 6455 Section 4.2.2.
CRLF sanitization at the log layer
Two-layer fix. At ingress, normalize CRLF in every header value: nginx ignore_invalid_headers on is the default but verify it's not been overridden, Envoy http_protocol_options.allow_chunked_length: false plus header-value validation, AWS ALB enforces this by default. At the log layer, structured logging only - JSON or syslog. A log line is a record, not a string. Embedded \r\n in a User-Agent value can never terminate a JSON record because JSON encoding handles the escape. This single change eliminates ENDPOINT-L7-EVASION-006 and ENDPOINT-L7-EVASION-007 (the <F-ID> poisoning variant) at the same time.
5. Closing
Threshold-based perimeter defense was the right architecture for 2014. It still works against the unsophisticated attacker who buys one VPS and points it at your /login endpoint. It does not work against the modern attacker, who rents a botnet of 10,000 residential proxies and runs distributed, account-targeted, protocol-aware campaigns at sub-threshold rates per source.
The thirteen tests above are not hypothetical. They map directly to incidents that have already happened: 23andMe, Norton LifeLock, every published GraphQL IDOR HackerOne report, every CRLF-injection-becomes-DoS-blocklist case study, and the recurring HTTP/3 parity findings that show up in every Cloudflare cipher-and-protocol audit.
If you have CrowdSec or Fail2Ban deployed and assumed it covered credential stuffing, you do not have credential-stuffing coverage. You have brute-force-from-one-IP coverage. The fix is not to throw out CrowdSec and Fail2Ban - both are doing the job they were built for. The fix is to layer per-account thresholds at the auth service, per-ASN bucketing at the WAF, schema-aware gates for GraphQL, protocol-parity audits across H1/H2/H3, frame-level inspection at the WebSocket boundary, and CRLF sanitization at the log layer. None of these is a heroic engineering effort. Most are configuration, library wiring, or one to two days of focused work.
What ties them together is the shift in mental model: stop thinking about per-source thresholds and start thinking about per-resource saturation. The attacker is multi-source by default. Your defenses need to be multi-axis by default.
References
- OWASP API Security Top 10 (2023): https://owasp.org/API-Security/editions/2023/en/0x11-t10/
- OWASP Credential Stuffing: https://owasp.org/www-community/attacks/Credential_stuffing
- OWASP GraphQL Cheat Sheet: https://owasp.org/www-project-graphql-cheat-sheet/
- OWASP CRLF Injection: https://owasp.org/www-community/vulnerabilities/CRLF_Injection
- CWE-307 Improper Restriction of Excessive Authentication Attempts: https://cwe.mitre.org/data/definitions/307.html
- CWE-93 CRLF Injection: https://cwe.mitre.org/data/definitions/93.html
- CWE-117 Improper Output Neutralization for Logs: https://cwe.mitre.org/data/definitions/117.html
- CWE-770 Allocation of Resources Without Limits: https://cwe.mitre.org/data/definitions/770.html
- NIST SP 800-63B section 5.2.2: https://pages.nist.gov/800-63-3/sp800-63b.html
- PCI-DSS 4.0 Requirement 8.3.4: https://docs-prv.pcisecuritystandards.org/
- RFC 6455 (WebSocket Protocol): https://datatracker.ietf.org/doc/html/rfc6455
- RFC 6598 (CGNAT 100.64.0.0/10): https://datatracker.ietf.org/doc/html/rfc6598
- RFC 7239 (Forwarded HTTP Extension): https://datatracker.ietf.org/doc/html/rfc7239
- RFC 9110 (HTTP Semantics, section 5.5): https://datatracker.ietf.org/doc/html/rfc9110
- RFC 9114 (HTTP/3): https://datatracker.ietf.org/doc/html/rfc9114
- RFC 9460 (DNS HTTPS Resource Record): https://datatracker.ietf.org/doc/html/rfc9460
- CrowdSec scenarios reference: https://docs.crowdsec.net/docs/scenarios/intro
- CrowdSec hub base HTTP scenarios: https://hub.crowdsec.net/author/crowdsecurity/collections/base-http-scenarios
- CrowdSec CAPI / CTI: https://docs.crowdsec.net/docs/central_api/intro
- Fail2Ban filter tags wiki: https://github.com/fail2ban/fail2ban/wiki/Filter-Tags
- 23andMe credential-stuffing breach (Oct 2023, ~14k accounts compromised, 6.9M relatives exposed): public 8-K disclosure, December 2023
- HackerOne report #489146 (GraphQL updateUser(id:42) IDOR): https://hackerone.com/reports/489146