Regular Expression Denial of Service (ReDoS)¶
Overview¶
Implementations MUST avoid regular expressions vulnerable to catastrophic backtracking.
The Problem¶
Certain regex patterns exhibit exponential time complexity on specific inputs:
# Vulnerable pattern
(a+)+$
# Attack string: "aaaaaaaaaaaaaaaaaaaaaaaaaaaa!"
# Time: O(2^n) where n = number of 'a's
Vulnerable Patterns¶
Nested Quantifiers¶
# Bad
(a+)+
(a*)*
(a+)*
(.*)+
# Good
a+
a*
Overlapping Alternations¶
# Bad
(a|a)+
(ab|a)+b
# Good
a+
a+b?
Patterns in Beancount Context¶
| Use Case | Vulnerable | Safe Alternative |
|---|---|---|
| Account match | (:[A-Za-z]+)+ |
(:[A-Za-z]+)* with length limit |
| Currency | [A-Z]+ |
[A-Z]{1,24} |
| Number | [0-9,]+ |
[0-9]{1,50}(,[0-9]{3})* |
Requirements¶
Implementations MUST:
- Audit all regex patterns for ReDoS vulnerability
- Use non-backtracking engines when available (RE2, rust regex)
- Set match timeouts if using backtracking engines
- Limit input length before regex matching
Safe Regex Engines¶
| Engine | Backtracking | ReDoS Safe |
|---|---|---|
| RE2 (Google) | No | Yes |
Rust regex |
No | Yes |
| Hyperscan | No | Yes |
| PCRE | Yes | No |
Python re |
Yes | No |
| JavaScript | Yes | No |
Mitigation Strategies¶
1. Use Linear-Time Engines¶
// Rust: regex crate is safe by design
use regex::Regex;
let re = Regex::new(r"pattern").unwrap(); // guaranteed O(n)
2. Set Timeouts¶
# Python: use regex with timeout
import regex
regex.match(pattern, string, timeout=1.0)
3. Limit Input Length¶
# Check length before matching
if len(input) > MAX_LENGTH:
raise ValueError("Input too long")
result = re.match(pattern, input)
4. Avoid Regex for Simple Parsing¶
# Bad: regex for fixed format
date = re.match(r"(\d{4})-(\d{2})-(\d{2})", line)
# Good: direct parsing
year, month, day = line[:4], line[5:7], line[8:10]
Testing¶
Test with attack strings:
attack_strings = [
"a" * 30 + "!", # Nested quantifier attack
"a" * 30 + "b" * 30, # Overlapping alternation
" " * 1000 + "x", # Leading whitespace
]
Verify each completes in < 100ms.