Regular expressions (regex or regexp) are everywhere on the Internet nowadays!
From input validation code, web application firewalls, to malware detection rules, regex is becoming one of the most important cybersecurity technologies in use.
But can regex also lead to vulnerabilities? Today, let’s explore how attackers can exploit poorly written regex patterns to launch denial of service attacks!
Regular expression Denial of Service
Regular expression Denial of Service attacks, or ReDoS, is a type of denial of service attack. During a ReDoS, attackers force a situation where the regex evaluator gets stuck evaluating a string and runs for a long time. When an application is hit by a ReDoS, the application server can hang and become unavailable to legitimate users.
Attackers can launch a ReDoS when two conditions are met:
An “evil regex” pattern is used, and
The regex engine is made to test if a maliciously crafted string matches the evil regex.
A regex pattern is exploitable (called an “evil regex”) when it has the following characteristics:
It uses repetition on complex subexpressions (the use of “+” and “*” on complex subexpressions), and
within these repeated subexpressions, there are additional repetition symbols and expressions that match a suffix of another match.
These characteristics make it possible for attackers to force long run times on the evaluator.
For example, this regex pattern is an “evil regex”.
The subexpression in the parenthesis searches for “a” and its repetition. It also checks if the entire expression is a repetition of any of the subexpressions! Thus, if an attacker supplies an input such as:
In the string above, each additional “a” will cause the regex evaluator to check for the repetitions of an additional subexpression, thus doubling the amount of processing time for the regex evaluator. Let’s look at this process in detail. First, the regex engine tries to match (a+). Since the “+” operator is greedy by default, it will match all the “a”s.
(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)! -> One possibility to try
Next, the regex tries to match for the string end character, “$”. But there is no match since we have a “!” character there. Since there is no match, the “+” operator will decrease the count of repetitions and backtrack within the string.
(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)(a)! -> One possibility to try
The regex now matches two groups of (a+), and the engine tries to match the end of string character again but fails again because of the “!” character. The evaluation again backtracks. This time, the leftover two “a”s can be matched in two ways: (aa) and (a)(a). The engine will first try to match (aa), fail at the end of string character, then try again using the grouping (a)(a).
(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)(aa)! (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)(a)(a)! -> Two possibilities to try
As long as the match keeps failing, the regex engine will keep retrying by backtracking. This is called “catastrophic backtracking” and will cause the time of evaluation to grow exponentially.
(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)(aaa)! (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)(aa)(a)! (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)(a)(aa)! (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)(a)(a)(a)! -> Four possibilities to try
How to Find ReDoS Vulnerabilities
There are two scenarios that allow attackers to exploit evil regex patterns and launch a ReDoS.
User can provide custom regexes
Sometimes an application provides users the ability to inject their own regex patterns to be executed by the server. In this case, the attacker can inject an evil regex of her choice and then submit string input that would cause the evil regex to run for a long time.
One example of this is when an application checks if a user’s password contains their username during registration. When a user’s password contains their username, the user would not be allowed to use that password.
If the application blindly concatenates the username with a regular expression without sanitizing regex operators, the malicious user can inject evil regex via the username field.
For example, the following function is vulnerable to evil regex injection:
check_password(username, password): pattern = username if regex.match(pattern, password): print("Password contains username. Please provide another password.") else: print("Registration complete!")
In this case, if an attacker provides an evil regex pattern as the username, and a crafted attack string as the password, she can cause the regex function to run for a long time and hang the server.
username: ^(a+)+$ password: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
Application code contains evil regex
On the other hand, sometimes it’s the application’s code that contains exploitable regex patterns. If an attacker is able to guess or find where this regex pattern is used, she can craft a malicious string to exploit it and send it over to be processed by the regex evaluator.
Attackers can gather this intel in multiple ways. For example, it is not uncommon for organizations to open source their software and publish their source code online. They might also perform the same regex checks both on the server-side and on the client-side. And in doing so, expose its server-side regex patterns to users. Or maybe, the application might be using software a component that contains a known ReDoS vulnerability.
In these cases, attackers can study the regex patterns used and craft custom input designed to crash the evaluator.
To test ways of crashing a particular regex pattern, you can use a regex tester like this one. Notice how much the processing time increases with each additional “a” you add to the input.
Eliminating ReDoS Vulnerabilities
Fortunately, there are many ways you can prevent ReDoS from happening. Depending on the tech stack of your application, here are a few options.
Avoid using regex
The most foolproof way of avoiding ReDoS attacks is to avoid using regex. Instead, find alternative methods that can achieve the same results. For example, a good way of rewriting the above
check_password function is to use string operations such as
But beware! Although this method prevents ReDoS completely, you might open the doors to other injection vulnerabilities, depending on the string operations you choose.
Use safe regex engines
Instead of using built-in, unsafe regex engines (like the Node.js regex engine), you can opt to use safe alternatives instead.
For example, re2 is a safe alternative that you can use without fear of ReDoS:
Detect and sanitize evil regexes
You can also prevent ReDoS by detecting evil regex in your code or in user input, then sanitizing them. For this, you can utilize evil regex detection libraries like safe-regex: