Introduction
Ever since @cetfor’s video on Auditing system calls for command injection vulnerabilities using Binary Ninja’s HLIL, I have been wanting to learn more about how you can automate discovering program incorrectness statically.
In this blog series, I will demonstrate to the readers how Binary Ninja can help you in your vulnerability research. This series will be more on the basic of Binary Ninja Scripting rather then program analysis.
I should also recommend anyone that wants to learn more about this topic to take Margin Research / Vector35 PAVR (Program Analyiss For Vulnerability Research) training by @Calaquendi44 and @psifertex.
Requirement
- Binary Ninja personal or commerical license
- Note that at the time of this writing, I am using version
2.4.3072-dev
.
- Note that at the time of this writing, I am using version
- Snippets plugin
Optional but not required
Firmware is needed if you would like to run this script on real target
- upnpd binary from firmware mentioned on Grimm Blog
Target
Netgear SOHO Devices upnpd Service Pre-Authentication Stack Overflow. Read more about the bug @ Grimm Blog
I have made a program that imitates the bug shown above. Bug happens where there is a user controlled input that copies string into stack buffer as long as its less then 1024 bytes. It does not check the size of the passed array.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#include <stdio.h>
#include <string.h>
// Function that passes user controlled buffer and copies data.
// Problem here is that overflow_buf should always be initialized with size
// greater than or equal to 1024 (0x400)
void find_token_get_val(char * overflow_buf, char * user_control_data) {
size_t user_data_len = strlen(user_control_data);
if (user_data_len < 1024) {
strncpy(overflow_buf, user_control_data, user_data_len);
}
}
int main(int argc, char * argv[]) {
char overflow_buf[64];
memset(overflow_buf, 0, sizeof(overflow_buf));
find_token_get_val(overflow_buf, argv[1]);
}
Purpose
Looking at the simpiled bug above, it made me wonder if there are any more bugs like this in the same binary. I think it is not reasonable for a function to expect programmers to make sure that the buffer they are passing is always set to size greater than or equal to 1024 bytes. Usually there should be an additional check on the buffer that gets passed into the function just in case the programmer passes in invalid buffer size. So I wanted to see if there are more bugs like this. If they made this kind of mistake before, I believe there might be a similar bug somewhere else in the binary.
Analysis Plan
- Get all the callers for
memset
function - Iterate through all the callers and track the
buffer
/buffer size
that gets passed tomemset
- If there is a call instruction, check if it is
find_token_get_val
and it usesmemset_var
Choosing BNIL
On this blog, I choose to use HLIL because HLIL would fold aliased variable for me so I don’t have to track state of varible we iterate forward. You can see that the MLIL
buffer gets aliased many times. If you have to script this, you would need to track the reference of the variable. In HLIL You don’t have to worry about that! It is folded.
MLIL
HLIL
Depending on which analysis you do, switching IL might be a good idea. Most people seem to like to do this kind of analysis on MLIL since HLIL might fold instruction you might be looking for. You can learn more about the difference here BNIL Overview
Final Script
Here is the end result. I will explain the code below
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
def check_var_in_target(instr: HighLevelILCall, memset_vars: dict):
if isinstance(instr.dest, HighLevelILConstPtr):
if bv.get_function_at(instr.dest.constant).name == "find_token_get_val":
overflow_buf = instr.params[3]
if isinstance(overflow_buf, HighLevelILAddressOf):
if isinstance(overflow_buf.src, HighLevelILVar):
ofbuf_var = overflow_buf.src.var
if ofbuf_var in memset_vars.keys():
print(f"{ofbuf_var.name}[0x{memset_vars[ofbuf_var][1]:x}] | memset @ 0x{memset_vars[ofbuf_var][0].address:x} -> {bv.get_function_at(instr.dest.constant).name}(0x{bv.get_function_at(instr.dest.constant).start:x}) @ 0x{instr.address:x}")
def main():
memset_callers = None
memset_func = None
for func in bv.functions:
if "memset" == func.name:
memset_func = func
memset_callers = set(func.callers)
break
for func in memset_callers:
memset_vars = {}
for bb in func.hlil:
for instr in bb:
if isinstance(instr, HighLevelILCall):
if instr.dest.constant == memset_func.start:
if isinstance(instr.params[0], HighLevelILAddressOf):
memset_vars[instr.params[0].src.var] = [
instr,
instr.params[2].constant,
]
else:
check_var_in_target(instr, memset_vars)
elif isinstance(instr, HighLevelILIf):
if isinstance(instr.condition, HighLevelILCmpE):
call_instr = instr.condition.left
if isinstance(call_instr, HighLevelILCall):
check_var_in_target(call_instr, memset_vars)
print("DONE!")
main()
Writing our Binja Script in Snippets
Iterating through Binary Ninja HLIL Instruction
- Get memcpy function from bv (BinaryView)
- Get list of callers that calls memset
- Iterate through func.hlil -> bb -> instr
1
2
3
4
5
6
7
8
9
10
11
12
13
memset_callers = None
memset_func = None
for func in bv.functions:
if "memset" == func.name:
memset_func = func
memset_callers = set(func.callers)
break
for func in memset_callers:
# stores memset var and size. memset(var_1, 0, 0x30) will store {'var1': 0x30}
memset_vars = {}
for bb in func.hlil:
for instr in bb:
Here is how Binary Ninja organizes instructions. Source by @carste1n
Store memsets var and size into function scoped dictionary
- Look for HLIL instruction (instr.operation) that is HLIL_CALL
- Check if
instr.dest.constant
is equal to address of memset.start (memset address) - Store
instr.param[0].src.var.name
(var name) andinstr.param[2].constant
(var size) intomemset_vars
dictionary.
Using BNIL Instruction Graph (Available in Plugins Manager), We are able to visualize all of HLIL_CALL expressions
memset BNIL instr graph
1
2
3
4
5
6
7
8
for func in memset_callers:
memset_vars = {}
for bb in func.hlil:
for instr in bb:
if isinstance(instr, HighLevelILCall):
if instr.dest.constant == memset_func.start:
if isinstance(instr.params[0], HighLevelILAddressOf):
memset_vars[instr.params[0].src.var] = [instr, instr.params[2].constant]
Check if it calls find_token_get_val using memset var
find_token_get_val instr graph
1
2
3
4
5
6
7
8
def check_var_in_target(instr: HighLevelILCall, memset_vars: dict):
if isinstance(instr.dest, HighLevelILConstPtr):
if bv.get_function_at(instr.dest.constant).name == "find_token_get_val":
overflow_buf = instr.params[3]
if isinstance(overflow_buf, HighLevelILAddressOf):
ofbuf_var = overflow_buf.src.var
if ofbuf_var in memset_vars.keys():
print(f"{ofbuf_var.name}[0x{memset_vars[ofbuf_var][1]:x}] | memset @ 0x{memset_vars[ofbuf_var][0].address:x} -> {bv.get_function_at(instr.dest.constant).name}(0x{bv.get_function_at(instr.dest.constant).start:x}) @ 0x{instr.address:x}")
Check Result
Imitated Program Result
Real UPNPD Program Result
Welp, that doesn’t make sense. It only gave us one result when there should be about 19 of them. Also, it didn’t really find the bug we were looking for either. So what gives?
Adding more expressions
If you look at the BNIL instr graph for real find_token_get_val
, it will make more sense. We need to write a case where the call operation is inside the if operation.
BNIL instr graph for find_token_get_val
1
2
3
4
5
elif isinstance(instr, HighLevelILIf):
if isinstance(instr.condition, HighLevelILCmpE):
call_instr = instr.condition.left
if isinstance(call_instr, HighLevelILCall):
check_var_in_target(call_instr, memset_vars)
Final Result
After adding the expression it finds the 0x40 byte buffer!
Improvements
The final result was pretty bad. It really only gave us one other instance of this pattern. Well at most there are about 20 of them for find_token_get_val function. When I started writing this script, the purpose of this script was actually more generic. Instead of matching on only find_token_get_val, I matched on every single function that used memset_var. Then I would also check if it is using the var to either memcpy, strcpy, and etc. Then Check if the size that I am writing is bigger than the actual buffer size. But to do so we would need to learn about SSA!
Conclusion
Binary Ninja is great SRE when you want to do program analysis or automate things. Try modifying this script to be more generic and running the script again on a real target. You will get alot more results. Thank you all for reading my blog!