Using Binary Ninja for Vulnerability Research

Introduction

Ever since watching @cetfor’s video on Auditing system calls for command injection vulnerabilities using Binary Ninja’s HLIL, I have wanted to learn more about automating the static discovery of program incorrectness.

In this blog series, I will show how Binary Ninja can help with your vulnerability research. The series focuses on the basics of Binary Ninja scripting rather than on program analysis.

I also recommend that anyone who wants to learn more about this topic take the Margin Research / Vector35 PAVR (Program Analysis for Vulnerability Research) training by @Calaquendi44 and @psifertex.

Requirements

Binary Ninja personal or commercial license
- Note that at the time of this writing, I am using version 2.4.3072-dev.
Snippets plugin

Optional

Firmware is only needed if you would like to run this script on a real target

upnpd binary from firmware mentioned on Grimm Blog

Target

Netgear SOHO Devices upnpd Service Pre-Authentication Stack Overflow. Read more about the bug on the Grimm Blog.

I have made a program that imitates the bug shown above. The bug occurs because user-controlled input is copied into a stack buffer whenever the input is shorter than 1024 bytes. The function never checks the size of the destination buffer it was passed.

  
#include <stdio.h>
#include <string.h>

// Function that passes user controlled buffer and copies data.
// Problem here is that overflow_buf should always be initialized with size
// greater than or equal to 1024 (0x400)
void find_token_get_val(char * overflow_buf, char * user_control_data) {
    size_t user_data_len = strlen(user_control_data);
    if (user_data_len < 1024) {
        strncpy(overflow_buf, user_control_data, user_data_len);
    }
}

int main(int argc, char * argv[]) {
    char overflow_buf[64];
    memset(overflow_buf, 0, sizeof(overflow_buf));
    find_token_get_val(overflow_buf, argv[1]);
}
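
To make the mismatch concrete, here is a small Python model of the length check. It is only a sketch of the arithmetic, not the C program itself, and the names `STACK_BUF_SIZE`, `LENGTH_LIMIT`, and `bytes_past_buffer` are mine, chosen for illustration.

```python
STACK_BUF_SIZE = 64    # sizeof(overflow_buf) in main()
LENGTH_LIMIT = 1024    # the only bound find_token_get_val() enforces


def bytes_past_buffer(input_len: int) -> int:
    """Return how many bytes the copy would write past overflow_buf.

    Models only the length check from the C program above (illustrative).
    """
    if input_len >= LENGTH_LIMIT:
        return 0  # the copy is skipped entirely
    return max(0, input_len - STACK_BUF_SIZE)


for n in (10, 64, 65, 1023, 1024):
    print(n, bytes_past_buffer(n))
```

Any input between 65 and 1023 bytes passes the check and still runs past the 64-byte buffer.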

Purpose

Looking at the simplified bug above, I wondered whether there are more bugs like this in the same binary. I don’t think it is reasonable for a function to expect programmers to make sure that the buffer they pass in is always at least 1024 bytes. Usually there should be an additional check on the buffer that gets passed into the function, just in case the programmer passes in a buffer of the wrong size. If they made this kind of mistake once, I believe there might be a similar bug somewhere else in the binary.

Analysis Plan

Get all the callers of the memset function
Iterate through each caller and track the buffer and buffer size passed to memset
For each call instruction, check whether it calls find_token_get_val and uses a buffer tracked from memset (memset_var)
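
Before touching the Binary Ninja API, the plan above can be sketched on a toy model. Everything here is hypothetical: `Call` is a stand-in for an HLIL call, buffers are plain strings, and unlike the real script this sketch accepts the buffer in any argument position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical stand-in for one HLIL call (not a Binary Ninja type)."""
    callee: str
    args: tuple
    address: int


def find_suspect_calls(calls, target="find_token_get_val"):
    """Walk calls in order; report target calls that reuse a memset'd buffer."""
    memset_vars = {}  # buffer name -> (size, address of the memset call)
    hits = []
    for call in calls:
        if call.callee == "memset" and len(call.args) == 3:
            buf, _fill, size = call.args
            memset_vars[buf] = (size, call.address)
        elif call.callee == target:
            for arg in call.args:
                if arg in memset_vars:
                    size, memset_at = memset_vars[arg]
                    hits.append((arg, size, memset_at, call.address))
    return hits


calls = [
    Call("memset", ("overflow_buf", 0, 0x40), 0x1000),
    Call("find_token_get_val", ("overflow_buf", "argv1"), 0x1010),
    Call("find_token_get_val", ("other_buf", "argv1"), 0x1020),
]
for name, size, memset_at, call_at in find_suspect_calls(calls):
    print(f"{name}[0x{size:x}] | memset @ 0x{memset_at:x} -> call @ 0x{call_at:x}")
```

Only the call that reuses `overflow_buf` is reported, because `other_buf` was never seen in a memset.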

Choosing BNIL

In this post, I chose HLIL because it folds aliased variables for me, so I don’t have to track the state of a variable as we iterate forward. You can see that in MLIL the buffer gets aliased many times. If you scripted this against MLIL, you would need to track every reference to the variable. In HLIL you don’t have to worry about that: it is already folded.

MLIL

HLIL
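
To give a feel for the bookkeeping that HLIL saves us, here is a minimal sketch of alias tracking on a toy model. It assumes aliases are simple copies recorded in a dictionary, which is my simplification; real MLIL tracking would involve much more than this, and `resolve_alias` is a hypothetical helper, not a Binary Ninja API.

```python
def resolve_alias(var, assignments):
    """Follow simple copy assignments (dst -> src) back to the original value."""
    seen = set()
    while var in assignments and var not in seen:
        seen.add(var)  # guards against cyclic copies
        var = assignments[var]
    return var


# Hypothetical MLIL-style sequence: r0 = &buf; r1 = r0; memset(r1, 0, 0x40)
assignments = {"r0": "&buf", "r1": "r0"}
print(resolve_alias("r1", assignments))  # &buf
```

With HLIL the equivalent call already reads as a memset on the buffer itself, so the script can skip this step.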

Depending on the analysis you are doing, switching ILs might be a good idea. Most people seem to prefer MLIL for this kind of analysis, since HLIL might fold away an instruction you are looking for. You can learn more about the differences in the BNIL Overview.

Final Script

Here is the end result. I will explain the code below.

  
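# `bv` (the current BinaryView) is provided by the Snippets plugin / scripting console.
from binaryninja.highlevelil import (
    HighLevelILAddressOf, HighLevelILCall, HighLevelILCmpE, HighLevelILConst,
    HighLevelILConstPtr, HighLevelILIf, HighLevelILVar,
)


def check_var_in_target(instr: HighLevelILCall, memset_vars: dict):
    # Only direct calls have a constant destination we can resolve to a function.
    if not isinstance(instr.dest, HighLevelILConstPtr):
        return
    callee = bv.get_function_at(instr.dest.constant)
    if callee is None or callee.name != "find_token_get_val":
        return
    # This script reads the buffer from argument index 3; change the index to
    # match your target (the imitation program passes the buffer at index 0).
    if len(instr.params) < 4:
        return
    overflow_buf = instr.params[3]
    if not isinstance(overflow_buf, HighLevelILAddressOf):
        return
    if not isinstance(overflow_buf.src, HighLevelILVar):
        return
    ofbuf_var = overflow_buf.src.var
    if ofbuf_var in memset_vars:
        memset_instr, size = memset_vars[ofbuf_var]
        print(
            f"{ofbuf_var.name}[0x{size:x}] | memset @ 0x{memset_instr.address:x} "
            f"-> {callee.name}(0x{callee.start:x}) @ 0x{instr.address:x}"
        )


def main():
    memset_func = None
    for func in bv.functions:
        if func.name == "memset":
            memset_func = func
            break
    if memset_func is None:
        print("memset not found")
        return

    for func in set(memset_func.callers):
        # Maps each memset'd Variable to [memset instruction, constant size].
        memset_vars = {}
        for bb in func.hlil:
            for instr in bb:
                if isinstance(instr, HighLevelILCall):
                    is_memset = (
                        isinstance(instr.dest, HighLevelILConstPtr)
                        and instr.dest.constant == memset_func.start
                    )
                    if not is_memset:
                        check_var_in_target(instr, memset_vars)
                        continue
                    params = instr.params
                    # Expect memset(&var, value, constant_size).
                    if (len(params) >= 3
                            and isinstance(params[0], HighLevelILAddressOf)
                            and isinstance(params[0].src, HighLevelILVar)
                            and isinstance(params[2], (HighLevelILConst, HighLevelILConstPtr))):
                        memset_vars[params[0].src.var] = [instr, params[2].constant]

                elif isinstance(instr, HighLevelILIf):
                    # Also catch `if (find_token_get_val(...) == x)`.
                    if isinstance(instr.condition, HighLevelILCmpE):
                        call_instr = instr.condition.left
                        if isinstance(call_instr, HighLevelILCall):
                            check_var_in_target(call_instr, memset_vars)

    print("DONE!")


main()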

Writing our Binja Script in Snippets

Iterating through Binary Ninja HLIL Instruction

Get the memset function from bv (BinaryView)
Get the list of callers of memset
Iterate through func.hlil -> bb -> instr

  
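memset_callers = None
memset_func = None
for func in bv.functions:
    if "memset" == func.name:
        memset_func = func
        memset_callers = set(func.callers)
        break

# `or ()` keeps the loop safe when the binary has no memset function
for func in memset_callers or ():
    # Maps each memset'd variable to [memset instr, size];
    # memset(&var_1, 0, 0x30) will store {var_1: [instr, 0x30]}
    memset_vars = {}
    for bb in func.hlil:
        for instr in bb:
            # Per-instruction checks go here (see the next steps)
            pass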

Here is how Binary Ninja organizes instructions. Source by @carste1n

Store memsets var and size into function scoped dictionary

Look for HLIL instructions that are calls (instr.operation is HLIL_CALL)
Check if instr.dest.constant is equal to memset_func.start (the address of memset)
Store instr.params[0].src.var (the variable) as the key and [instr, instr.params[2].constant] (the memset call and the size) in the memset_vars dictionary.

Using the BNIL Instruction Graph plugin (available in the Plugin Manager), we are able to visualize all of the HLIL_CALL expressions.

memset BNIL instr graph

  
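for func in memset_callers:
    memset_vars = {}
    for bb in func.hlil:
        for instr in bb:
            # Only direct calls have a constant destination to compare
            if isinstance(instr, HighLevelILCall) and isinstance(instr.dest, HighLevelILConstPtr):
                if instr.dest.constant == memset_func.start:
                    # Expect memset(&var, value, constant_size)
                    if isinstance(instr.params[0], HighLevelILAddressOf):
                        memset_vars[instr.params[0].src.var] = [instr, instr.params[2].constant]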

Check if it calls find_token_get_val using memset var

find_token_get_val instr graph

  
def check_var_in_target(instr: HighLevelILCall, memset_vars: dict):
    if isinstance(instr.dest, HighLevelILConstPtr):
        # get_function_at() returns None if no function starts at that address
        callee = bv.get_function_at(instr.dest.constant)
        if callee is not None and callee.name == "find_token_get_val":
            # This script reads the buffer from argument index 3; change the index
            # to match your target (the imitation program passes it at index 0)
            if len(instr.params) < 4:
                return
            overflow_buf = instr.params[3]
            if isinstance(overflow_buf, HighLevelILAddressOf):
                ofbuf_var = overflow_buf.src.var
                if ofbuf_var in memset_vars:
                    memset_instr, size = memset_vars[ofbuf_var]
                    print(f"{ofbuf_var.name}[0x{size:x}] | memset @ 0x{memset_instr.address:x} -> {callee.name}(0x{callee.start:x}) @ 0x{instr.address:x}")

Check Result

Imitated Program Result

Real UPNPD Program Result

Welp, that doesn’t make sense. It only gave us one result when there should be about 19 of them. Also, it didn’t really find the bug we were looking for either. So what gives?

Adding more expressions

If you look at the BNIL instr graph for the real find_token_get_val call, it will make more sense. We need to handle the case where the call operation is nested inside the condition of an if operation.

BNIL instr graph for find_token_get_val

  
elif isinstance(instr, HighLevelILIf):
    if isinstance(instr.condition, HighLevelILCmpE):
        call_instr = instr.condition.left
        if isinstance(call_instr, HighLevelILCall):
            check_var_in_target(call_instr, memset_vars)
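
Special-casing `if (call == x)` covers only one shape of nesting. A more general approach is to walk every sub-expression and collect the calls, wherever they sit. The sketch below shows that idea on a toy expression tree; `Expr` and `find_calls` are hypothetical names of mine and not Binary Ninja types.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """Hypothetical expression node: an operation name plus child expressions."""
    op: str
    children: list = field(default_factory=list)
    name: str = ""


def find_calls(expr):
    """Yield every call node in the tree rooted at expr, depth first."""
    if expr.op == "call":
        yield expr
    for child in expr.children:
        yield from find_calls(child)


# Models: if (find_token_get_val(...) == 0) { ... }
cond = Expr("cmp_e", [Expr("call", name="find_token_get_val"), Expr("const")])
stmt = Expr("if", [cond])
print([c.name for c in find_calls(stmt)])
```

The same walker would also find a call buried in an assignment or in another call's arguments, without a dedicated branch for each case.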

Final Result

After adding the expression, it finds the 0x40 byte buffer!

Improvements

The final result was pretty bad. It really only gave us one other instance of this pattern, even though there are about 20 calls to find_token_get_val at most. When I started writing this script, its purpose was actually more generic. Instead of matching only on find_token_get_val, I matched on every function that used a memset_var. Then I would check whether the variable was passed to memcpy, strcpy, and so on, and whether the size being written was bigger than the actual buffer size. But to do that, we would need to learn about SSA!
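
As a rough sketch of that more generic check, the toy model below compares a constant write length against the recorded buffer size. The `WRITE_FUNCS` table and the tuple-based call records are my own simplification, not the original script. Non-constant lengths are skipped, which is exactly the gap that SSA-based data flow would fill.

```python
# Hypothetical table: callee name -> (index of destination arg, index of length arg)
WRITE_FUNCS = {"memcpy": (0, 2), "strncpy": (0, 2)}


def oversized_writes(buffer_sizes, calls):
    """Return calls whose constant length exceeds the known destination size.

    buffer_sizes: dict of buffer name -> size in bytes (e.g. taken from memset)
    calls: iterable of (callee, args) tuples
    """
    findings = []
    for callee, args in calls:
        if callee not in WRITE_FUNCS:
            continue
        dst_idx, len_idx = WRITE_FUNCS[callee]
        if len(args) <= max(dst_idx, len_idx):
            continue
        dst, length = args[dst_idx], args[len_idx]
        size = buffer_sizes.get(dst)
        # Only constant lengths can be compared without data-flow analysis.
        if size is not None and isinstance(length, int) and length > size:
            findings.append((callee, dst, size, length))
    return findings


sizes = {"buf": 0x40}
calls = [
    ("memcpy", ("buf", "src", 0x30)),       # fits in the buffer
    ("strncpy", ("buf", "src", 0x400)),     # 0x400 > 0x40
    ("memcpy", ("buf", "src", "len_var")),  # unknown length, skipped
]
print(oversized_writes(sizes, calls))
```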

Conclusion

Binary Ninja is a great SRE tool when you want to do program analysis or automate things. Try modifying this script to be more generic and running it again on a real target. You will get a lot more results. Thank you all for reading my blog!