Project 3: Reverse Engineering

I’m assuming that you’ve already done lab 7 by now. If you haven’t yet, you are not prepared for this project.

In this project, you’ll be reverse engineering an executable to figure out what inputs you need to type in to prevent the BOMB from exploding! AAAAAA!

…

It’s not a real bomb. It’s just a printf. It can’t hurt you.

You won’t be writing any C! In fact you won’t really even be looking at C at all! Just x86-64 assembly! AAAAAAAAAAA!

…

Yeah that’s a lot scarier than the “bomb,” huh.

Grading Rubric

[20] for phase 1
[20] for phase 2
[25] for phase 3
[25] for phase 4
[10] for …………. 🐮

0. Starting off

(Thank you so so so so mooch to Dr. Luis Filipe Nunes Quaresma de Oliveira for providing the server, executable generation, etc. for this project.)

Get your custom-tailored project from here: Download project materials.
- On that page, fill in the form with your Pitt username and email. For example, I would enter jfb42 and jfb42@pitt.edu. You will get a file in return :)
- That file is generated uniquely for you. Don’t share it with anyone!
Upload that file to your VM with your SFTP client (in ~/private/cs0449/projects/proj3/, preferably).
Log into your VM, cd to where you put the .tar file, and run the following command to extract it:
```
tar xf <filename>.tar
```
- This will create a directory that contains some files.
cd into that directory and ls. You will see these 4 files:
- README - a plain text file that says whose bomb this is.
- ID - a plain text file with your username.
- bomb.c - the code for the main function… but nothing else!
- bomb - an executable file that YOU have to crack open!
Try running ./bomb. It threatens you and asks you to type something in. Go ahead. Type something.
- Oh no.

Your Task

Have a look at bomb.c. This is the source code for the main function of the bomb executable. main does some unimportant stuff before it prints out those introductory messages. Then you can see repeated pieces of code like this:

    input = read_line();             /* Get input                   */
    phase_1(input);                  /* Run the phase               */
    phase_defused();                 /* Drat!  They figured it out! */

phase_1, phase_2, etc. are the functions you need to reverse engineer. They all take a string as input, so you don’t have to worry about deducing argument numbers and types… at least not yet ;O

If any of the phase_ functions don’t like your input, they’ll print

BOOM!!!
The bomb has blown up.

and the program ends.

Your goal is to come up with inputs that make all 4 phase_ functions happy. Yep. Juuuust 4. Mhm. Wait, aren’t there 5 categories on the grading rubri—-

Making a `bomb.txt` (you’ll be submitting this!)

In order to avoid having to re-type the passwords for earlier phases when you get to later ones, you will make a file called bomb.txt that is just a plain text file containing your answers for each phase, one answer per line. For example:

this is my password for phase 1
blah blah blah
12345678
hahahahah

As you solve phases, put the passwords for each in this file. Then, you can run:

./bomb bomb.txt

And it will automatically type in the earlier passwords for you, letting you try the next phase. (Or it’ll finish all the phases, when you’re done.)

You will be submitting this file! I mean, finding the passwords is kinda the whole point.

Two important things:

bomb.txt must have Unix line endings. Windows users, you’re probably using VS Code, right? Click the “CRLF” in the bottom right and change it to “LF.”
bomb.txt must end in a blank line. I don’t know why, I thought the bombs accepted it without it, but apparently they don’t.
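
If you’d rather have a program check those two rules than squint at your editor, here is a minimal sketch in C. It reads “end in a blank line” as “the last answer is followed by a newline,” which is an assumption; the function name check_answers and the two flag names are made up for this example and have nothing to do with the bomb itself.

```c
#include <stdio.h>

/* Bit flags returned by check_answers (hypothetical names). */
enum { HAS_CRLF = 1, NO_FINAL_NEWLINE = 2 };

/* Scan an already-open stream from its current position to EOF.
   Returns 0 if it contains no carriage returns and its last byte is '\n'.
   An empty stream is reported as fine. */
int check_answers(FILE *f) {
    int c;
    int last = '\n';
    int problems = 0;

    while ((c = fgetc(f)) != EOF) {
        if (c == '\r') {
            problems |= HAS_CRLF;      /* Windows-style line ending found */
        }
        last = c;
    }
    if (last != '\n') {
        problems |= NO_FINAL_NEWLINE;  /* last answer is not newline-terminated */
    }
    return problems;
}
```

Open the file with fopen("bomb.txt", "rb"), pass the stream in, and a return value of 0 means both rules hold.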

Setting breakpoints in functions where you don’t have the source

It’s entirely possible to reverse engineer all these functions statically (i.e. without running the code to see how it works). However, unless you are fairly experienced and comfortable with reading assembly code, doing it entirely statically might be a bit MOOch for you.

The nice thing about dynamic analysis is that you can look at some code, have no idea what it does, stick a breakpoint after it, run it, then have a look around at local variables, register values etc. Often, that will tell you a lot about what the previous code just did. Or sometimes it’ll be super confusing and you’ll still have no idea. Lol!

To that end, you might want to set breakpoints inside of the phase_ functions so you can see what they are doing. But the bomb executable has no debugging info, so you can’t say to break on a specific line. Or can you? (Yes)

Let’s say you have this instruction that you want to break on that you see in the disassembly:

   0x00005555555554e2 <+9>:	call   0x555555555875 <read_number>

That huge hex number on the left is its memory address. In order to put a breakpoint on a memory address, you use the syntax b *0x0000.... That is, you put an * before the address. Just copy and paste it:

(gdb) b *0x00005555555554e2
Breakpoint 1 at 0x5555555554e2
(gdb)

There you go. Breakpoint set.

Uh… have at it!

What follows are some tips that will be helpful to you.

Some more x86 tips

Keep in mind: [square brackets] mean a memory access (load or store). So:

    mov rax, rbx

copies the value of rbx into rax. But

    mov rax, QWORD PTR [rbx]

performs a load from memory, using rbx as the address, and loads a QWORD (8-byte value) into rax. Big difference! This is also true of other non-mov instructions:

    cmp rax, rbx

compares the contents of two registers. But:

    cmp DWORD PTR [rax], ebx

compares the contents of the 4-byte (DWORD) variable at the address held in rax with the 4-byte value in ebx (the lower half of rbx). (This instruction has a load “built in”.)

In class I mentioned that lea is sometimes “misused” to perform a three-operand add or quick multiplication (or both). However, if it’s not being used to do that, remember its name - load effective address. That means it puts an address in a register. So:

    lea  rdi, [rsp+0x4]
    call func

puts the address rsp+0x4 into rdi. Since anything accessed relative to rsp/rbp is a local… this is like using the address-of operator & on a local variable. This code may have been generated from:

    int var;    // compiler places this at rsp+4
    func(&var); // passing the address of that local to func generates the lea
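
And for the “misused” lea case: here is a small sketch of C that optimizing compilers commonly turn into a single lea. The function names are made up, and the assembly in the comments is typical of what GCC or Clang emit with optimization on, not a guarantee; the compiler that built your bomb may have chosen differently.

```c
/* x*5 + 3 fits the base + index*scale + offset address form, so a
   compiler can compute it with one instruction, e.g.
       lea eax, [rdi+rdi*4+0x3]
   No memory is touched; lea only does the address arithmetic. */
int times5_plus3(int x) {
    return x * 5 + 3;
}

/* A "three-operand add": two registers plus a constant, e.g.
       lea rax, [rdi+rsi*1+0x7]  */
long add_plus7(long a, long b) {
    return a + b + 7;
}
```

So when you see a lea whose result gets used as a number instead of being passed along as a pointer, read it as arithmetic.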

The “pre-running” state

When you first do gdb ./bomb, gdb loads the ./bomb executable into memory, but it doesn’t actually finish the dynamic linking yet. That doesn’t happen until you run the program.

Before you use run, the only things in memory are from the ./bomb executable. None of the libc.so stuff is there, nor is the whole pile of other shit that ./bomb dynamically links to.

So at that point, these two commands can be really helpful to get your bearings:

info functions
info variables

These list the functions and global variables that are in this executable, respectively. Remember from lab 7 that the func@plt functions are ones that are part of the standard library and can be safely ignored. The rest of the functions may or MOOay not be important to you!

Breakpoint issues in the pre-running state

Before the program is running, you can e.g. disas phase_1, too. But watch out: you cannot set a breakpoint on a raw address in this mode. I mean, it’ll let you, but it’ll fail when you run the program.

(gdb) b *0x00000000000014e2
Breakpoint 1 at 0x14e2
(gdb) r
Starting program: /afs/pitt.edu/home/a/b/abc123/private/cs0449/projects/proj3/bomb5/bomb
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x14e2

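See? It whines about “Cannot access memory at address 0x14e2.” That’s because when you run the program, all the code gets relocated to another address in memory, so the address you copied before running no longer points at anything.

To solve this: put a breakpoint on main, then run. (A breakpoint set by function name gets re-resolved to the new address; one set by raw address doesn’t.)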

(gdb) b main
Breakpoint 1 at 0x12e9: file bomb.c, line 37.
(gdb) run
Starting program: /afs/pitt.edu/home/a/b/abc123/private/cs0449/projects/proj3/bomb5/bomb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main (argc=1, argv=0x7fffffffe5b8) at bomb.c:37
37	{
(gdb)

Now you can use disas phase_1 etc. and set breakpoints using those addresses.
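
The pre-run address 0x14e2 and the running address 0x00005555555554e2 used as examples on this page differ by a fixed load base. The sketch below does that arithmetic in C. The base 0x555555554000 is an assumption that happens to match those example addresses, so check your own session (info proc mappings will show where things landed) before trusting it. runtime_address is a made-up helper name.

```c
#include <stdint.h>

/* ASSUMPTION: the load base that matches the example addresses on this page.
   Your session may use a different one. */
#define ASSUMED_LOAD_BASE UINT64_C(0x555555554000)

/* Turn an address seen before `run` (really an offset into the executable)
   into the address the same instruction has once the program is loaded. */
uint64_t runtime_address(uint64_t pre_run_address) {
    return ASSUMED_LOAD_BASE + pre_run_address;
}
```

In practice you never need to compute this by hand: once the program is stopped at main, disas prints the relocated addresses for you.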

`p`rinting registers

If you’re stopped at a breakpoint, you probably want to have a look at what’s in the register(s). If you want a dump of all the registers, info registers shows their contents in both hex and decimal. But uh… there are a lot. And most of the time you only really care about one or two.

You can use the print command to print out the contents of registers. You just have to put a $ in front of the register’s name:

(gdb) p $rax
$4 = 1
(gdb)

So that means rax contains the value 1.

If you think a register contains a pointer, you can print it in hex with p/x:

(gdb) p $rdi
$10 = 140737488346656
(gdb) p/x $rdi
$11 = 0x7fffffffde20
(gdb)

If you want to see what’s at that pointer, that’s what the x command is for (next section).

Sometimes looking at a number in a different base can help a lot! Remember that p is a pretty general command and you can use it as a calculator or whatever. If you have a weird-looking hex number, try printing it in decimal.

E`x`amining memory

These are all functions written in C, and as such they all use the stack to hold their local variables. (…most of the time.)

If you want to look at the items on the stack, you can use the x command to examine memory. You give it a count, a type, and optionally a base, and it prints it out. For example:

x/4wx $rsp interprets the 4 words (32-bit values) pointed to by rsp as hex.
x/10g $rsp will show you the 10 giant values (64-bit values…) pointed to by rsp.
- by default, x “reuses” the same base as the last time you used it, so you may have to do x/10gd to switch back to decimal.
x/s should be familiar to you from lab 7 - it’s how you examine strings! And this project is all about strings, right?
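
The size letter changes how x groups the same bytes, not what is in memory. As a rough illustration in C (the function name is made up), here are two adjacent 32-bit words reinterpreted as one 64-bit giant. x86-64 is little-endian, so the word at the lower address ends up as the low half of the giant.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret two adjacent 32-bit words as one 64-bit value, the way
   x/2wx and x/1gx show the same 8 bytes at two different widths.
   The result described here assumes a little-endian machine (x86-64 is). */
uint64_t words_as_giant(uint32_t first, uint32_t second) {
    uint32_t words[2] = { first, second };  /* first sits at the lower address */
    uint64_t giant;

    memcpy(&giant, words, sizeof giant);    /* copy the raw bytes, no conversion */
    return giant;
}
```

So if x/2wx $rsp shows 0x11223344 0xaabbccdd, then x/1gx $rsp shows 0xaabbccdd11223344 for those same 8 bytes.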

Stepping through assembly

If things get really rough, you may need to step through assembly one instruction at a time. This is often not really worth the effort, but sometimes it can be useful.

First, do set disassemble-next-line on. This will make it so whenever you run one instruction, you will see the next instruction that is about to be executed.

Then:

ni executes the current instruction and moves to the next.
si does the same, but it will follow call instructions.
- often you don’t want to go into the function that is called (like string_length, gee, what do you think that does? boring)…
- …but sometimes you do!

But honestly? Stepping through assembly one instruction at a time rarely gives you 🐮MoOch insight into what it’s doing. If you find yourself doing this and being super confused, take a break, take a step back, use another approach. Go do some static analysis on the function. Find the control flow first. Use what you know from 447 to guide you. Remember: every function was originally written in C, so it is going to “follow the rules” of control flow structures. Mostly. Sometimes the compiler likes to stuff two instructions after the ret and I don’t really know why??? Whatever

Phase 5

At this point you may have realized that there is a secret_cow_function. There are really two parts to this: figuring out how to make that function run, and then figuring out how to defuse it.

I can’t give you TOO much information but here are some hints:

It is called. It’s called by one of the functions that is called from main, so start investigating the functions that you see being called in bomb.c.
It’s called under a specific circumstance. You don’t have to go too deep reverse engineering what that circumstance is; you might be able to figure it out just based on context.
Once you’ve found it, reversing it isn’t too bad, but some notes:
- It’s not doing anything very tricky or weird.
- Trust the names of functions.
- It is doing something a bit weird with one of the registers, treating it as a normal local variable instead of its usual responsibility…

Submission

When the autograder is open, you will be required to submit two files:

your bomb executable
your bomb.txt file with the passwords

Additionally, you can submit a third file, notes.txt, that contains any notes that you have made about phases that you have not figured out. This is how the grader will give you partial credit for phases that you don’t have a password for.

Please make notes.txt a plain text file. Just create it in your code editor of choice. Do not take a .docx and rename it to .txt. That doesn’t work.

Due by 9:00 PM, Friday 4/4 (or late Saturday 4/5)
