Work on this lab with a friend! For this lab only, it’s not cheating. Each of you should do one part, then talk to each other about it and see if you got the same answers.

Debugging

Oh, that’s what happened.

Debugging is the process of figuring out why your program is broken. Not only do you fix your bugs, but you also get a deeper understanding of why your program is incorrect. Then you can avoid those mistakes in the future.

A debugger is a tool to help with this process. If your program fails, a debugger lets you watch it fail step-by-step, so that you can figure out what went wrong, and when. I love this quote: “a debugger lets you watch your program crash in slow motion.”

printf debugging is probably all you’ve used so far: you stick a bunch of prints into your code to print out the values of variables, or to say "got here". You can get pretty far with this, but it’s tedious.

How does a debugger like gdb work?

A debugger is a sort of “supervisor.” It has full control of your program: it can pause, resume, run it step-by-step, look at all the variables, change all the variables, etc.

Probably the most important thing a debugger can do is pause with something called a breakpoint. A breakpoint is a way of telling the debugger, “when my program gets to this line, pause it!”

Once the program is paused, you can look at everything, see where you are, see what went wrong, and so on. It’s like stopping time. This is an incredibly powerful ability that lets you narrow in on bugs extremely quickly.


1. Things are bad!

  1. Log in to thoth and cd into your private directory. Make a directory for this lab, and in there, wget this file.
  2. Take a look at its contents. There are some mistakes in there, but don’t fix them yet!
  3. Compile it with gcc -o lab5 lab5.c. It whines but it does produce an executable.
  4. Run it with ./lab5. It gives you a new, more exciting kind of error: a floating point exception!

Well, by now your first instinct should be to run the program in gdb to see what’s wrong.

  1. Recompile it with debug info by adding the -g flag: gcc -g -o lab5 lab5.c
  2. Run it in gdb with gdb ./lab5, then type run at the (gdb) prompt:
Program received signal SIGFPE, Arithmetic exception.
0x00005555555551c7 in fun () at lab5.c:10
10		c = a / b;

That’s weird, didn’t it say “floating point exception” before? I don’t see any floats here. Well, despite the name, SIGFPE is sent for arithmetic errors in general, not just floating-point ones; integer division by zero triggers it too. (We’ll talk about signals like SIGFPE soon.)

I think it’s pretty obvious why this is crashing, but let’s learn a new command that can show you the contents of the local variables: info locals:

(gdb) info locals
a = 5
b = 0
c = 52

Well well well. b is 0. We can’t divide by 0 now can we?

Exit gdb, fix the code (change b to something else), recompile, and rerun.


2. Things are still bad!

When we run the program now, it says Result is <something> without crashing. But then… we crash again. Good old segfault. If we run it in gdb, this is where it’s crashing:

Program received signal SIGSEGV, Segmentation fault.
0x0000555555555256 in less_fun () at lab5.c:22
22			printf("*q = %d\n", *q);

Let’s print out the locals again.

(gdb) info locals
p = 0x5555555596b0
q = 0x2d

Uh. Okay. That’s weird. q doesn’t look like a proper pointer value. Notice that it’s definitely not NULL, but the program crashes anyway, since 0x2d is definitely not a valid address.

Let’s watch less_fun crash in slow motion. We can do this by setting a breakpoint on it. This tells gdb “pause right before this thing gets executed.”

(gdb) break less_fun
Breakpoint 1 at 0x5555555551f5: file lab5.c, line 15.

Now we can restart the program with the r (run) command again.

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /afs/pitt.edu/home/j/f/jfb42/private/cs0449/examples/lab5
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Result is 2

Breakpoint 1, less_fun () at lab5.c:15
15		int* p = malloc(sizeof(int) * 4);

Okay. It says we’re on the malloc line, which means that is the line about to be run. It hasn’t run yet.

If we use the n (next) command, it will run that line of code and go to the next line. There’s a loop there so it’s a little bit annoying, since it has to go through the loop 4 times:

(gdb) n
16		for(int i = 0; i < 4; i++) {
(gdb) n
17			p[i] = 50 * (i + 1);
(gdb) n
16		for(int i = 0; i < 4; i++) {
(gdb) n
17			p[i] = 50 * (i + 1);
etc.............

But eventually we get to:

(gdb)
19		*p = 45;
(gdb) n
20		int* q = *p;

Okay. Now p has been assigned, the array has been filled in, and then *p has been assigned. Let’s print those out with… print, or p for short!

(gdb) print p
$1 = (int *) 0x5555555596b0
(gdb) print *p
$2 = 45

You can use any C expression with print. For example I could even write this:

(gdb) print p[2]
$3 = 150

Amazingly, you can even use print/p to call functions, like sbrk. Boy, that might be useful for the project, huh?

(gdb) p sbrk(0)
$5 = (void *) 0x55555557a000

Back on track. Let’s use n to run the int* q = *p; line. Then, let’s print q.

(gdb) n
21		if(q != NULL) {
(gdb) p q
$6 = (int *) 0x2d

Uh… okay. Wait. 0x2d. What is that in decimal? We can use p/d to print it in decimal.

(gdb) p/d q
$7 = 45

Oh!!!!!!!! 45. 45 is the value we put in *p. Ohhhh that’s what that compiler warning was about “makes pointer from integer” blah blah ohhhh okay. Okay. Okay.

Now you know what’s wrong, and you can fix that bug.

Uh oh. There’s another bug now.


3. The last bug

The last bug is happening here:

Program received signal SIGSEGV, Segmentation fault.
0x00005555555552f4 in list_fun () at lab5.c:43
43			printf("%d -> ", t->value);

Ooh, list_fun. Sounds like some list fun.

Let’s look at the locals with info locals (or i lo for short):

(gdb) i lo
t = 0x555555559
c = 0x5555555596b0
b = 0x5555555596d0
a = 0x5555555596f0

Huh… that’s weird. a, b, c look relatively normal, but t looks a little… off.

If you don’t wanna switch over to your editor, you can actually have gdb list your code for you. Try list list_fun, then just hit Enter a second time (in gdb, an empty line usually repeats the last command):

(gdb) list list_fun
27	typedef struct Thing {
28		struct Thing* next;
29		int value;
30	} Thing;
31
32	void list_fun() {
33		Thing* c = malloc(sizeof(Thing));
34		Thing* b = malloc(sizeof(Thing));
35		Thing* a = malloc(sizeof(Thing));
36		a->value = 10;
(gdb)
37		a->next = b;
38		a->value = 20;
39		b->next = c;
40		a->value = 30;
41
42		for(Thing* t = a; t != NULL; t = t->next) {
43			printf("%d -> ", t->value);
44		}

Line 43 is the one that’s crashing. Now we have a better idea of what’s going on. We can see the struct, and then this function that creates a singly-linked list of that struct on the heap. Looks like a->next is b, and b->next is c.

If we want to look at the list in gdb, we kinda can! You can dereference a pointer in a print command, and if it’s a struct, gdb prints out all the fields of the struct for you:

(gdb) p *a
$2 = {next = 0x5555555596d0, value = 30}

Hey, that’s pretty cool. So what’s *a->next?

(gdb) p *a->next
$3 = {next = 0x5555555596b0, value = 0}
(gdb) p *b
$4 = {next = 0x5555555596b0, value = 0}

It’s the same thing as *b.

What about *c?

(gdb) p *c
$5 = {next = 0x555555559, value = 0}

Hey, there’s that weird address from the segfault. That’s what t is.

(gdb) p t
$6 = (Thing *) 0x555555559

Wait a second. c is the tail of the list, right? Shouldn’t its next be NULL?

Let’s do something a little fancy. Instead of editing the program right away, here’s what we’ll do:

  1. Set a breakpoint on line 42 by doing b 42.
    • Hey, a breakpoint right after the loop would have been handy in the previous function, huh?
  2. Restart the program with r.
  3. The loop has not run yet. If you p c->next right now, you get that weird address.
  4. Do p c->next = 0
    • Yes: you can even do assignments to change variables in gdb.
  5. Finally, use the continue (c) command to resume execution of the program.
30 -> 0 -> 0 ->
[Inferior 1 (process 2803610) exited normally]

Look at that. It printed out the list and exited normally! …sorta! Didn’t the code try to make the list contain 10, 20, and 30? Looks like there’s the worst kind of bug: a copy-and-paste bug 😱

Now of course my modification to c->next didn’t change the source code. I’d have to insert c->next = NULL in my code to actually fix that bug. But it’s neat that you can try out a bugfix while the program is running, huh?


Common commands

Now you can play with gdb some more. Try it on your previous labs, or use it while you work on project 2!

You can learn more about all of the commands by typing help, or about a specific command by typing help command_name (e.g. help bt).

Command      Shortcut  Description
help                   Get help on a command or topic
apropos                Search the help for a term
set args               Set command-line arguments (alternative to gdb --args)
run          r         Run (or restart) a program
quit         q         Exit gdb
break        b         Place a breakpoint at a given location
continue     c         Continue running the program after pausing
backtrace    bt, back  Show the function call stack
where                  Same as backtrace
next         n         Go to the next line of source code (doesn’t follow calls)
step         s         Go to the next line of source code (follows calls)
nexti        ni        Go to the next instruction (doesn’t follow calls)
stepi        si        Go to the next instruction (follows calls)
print        p         Print the value of an expression written in C notation
x                      Examine the contents of a memory location (pointer)
list         l         List the source code of the program
disassemble  disas     List the assembly code of the program

Important takeaways


Submission

You don’t have to submit any code. Just go on Gradescope and “do” lab 5. :)