Work on this lab with a friend! For this lab only, it’s not cheating. Each of you should do the same part at the same time, then talk to each other about it and see if you got the same answers.
Debugging
Oh, that’s what happened.
Debugging is the process of figuring out why your program is broken. Not only do you fix your bugs, but you also get a deeper understanding of why your program is incorrect. Then you can avoid those mistakes in the future.
A debugger is a tool to help with this process. If your program fails, a debugger lets you watch it fail step-by-step, so that you can figure out what went wrong, and when. I love this quote: “a debugger lets you watch your program crash in slow motion.”
`printf` debugging is probably all you’ve used so far: you stick a bunch of prints into your code to print out the values of variables, or to say "got here". You can get pretty far with this, but it’s tedious.
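For comparison, here is a minimal sketch of the `printf` style of debugging described above. `DEBUG_PRINT` and `sum_array` are hypothetical names made up for this example, not part of the lab files; the macro tags each message with the file and line and writes to `stderr` so the debug output stays separate from the program’s normal output.

```c
#include <stdio.h>

/* Hypothetical helper macro: prints "file:line: message" to stderr.
   It needs at least one argument after the format string. */
#define DEBUG_PRINT(fmt, ...) \
    fprintf(stderr, "%s:%d: " fmt "\n", __FILE__, __LINE__, __VA_ARGS__)

/* Hypothetical example function, instrumented with "got here" style prints. */
int sum_array(const int* arr, int n) {
    int total = 0;
    DEBUG_PRINT("got here, n = %d", n);
    for (int i = 0; i < n; i++) {
        total += arr[i];
        DEBUG_PRINT("i = %d, total = %d", i, total);
    }
    return total;
}
```

Every one of these prints has to be added, recompiled, and later removed by hand, which is exactly the tedium a debugger saves you from.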
How does a debugger like `gdb` work?
A debugger is a sort of “supervisor.” It has full control of your program: it can pause, resume, run it step-by-step, look at all the variables, change all the variables, etc.
Probably the most important thing a debugger can do is pause with something called a breakpoint. A breakpoint is a way of telling the debugger, “when my program gets to this line, pause it!”
Once the program is paused, you can look at everything, see where you are, see what went wrong, and so on. It’s like stopping time. This is an incredibly powerful ability that lets you narrow in on bugs extremely quickly.
1. Things are bad!
- Log in to thoth and `cd` into your private directory. Make a directory for this lab, and in there, `wget` this file.
- Take a look at its contents. There are some mistakes in there, but don’t fix them yet!
- Compile it with `gcc -o lab5 lab5.c`. It whines, but it does produce an executable.
- Run it with `./lab5`. It gives you a new, more exciting kind of error: a floating point exception!
Well, by now your first instinct should be to **run the program in `gdb`** to see what’s wrong.
- Recompile it with debug info by adding the `-g` flag: `gcc -g -o lab5 lab5.c`
- Run it in `gdb`: start it with `gdb ./lab5`, then type `run` at the `(gdb)` prompt.
Program received signal SIGFPE, Arithmetic exception.
0x00005555555551c7 in fun () at lab5.c:10
10 c = a / b;
That’s weird, didn’t it say “floating point exception” before? I don’t see any floats here. Well, despite its name, SIGFPE is sent for any erroneous arithmetic operation, integer or floating point, and integer division by zero is one of them. (We’ll talk about signals like SIGFPE soon.)
I think it’s pretty obvious why this is crashing, but let’s learn a new command that can show you the contents of the local variables, `info locals`:
(gdb) info locals
a = 5
b = 0
c = 52
Well, well, well. `b` is 0. We can’t divide by 0, now can we?
Exit `gdb`, fix the code (change `b` to something else), recompile, and rerun.
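Changing `b` is all this lab needs. When a divisor can legitimately be 0 at runtime, though (say, it comes from user input), the usual fix is to check it before dividing. Here is a minimal sketch, using a hypothetical `checked_divide` helper that is not part of `lab5.c`. It also rejects `INT_MIN / -1`, because that quotient doesn’t fit in an `int`: it is undefined behavior in C, and on x86 it usually raises SIGFPE too.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a / b in *out and returns true, or returns
   false (leaving *out untouched) when the division would be undefined:
   division by zero, or INT_MIN / -1 (the result overflows an int). */
bool checked_divide(int a, int b, int* out) {
    if (b == 0) {
        return false;
    }
    if (a == INT_MIN && b == -1) {
        return false;
    }
    *out = a / b;   /* integer division truncates toward zero */
    return true;
}
```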
2. Things are still bad!
When we run the program now, it says `Result is <something>` without crashing. But then… we crash again. Good old segfault. If we run it in `gdb`, here’s where it’s crashing:
Program received signal SIGSEGV, Segmentation fault.
0x0000555555555256 in less_fun () at lab5.c:22
22 printf("*q = %d\n", *q);
Let’s print out the locals again:
(gdb) info locals
p = 0x5555555596b0
q = 0x2d
Uh. Ok. That’s weird. `q` doesn’t look like a proper pointer value. Notice that it’s definitely not NULL, so the `if(q != NULL)` check passes, but the program crashes anyway, since 0x2d is definitely not a valid address.
Let’s watch `less_fun` crash in slow motion. We can do this by setting a breakpoint on it. This tells `gdb` “pause right before this thing gets executed.”
(gdb) break less_fun
Breakpoint 1 at 0x5555555551f5: file lab5.c, line 15.
Now we can restart the program with the `r` (`run`) command again.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /afs/pitt.edu/home/j/f/jfb42/private/cs0449/examples/lab5
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Result is 2
Breakpoint 1, less_fun () at lab5.c:15
15 int* p = malloc(sizeof(int) * 4);
Okay. It says we’re on the `malloc` line, which means that is the line about to be run. It hasn’t run yet.
If we use the `n` (`next`) command, it will run that line of code and go to the next line. There’s a loop there, so it’s a little annoying, since it has to go through the loop 4 times:
(gdb) n
16 for(int i = 0; i < 4; i++) {
(gdb) n
17 p[i] = 50 * (i + 1);
(gdb) n
16 for(int i = 0; i < 4; i++) {
(gdb) n
17 p[i] = 50 * (i + 1);
etc.............
But eventually we get to:
(gdb)
19 *p = 45;
(gdb) n
20 int* q = *p;
Okay. Now `p` has been assigned, the array has been filled in, and then `*p` has been assigned. Let’s print those out with… `print`, or `p` for short!
(gdb) print p
$1 = (int *) 0x5555555596b0
(gdb) print *p
$2 = 45
You can use any C expression with `print`. For example, I could even write this:
(gdb) print p[2]
$3 = 150
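To see where 150 comes from, here is a sketch that repeats the `malloc` line, the loop, and `*p = 45;` from `less_fun` as they appear in the gdb session above. The rest of `lab5.c` isn’t reproduced, and the function name `make_array` is hypothetical. The loop stores `50 * (i + 1)`, so the array holds 50, 100, 150, 200 until `*p = 45` overwrites element 0:

```c
#include <stdlib.h>

/* Hypothetical function mirroring the malloc, loop, and *p = 45 lines
   of less_fun. Returns a heap array the caller must free(), or NULL. */
int* make_array(void) {
    int* p = malloc(sizeof(int) * 4);
    if (p == NULL) {
        return NULL;
    }
    for (int i = 0; i < 4; i++) {
        p[i] = 50 * (i + 1);   /* 50, 100, 150, 200 */
    }
    *p = 45;                   /* same as p[0] = 45 */
    return p;                  /* now {45, 100, 150, 200} */
}
```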
Amazingly, you can even use `print`/`p` to call functions, like `sbrk`. Boy, that might be useful for the project, huh?
(gdb) p sbrk(0)
$5 = (void *) 0x55555557a000
Back on track. Let’s use `n` to run the `int* q = *p;` line. Then, let’s print `q`.
(gdb) n
21 if(q != NULL) {
(gdb) p q
$6 = (int *) 0x2d
Uh… okay. Wait. 0x2d. What is that in decimal? We can use `p/d` to print it in decimal.
(gdb) p/d q
$7 = 45
Oh!!!!!!!! 45. 45 is the value we put in `*p`. `*p` is an `int`, and `int* q = *p;` turned that `int` into an address. Ohhhh, that’s what that compiler warning was about, “makes pointer from integer” blah blah, ohhhh okay. Okay. Okay.
Now you know what’s wrong, and you can fix that bug.
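Here is one plausible fix, assuming the intent was for `q` to point at the first element of the array. The problem is that `*p` is an `int` (45, which is 0x2d in hex), while `p` itself is the `int*`. This sketch uses the hypothetical name `less_fun_fixed`, returns the value so it is easy to check, and frees the array when it’s done:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fixed version of less_fun.
   Returns the value q points to, or -1 if malloc fails. */
int less_fun_fixed(void) {
    int* p = malloc(sizeof(int) * 4);
    if (p == NULL) {
        return -1;
    }
    for (int i = 0; i < 4; i++) {
        p[i] = 50 * (i + 1);
    }
    *p = 45;
    int* q = p;          /* was: int* q = *p; which made the address 0x2d */
    int result = -1;
    if (q != NULL) {     /* non-NULL AND pointing at real memory now */
        printf("*q = %d\n", *q);
        result = *q;
    }
    free(p);
    return result;
}
```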
Uh oh. There’s another bug now.
3. The last bug
The last bug is happening here:
Program received signal SIGSEGV, Segmentation fault.
0x00005555555552f4 in list_fun () at lab5.c:43
43 printf("%d -> ", t->value);
Ooh, `list_fun`. Sounds like some list fun.
Let’s look at the locals with `info locals` (or `i lo` for short):
(gdb) i lo
t = 0x555555559
c = 0x5555555596b0
b = 0x5555555596d0
a = 0x5555555596f0
Huh… that’s weird. `a`, `b`, and `c` look relatively normal, but `t` looks a little… off.
If you don’t wanna switch over to your editor, you can actually have `gdb` list your code for you. Try `list list_fun`, then just hit Enter a second time (that usually repeats the last command in `gdb`):
(gdb) list list_fun
27 typedef struct Thing {
28 struct Thing* next;
29 int value;
30 } Thing;
31
32 void list_fun() {
33 Thing* c = malloc(sizeof(Thing));
34 Thing* b = malloc(sizeof(Thing));
35 Thing* a = malloc(sizeof(Thing));
36 a->value = 10;
(gdb)
37 a->next = b;
38 a->value = 20;
39 b->next = c;
40 a->value = 30;
41
42 for(Thing* t = a; t != NULL; t = t->next) {
43 printf("%d -> ", t->value);
44 }
Line 43 is the one that’s crashing. Now we have a better idea of what’s going on. We can see the struct, and then this function that creates a singly-linked list of that struct on the heap. Looks like `a->next` is `b`, and `b->next` is `c`.
If we want to look at the list in `gdb`, we kinda can! You can dereference a pointer in a `print` command, and if it’s a struct, `gdb` prints out all the fields of the struct for you:
(gdb) p *a
$2 = {next = 0x5555555596d0, value = 30}
Hey, that’s pretty cool. So what’s `*a->next`?
(gdb) p *a->next
$3 = {next = 0x5555555596b0, value = 0}
(gdb) p *b
$4 = {next = 0x5555555596b0, value = 0}
It’s the same thing as `*b`.
What about `*c`?
(gdb) p *c
$5 = {next = 0x555555559, value = 0}
Hey, there’s that weird address from the segfault. That’s what `t` is.
(gdb) p t
$6 = (Thing *) 0x555555559
Wait a second. `c` is the tail of the list, right? Shouldn’t its `next` be `NULL`?
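It should be, but nothing ever makes it so. `malloc` does not initialize the memory it returns, so `c->next` holds whatever bytes happened to be there, and none of the lines in `list_fun` assign it. One defensive habit is to set every field right after allocating. Here is a minimal sketch, using the lab’s `Thing` struct and a hypothetical `new_thing` helper:

```c
#include <stdlib.h>

typedef struct Thing {
    struct Thing* next;
    int value;
} Thing;

/* Hypothetical helper: allocates a Thing and initializes BOTH fields,
   so a new node never starts out with a garbage next pointer.
   Returns NULL if malloc fails. */
Thing* new_thing(int value) {
    Thing* t = malloc(sizeof(Thing));
    if (t != NULL) {
        t->value = value;
        t->next = NULL;
    }
    return t;
}
```

`calloc(1, sizeof(Thing))` is another option: it zero-fills the allocation, which on common platforms leaves `next` as NULL and `value` as 0.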
Let’s do something a little fancy. Instead of editing the program right away, here’s what we’ll do:
- Set a breakpoint on line 42 by doing `b 42`.
  - Hey, that would have been nice in the previous function to break after the loop, huh?
- Restart the program with `r`.
- The loop has not run yet. If you `p c->next` right now, you get that weird address.
- Do `p c->next = 0`.
  - Yes: you can even do assignments to change variables in `gdb`.
- Finally, use the `continue` (`c`) command to resume execution of the program.
30 -> 0 -> 0 ->
[Inferior 1 (process 2803610) exited normally]
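Look at that. It printed out the list and exited normally! …sorta! Didn’t the code try to make the list contain 10, 20, and 30? Lines 36, 38, and 40 all assign to `a->value`, so `a` ends up holding 30 and `b->value` and `c->value` are never set at all. Looks like there’s the worst kind of bug: a copy-and-paste bug 😱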
Now of course my modification to `c->next` didn’t change the source code. I’d have to insert `c->next = NULL;` in my code to actually fix that bug. But it’s neat that you can try out bugfixes while the program is running, huh?
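Putting both fixes together, here is a sketch of what a corrected `list_fun` might look like, assuming the intended list is 10 -> 20 -> 30. The value assignments now go to `a`, `b`, and `c` instead of `a` three times, `c->next` is set to NULL, and the nodes are freed afterward. The name `list_fun_fixed` and its return value (the sum of the printed values) are hypothetical additions so the result is easy to check:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Thing {
    struct Thing* next;
    int value;
} Thing;

/* Hypothetical fixed version of list_fun. Prints the list and returns
   the sum of the printed values, or -1 if any malloc fails. */
int list_fun_fixed(void) {
    Thing* c = malloc(sizeof(Thing));
    Thing* b = malloc(sizeof(Thing));
    Thing* a = malloc(sizeof(Thing));
    if (a == NULL || b == NULL || c == NULL) {
        free(a);   /* free(NULL) is a no-op, so this is safe */
        free(b);
        free(c);
        return -1;
    }
    a->value = 10;
    a->next = b;
    b->value = 20;   /* was a->value = 20 (copy-and-paste bug) */
    b->next = c;
    c->value = 30;   /* was a->value = 30 */
    c->next = NULL;  /* the missing line: terminate the list */

    int sum = 0;
    for (Thing* t = a; t != NULL; t = t->next) {
        printf("%d -> ", t->value);
        sum += t->value;
    }
    printf("\n");

    free(a);
    free(b);
    free(c);
    return sum;
}
```

Calling it should print `10 -> 20 -> 30 -> ` and return 60.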
Common commands
Now you can play with `gdb` some more. Try it on your previous labs, or use it while you work on project 2!
You can learn more about all of the commands by typing `help`, or about a specific command by typing `help command_name` (e.g. `help bt`).
Command | Shortcut | Description |
---|---|---|
`help` | | Get help on a command or topic |
`apropos` | | Search the help for a term |
`set args` | | Set command-line arguments (alternative to `gdb --args`) |
`run` | `r` | Run (or restart) a program |
`quit` | `q` | Exit gdb |
`break` | `b` | Place a breakpoint at a given location |
`continue` | `c` | Continue running the program after pausing |
`backtrace` | `bt`, `back` | Show the function call stack |
`where` | | Same as `backtrace` |
`next` | `n` | Go to the next line of source code (doesn’t follow calls) |
`step` | `s` | Go to the next line of source code (follows calls) |
`nexti` | `ni` | Go to the next instruction (doesn’t follow calls) |
`stepi` | `si` | Go to the next instruction (follows calls) |
`print` | `p` | Print the value of an expression written in C notation |
`x` | | Examine the contents of a memory location (pointer) |
`list` | `l` | List the source code of the program |
`disassemble` | `disas` | List the assembly code of the program |
Important takeaways
- If your program crashes, your first instinct should be to run it in `gdb` to find out where it’s crashing.
  - DON’T GUESS WHERE YOUR PROGRAM IS CRASHING.
  - Often it’s not where you think.
- If you’ve checked a pointer for NULL, that’s not necessarily a guarantee that it’s valid.
- The `print`/`p` command is super useful and powerful.
Submission
You don’t have to submit any code. Just go on Gradescope and “do” lab 5. :)