Exam 1 Review Day Notes

Please read this page about taking my exams!

Exam format

When/where
- During class, here, like normal
- 75 minutes
- it is not going to be “too long to finish”
Length
- 3 sheets of paper, double-sided
- there are A Number of Questions and I cannot tell you how many because it is not a useful thing to tell you because they are all different kinds and sizes.
  - But I will say that I tend to give many, smaller questions instead of a few huge ones.
Topic point distribution
- More credit for earlier topics
- Less credit for more recent ones
- More credit for things I expect you to know because of your experience (labs, projects)
- Only on lectures 1 through 11 inclusive
Kinds of questions
- A few multiple choice and/or “pick n” (but not many)
- Some fill in the blanks
  - mostly for vocabulary
  - or things that I want you to be able to recognize, even if you don’t know the details
- Several short answer questions
  - again, read that page above about answering short answer questions!!
- No writing code from scratch, but:
  - tracing (reading code and saying what it does)
  - debugging (spot the mistake)

Things people asked about in the reviews

This is a list of what people asked about. The exam may have other topics not listed, and some of these topics may not appear on the exam.

Pointers

pointer types look like T* and are read right-to-left: “a pointer to a T”
pointers are created, confusingly, with the address-of (ampersand) operator: &var
- for any T, if var is of type T, then &var is of type T*
- so using & on an int gives you an int*; using it on an int* gives you an int** etc.
when you assign a pointer variable, you are changing where it points
if you want to change the value that it points to, you use the value-at, or dereference operator (asterisk)
- for any pointer p of type T*, *p is of type T - the dereference operator “strips a star off”
- so if you have an int* p, then *p is of type int. you can get an int out of *p or put an int into *p
there are two other “hidden” dereference operators in C:
- p[n] == *(p + n) (array indexing is just pointer arithmetic plus a dereference)
- p->f == (*p).f (pointer-to-struct field access with -> is just a dereference plus .)
- all three of these - *p, p[n], p->f - can cause UB, crashes, etc. if p is pointing Somewhere Bad
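to make those two equivalences concrete, here is a small sketch. the Thing struct and the function name are made up for illustration; nothing here comes from the labs or projects.

```c
#include <assert.h>

// hypothetical struct, just for illustration
typedef struct {
    int f;
} Thing;

// returns 1 if each "hidden" dereference agrees with its spelled-out form
int deref_forms_agree(void) {
    int arr[3] = {10, 20, 30};
    int* p = arr;       // p points to arr[0]
    Thing t = {42};
    Thing* tp = &t;     // tp points to t

    int indexing_ok = (p[2] == *(p + 2));   // both read arr[2]
    int arrow_ok = (tp->f == (*tp).f);      // both read t.f

    // writing through one form changes what the other form reads
    *(p + 1) = 99;
    assert(arr[1] == 99 && p[1] == 99);

    return indexing_ok && arrow_ok;
}
```

both comparisons hold because the compiler treats each pair as the same operation, which is why all three forms carry the same risks when p is bad.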

example:

int x = 10;
int y = 20;
int* p = &x; // p points to x
//   ________
// x |  10  |<--+
//   |------|   |
// y |  20  |   |
//   |------|   |
// p |    --|---+
//   |------|
printf("%d\n", *p); // "print the value-at p" - prints 10, cause that's what p points to
p = &y; // changes where p points; now it points to y
//   ________
// x |  10  |
//   |------|
// y |  20  |<--+
//   |------|   |
// p |    --|---+
//   |------|
(*p)++; // increments the value that p points to; increments y to 21
//   ________
// x |  10  |
//   |------|
// y |  21  |<--+
//   |------|   |
// p |    --|---+
//   |------|
printf("%d\n", y); // prints 21

Undefined Behavior (UB)

C is specified in kind of a weird way. there is a small set of things that are guaranteed to work correctly every time, and anything outside that set is considered undefined behavior (UB)
an operation that causes UB can do different things depending on…
- which OS you’re using
- which version of that OS you’re using
- which compiler you’re using, and which version of it
- which flags you pass to the compiler
- which CPU architecture you use, which version, which brand, etc. etc. etc.
- randomly, based on how the OS lays out your program’s memory space when running your program
many (but by no means all) instances of UB occur when dereferencing a pointer (*p, p[n], p->x)
- remember that there are valid pointers (which point to valid areas of memory); NULL pointers (which point to memory address 0); and invalid pointers (which aren’t NULL, but don’t point to valid memory areas either)
if you do *p to get the value at an invalid pointer, it could…
- crash (segfault, alignment error, bus error, etc)
- appear to work properly
- give you some arbitrary value
- give you some secret value that you shouldn’t have access to
if you do *p = x to set the value at an invalid pointer, it could…
- crash (segfault, alignment error, bus error, etc)
- appear to work properly
- change some variable that it shouldn’t be possible to change
- mess up the activation records for one or more functions, causing erratic behavior or a crash later on
- mess up the data structures of the heap allocator, causing erratic behavior or a crash on the next malloc/free
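one of the three pointer categories above can be checked at runtime: NULL. the sketch below uses a hypothetical helper (read_or_default is not a standard function) to show that check. it assumes the caller only ever passes a valid pointer or NULL, because there is no portable way to detect an invalid non-NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

// hypothetical helper: reads *p only if p is not NULL.
// this does NOT protect against invalid non-NULL pointers;
// dereferencing one of those is still UB.
int read_or_default(const int* p, int fallback) {
    if (p == NULL) {
        return fallback;
    }
    return *p;
}

void read_demo(void) {
    int v = 7;
    assert(read_or_default(&v, -1) == 7);    // valid pointer: we get the value
    assert(read_or_default(NULL, -1) == -1); // NULL: we get the fallback
}
```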

Pointer arithmetic (you should def know this)

again, p[n] is shorthand for *(p + n)
but p + n is weird. p is a pointer (any pointer), and n is an integer
it calculates an address by:
- starting at p
- implicitly multiplying n by sizeof(*p) (the size of one “thing” that p points to)
- adding that to p
for example if you have double* p pointing at some array of doubles…
- p + 0 is the address of item 0 of the array.
  - p + 0 == p, cause duh
  - p + 0 is also the exact same thing as &p[0] - “the address of item 0 of the array p”
- p + 1 is an address 8 bytes after p, which is item 1 of the array
  - because sizeof(*p) == sizeof(double) == 8, and 8 x 1 = 8
- p + 2 is an address 16 bytes after p, because 8 x 2 = 16
that implicit multiplying step is called “scaling” and can trip you up on project 2!
- if you have a Header* h and you add e.g. sizeof(Header) + size to it…
- well, sizeof(*h) == sizeof(Header), and n == sizeof(Header) + size here, so…
- you will actually be adding sizeof(Header) * (sizeof(Header) + size) bytes to the address!
- this is why I gave you PTR_ADD_BYTES(p, offset)! it adds a number of bytes to p without the scaling.
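here is a sketch of the scaling rule that measures how far p + n really moves. add_bytes and byte_distance are hypothetical helpers written for this example; add_bytes shows one common way to add unscaled bytes (going through char*), and the real PTR_ADD_BYTES macro may be defined differently.

```c
#include <assert.h>
#include <stddef.h>

// hypothetical stand-in for an unscaled byte add.
// sizeof(char) == 1, so adding to a char* moves exactly `offset` bytes.
void* add_bytes(void* p, size_t offset) {
    return (char*)p + offset;
}

// how many bytes apart two pointers into the same array are
ptrdiff_t byte_distance(void* from, void* to) {
    return (char*)to - (char*)from;
}

void scaling_demo(void) {
    double arr[4] = {0.0, 1.0, 2.0, 3.0};
    double* p = arr;

    // p + 1 lands on item 1, which is sizeof(double) bytes away
    assert(p + 1 == &p[1]);
    assert(byte_distance(p, p + 1) == (ptrdiff_t)sizeof(double));
    assert(byte_distance(p, p + 2) == (ptrdiff_t)(2 * sizeof(double)));

    // the unscaled add moves exactly 3 bytes, not 3 doubles
    assert(byte_distance(p, add_bytes(p, 3)) == 3);
}
```

the pointer returned by add_bytes(p, 3) is only compared here, never dereferenced as a double, since it is not aligned for one.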

Passing arguments by value versus by reference

passing arguments by value is the “normal” way we pass them.
when you call a function and pass by value, it copies the values into the arguments of the callee (and arguments are just local variables, so passing arguments is like assigning into the argument variables).

void my_function(int x) { // passing by VALUE (or "by COPY")
    // this x is different from the x in main.
    // modifying it only affects this variable, not main's.
    x = 10;
    printf("%d\n", x); // prints 10
}

int main() {
    int x = 20;
    my_function(x); // this *copies* the value 20 into my_function.
    printf("%d\n", x); // prints 20
    return 0;
}

passing arguments by reference means giving the callee a pointer to a variable, which allows the callee to change a caller’s variable.
essentially the caller is letting the callee “borrow” the variable for a bit.
the pointer variable itself is still local to the callee, but the thing it points to belongs to someone else.

void my_function(int* p) { // passing by REFERENCE
    *p = 10; // dereferencing - changes main's x!
    printf("%d\n", *p); // prints 10
}

int main() {
    int x = 20;
    my_function(&x); // &T ==> T*
    printf("%d\n", x); // prints 10!
    return 0;
}

Returning pointers to locals (why it’s bad)

Before every function starts running, it pushes an activation record onto the stack, which contains all of its local variables.
- This is why you can get the addresses of locals - because they are physically in memory, on the stack
Before every function returns (stops running), it pops that AR back off
At any given time, the stack pointer (sp) is pointing to the most-recently-pushed AR
Everything above the stack pointer is ARs that belong to currently-executing (or currently-waiting) functions
Everything below the stack pointer is memory that is technically accessible on most implementations but which you should not access in any way because it is UB - it COULD crash, give you garbage, give you the right value…
This is why returning pointers to locals (including local arrays) is bad. e.g.

int* my_function() {
    int x = 500;
    // have to do this to trick gcc into compiling
    int* p = &x;
    return p;
}

void another_function() {
    // ooh it has variables
    int a = 10, b = 20, c = 30;
}

int main() {
    int* p = my_function();
    // at this point, p points to a region of the stack
    // that is BELOW the sp. if you printed out *p now,
    // it would *likely* print out 500, but you are not
    // guaranteed *anything* about the validity of doing
    // it; it is UB.

    // if we then call another function...
    another_function();

    // ...now we have *no* idea what this will print,
    // because another_function reused the stack space
    // that p is pointing to.
    printf("%d\n", *p);
    return 0;
}
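two common ways to avoid this problem are sketched below, using made-up function names. either the caller provides the storage (pass by reference), or the value goes on the heap, where it outlives the function. these are illustrations, not the only options.

```c
#include <assert.h>
#include <stdlib.h>

// option 1: the caller owns the storage and lends it to us.
// nothing local escapes, so there is no dangling pointer.
void fill_value(int* out) {
    assert(out != NULL);
    *out = 500;
}

// option 2: put the value on the heap. heap memory is not part of
// this function's AR, so it stays valid after we return.
// the caller must free() the result. returns NULL if malloc fails.
int* make_value(void) {
    int* p = malloc(sizeof(int));
    if (p != NULL) {
        *p = 500;
    }
    return p;
}
```

with option 1 the caller writes int v; fill_value(&v);, and with option 2 the caller checks for NULL, uses the int, then calls free on it.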

Struct padding

typedef struct {
    int x;    // 4 bytes
    char c;   // 1 byte
    double d; // 8 bytes
} MyStruct;

// if we print out sizeof(MyStruct) you might expect it to be 13, but it's actually 16.

you do not need to know the details of the rules that the compiler uses to insert struct padding.
- if you are curious: every field must appear at an offset that is a multiple of its alignment (which is actually different from its sizeof but I don’t wanna get into it); and the entire struct’s sizeof must be a multiple of the maximum alignment of all of its fields.
but you do need to know why padding exists and to be careful about it
- padding exists to preserve the alignment of the fields
- alignment means a value that is n bytes long must exist at an address that is a multiple of n
  - like, the actual numerical address must be a multiple of n
  - 4-byte values can only exist at addresses that are multiples of 4
    - so the address in hex ends in 0, 4, 8, or C
  - 8-byte values (like double) can only exist at addresses that are multiples of 8
    - so the address in hex ends in 0 or 8
- alignment is important because some platforms crash your program if you don’t respect it (e.g. MIPS), and on other platforms, there can be a performance penalty for breaking alignment (e.g. x86)
the other annoying thing is that different platforms have different rules about alignment, and therefore different C compilers can put different amounts of padding in your structs when compiling the same code on different compilers/computers
- this means that sizeof(MyStruct) can be wildly different on different platforms!
- therefore you have to be extremely careful about e.g. writing and reading structs to and from files or sending them over networks
- only in some very specific cases (e.g. proj1 where you had the Pixel struct) can you safely do it, because there’s no way for the layout to be different on any platform
  - I think. I’m like 99% sure. lol.
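if you want to see where the padding goes on your own machine, offsetof from stddef.h reports each field's offset. the sketch below only asserts things that should hold on any conforming compiler; the helper names are made up, and the exact amount of padding is deliberately not assumed.

```c
#include <assert.h>
#include <stddef.h> // offsetof, size_t

typedef struct {
    int x;    // 4 bytes on most platforms
    char c;   // 1 byte
    double d; // 8 bytes on most platforms
} MyStruct;

// sum of the field sizes, ignoring any padding
size_t fields_only_size(void) {
    return sizeof(int) + sizeof(char) + sizeof(double);
}

// number of padding bytes the compiler added on this platform
size_t padding_bytes(void) {
    return sizeof(MyStruct) - fields_only_size();
}

void padding_demo(void) {
    // the first field always sits at offset 0
    assert(offsetof(MyStruct, x) == 0);
    // padding can only make the struct bigger, never smaller
    assert(sizeof(MyStruct) >= fields_only_size());
    // d must sit at a multiple of its alignment (_Alignof is C11)
    assert(offsetof(MyStruct, d) % _Alignof(double) == 0);
}
```

on a typical 64-bit desktop platform padding_bytes() comes out to 3 (the gap between c and d), but that number is exactly the kind of thing that can change between platforms.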

`enum`

a way of declaring a collection of (typically) related integer constants
they just declare constants. that’s it.
the underlying type of an enum is implementation-defined
- which means that different compilers can choose different underlying types for that enum depending on the values that are in it
- e.g. on gcc - if all the values are >= 0, the underlying type is unsigned; otherwise, it’s signed
  - and that type may be a char, or a short, or an int, or a long
  - and you don’t know which
enums are declared and used very similarly to structs - they have a “tag name” which you would refer to as enum Tag, and they are often typedefed to avoid having to use the enum keyword everywhere:

typedef enum {
    // by default, enum values are integers starting at 0 and increasing.
    // so A == 0, B == 1, C == 2
    // but you can set them to whatever you want, by writing:
    //     A = 5, B = -17, C = 494
    A, B, C
} E; // E is now a typedef for whatever underlying integer type the compiler chose for this enum

int main(int argc, char const *argv[])
{
    E e = A; // there is no namespace, you don't write E.A, just A
    e = B;
    e = C;

    // the compiler doesn't prevent you from doing this, but it's not good.
    // on gcc, E is given an unsigned type, and this line actually puts 4294967295
    // into e!
    e = -1;

    return 0;
}
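since enum values are just integer constants, a typical use is as the cases of a switch. this sketch uses a hypothetical Color enum; the fallthrough return at the end is there because nothing stops an out-of-range integer from ending up in an enum variable.

```c
#include <assert.h>
#include <string.h>

// hypothetical enum for illustration
typedef enum {
    COLOR_RED,   // 0
    COLOR_GREEN, // 1
    COLOR_BLUE   // 2
} Color;

// maps each enum constant to a printable name
const char* color_name(Color c) {
    switch (c) {
        case COLOR_RED:   return "red";
        case COLOR_GREEN: return "green";
        case COLOR_BLUE:  return "blue";
    }
    return "unknown"; // reached if c holds a value outside the enum
}

void enum_demo(void) {
    // the constants really are just the integers 0, 1, 2
    assert(COLOR_RED == 0 && COLOR_GREEN == 1 && COLOR_BLUE == 2);
    assert(strcmp(color_name(COLOR_BLUE), "blue") == 0);
}
```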

Using the heap in C

C makes the programmer manage heap memory. this is tricky.
here are some rules for using the heap:
1. you should check if malloc returns NULL (meaning out of memory)
  - for many programs it might just mean printing an error message and quitting.
2. you must call free on everything you malloced
  - though when the program exits, this is essentially done for you, so for short-running programs it may be fine to not call it
3. you must never call free more than once on the same pointer
  - cause this will corrupt the heap (see below)
4. you must never access heap memory that has been freed
  - basically for the same reasons as returning pointers to locals - that memory is no longer alive! you don’t own it anymore!
if you don’t free a piece of heap memory, and you lose all the pointers to it, that is a memory leak: neither the user program nor the heap allocator knows that that memory is done being used, so it sticks around taking up space forever
- well, not forever. just until the program ends. that’s the only way to free leaked memory.
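the rules above might look like this in practice. both function names are made up for this sketch. setting the pointer to NULL after freeing is a common habit, not a requirement, and it only protects the one pointer variable you pass in, not any other copies of the pointer.

```c
#include <assert.h>
#include <stdlib.h>

// allocates an array of n ints, each set to `value`.
// returns NULL if malloc fails. the caller must free the result.
int* make_filled_array(size_t n, int value) {
    int* arr = malloc(n * sizeof(int));
    if (arr == NULL) {      // rule 1: check for NULL
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        arr[i] = value;
    }
    return arr;
}

// rule 2: frees *pp, then sets it to NULL so this variable can't be
// used (rule 4) or freed again (rule 3) by accident.
void free_and_clear(int** pp) {
    assert(pp != NULL);
    free(*pp);   // free(NULL) does nothing, so a second call is harmless
    *pp = NULL;
}
```

a caller would write int* a = make_filled_array(4, 7);, check a against NULL, use it, and finish with free_and_clear(&a);.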

The heap allocator and how it works

you are currently working on project 2, so I don’t expect you to know all the low-level concrete details of managing the heap (e.g. the exact pointer arithmetic calculations needed to put a header in the middle of an existing block when splitting, or the sequence of operations needed to link/unlink a node in a doubly-linked list)
- and those are just implementation details anyway
but I do expect you to know what the heap allocator is, what it does, what its responsibilities are, and the data structure we use to represent the heap (at least, the one that we learned about… there are others)
the heap allocator is the part of the standard library that implements malloc() and free()
its responsibilities are:
1. to keep track of which regions of the heap memory are used and which are free for reuse
2. to allocate memory for the user when requested, either by reusing some free memory, or by asking the OS for more heap
3. to free memory for the user when requested, by marking that memory free for reuse in a future allocation
  - linked list, splitting and coalescing, pointer arithmetic
the data structure we learned about for managing the heap is a doubly-linked list
- each region of the heap is a block that consists of a header (small, fixed size, used by the heap allocator) and the data (variable size, used by the user)
- the entire heap is a contiguous list of blocks, linked together into a doubly-linked list
- each block knows if it’s used or free, and how many bytes it is - this satisfies responsibility 1 above
- to allocate (responsibility 2):
  - the allocator looks for a block to reuse with some algorithm (see below)
  - if it found a reusable block, it marks it used and gives the user a pointer to the data part. (see below for splitting)
  - if it didn’t, it asks the OS for more heap, appends that new block to the end of the heap, and gives the user that
- to deallocate (responsibility 3):
  - the allocator marks the block as free.
  - that’s all it has to do, but for better performance, see below.
reuse selection algorithms
- when the allocator is looking for a block to reuse, there are a number of algorithms that can be used.
- first-fit: reuse the first free block on the heap whose size is >= requested.
- next-fit: remember the last-allocated block. instead of starting at the start of the heap, you start looking after that block. if you get to the end of the heap, you wrap around to the beginning of the heap and keep looking. other than that, same as first-fit: you reuse the first free block that you find whose size is >= requested.
- best-fit: reuse the smallest free block on the heap whose size is >= requested.
- worst-fit: reuse the biggest free block on the heap whose size is >= requested.
- quick-fit: instead of keeping one list of blocks, we keep several lists of blocks, categorized by size. this way, we only consider a few blocks for each allocation instead of the entire heap
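to compare first-fit and best-fit, here is a toy sketch that searches a linked list of block headers. the Block struct and its field names are assumptions made for this example (a real allocator's header, including the one in project 2, may look different), and the blocks live in an ordinary array instead of real heap memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// hypothetical header layout, for illustration only
typedef struct Block {
    size_t size;          // size of the data part, in bytes
    bool used;
    struct Block* prev;
    struct Block* next;
} Block;

// first-fit: the first free block that is big enough, or NULL
Block* first_fit(Block* head, size_t requested) {
    for (Block* b = head; b != NULL; b = b->next) {
        if (!b->used && b->size >= requested) {
            return b;
        }
    }
    return NULL;
}

// best-fit: the smallest free block that is big enough, or NULL.
// unlike first-fit, this always has to look at the whole list.
Block* best_fit(Block* head, size_t requested) {
    Block* best = NULL;
    for (Block* b = head; b != NULL; b = b->next) {
        if (!b->used && b->size >= requested) {
            if (best == NULL || b->size < best->size) {
                best = b;
            }
        }
    }
    return best;
}

// helper: links an array of n blocks into a doubly-linked list
void link_blocks(Block* blocks, size_t n) {
    assert(blocks != NULL);
    for (size_t i = 0; i < n; i++) {
        blocks[i].prev = (i > 0) ? &blocks[i - 1] : NULL;
        blocks[i].next = (i + 1 < n) ? &blocks[i + 1] : NULL;
    }
}
```

with blocks of sizes 100 (used), 500, 200, and 50 (all free) and a request for 150 bytes, first_fit picks the 500-byte block because it comes first, while best_fit picks the 200-byte block because it wastes the least.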
splitting
- if a block is selected for reuse, it may be beneficial to split it into two smaller blocks
  - e.g. if the block selected is 1000 bytes, and the user only asked for 100, then giving them the entire 1000 byte block would be wasting 900 bytes to internal fragmentation
- conceptually splitting is simple: just cut the block into two parts, give the user one part, and keep the other part as a smaller but still free block. that’s all I care about you knowing for the exam.
coalescing
- the opposite of splitting.
- when the user frees a block, it may be next to other freed block(s).
- in that case, it makes sense to coalesce or merge the adjacent free blocks into a single, larger free block.
  - because larger blocks are easier to reuse.
- conceptually coalescing is simple: you just remove the boundaries between any adjacent free blocks, giving you a single, larger free block.
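here is a toy sketch of coalescing a free block with the free block after it. the Block struct is hypothetical, and the sketch assumes that each block's header sits right before its data, so the absorbed block's header turns into usable data space. it models only the list bookkeeping, not real heap memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// hypothetical header layout, for illustration only
typedef struct Block {
    size_t size;          // size of the data part, in bytes
    bool used;
    struct Block* prev;
    struct Block* next;
} Block;

// if b and the block after it are both free, absorb that block into b.
// returns true if a merge happened.
bool coalesce_with_next(Block* b) {
    assert(b != NULL);
    Block* n = b->next;
    if (b->used || n == NULL || n->used) {
        return false;   // nothing to merge
    }
    // the boundary disappears: b gains n's header and n's data
    b->size += sizeof(Block) + n->size;
    // unlink n from the doubly-linked list
    b->next = n->next;
    if (n->next != NULL) {
        n->next->prev = b;
    }
    return true;
}
```

a full allocator would also try to merge with the previous block; the idea is the same with the roles swapped.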

Scope, Lifetime, Ownership

no one asked about this but hm maybe you should know this 🙃
scope is “where a name can be seen”
- in most C-like languages, local variable scope lasts from the declaration until the enclosing close-brace }
lifetime is the span of time from when a piece of memory is allocated to when it’s deallocated
- the lifetime of local variables is from the beginning of a function (when the AR is pushed) to the end of the function (when the AR is popped)
- the lifetime of heap memory is from when it is malloc()ed until when it is free()d.
  - the programmer controls the lifetime of heap memory.
- remember that the lifetime of a piece of heap memory is not the same as the lifetime(s) of the variable(s) that point to it
  - e.g. if I have int* p = malloc(10); as a local variable, p’s lifetime is like any other local variable - deallocated at the end of the function. but the lifetime of the memory that p points to only ends when I call free(p)
ownership is about who decides when it’s okay to deallocate (i.e. “end the lifetime”) of some piece of memory.
- locals are owned by their function. when the function returns, they’re no longer needed, so it’s okay to deallocate them by popping them off the stack.
- in C, heap memory is owned by… you. the programmer. you are responsible for deciding when it’s okay to deallocate every piece of heap memory.
  - sometimes it’s really easy and straightforward
  - many times it’s kind of fuzzy…
  - sometimes it’s extremely hard to know when it’s okay.
- in GC’ed (garbage collected) languages like Java and Python, heap memory is owned by the GC.
  - it uses Fun Graph Algorithms to determine when heap memory is unreachable by the user program, and anything unreachable is safe to deallocate (because there’s no way for the user program to ever use it again!)
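to tie the three ideas together, here is a sketch of a function that hands ownership of heap memory to its caller. duplicate_string is a made-up name for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// returns a heap copy of s, or NULL if malloc fails.
// - scope: the names `len` and `copy` are only visible inside this function.
// - lifetime: the variable `copy` dies when we return, but the memory
//   it points to stays alive until someone calls free() on it.
// - ownership: the caller now decides when that memory is freed.
char* duplicate_string(const char* s) {
    assert(s != NULL);
    size_t len = strlen(s) + 1;   // +1 for the terminating '\0'
    char* copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}
```

the caller would write char* d = duplicate_string("hello");, use d, and call free(d) when done. forgetting that last step is exactly the leak described in the heap rules.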
