Exam 1 Study Guide

Please read this page about taking my exams!

Exam format

When/where
- During class, here, like normal
- 75 minutes
- it is not going to be “too long to finish”
- no calculator
Closed-note
- You may not have any notes, cheat sheets etc. to take the exam
- The open-note thing was just for when we were remote
Length
- 3 sheets of paper, double-sided
- there are A Number of Questions, and I can't tell you how many: they are all different kinds and sizes, so the count wouldn't be useful to you.
  - But I will say that I tend to give many, small questions instead of a few huge ones.
Topic point distribution
- More credit for earlier topics (e.g. numerical representation, memory addresses, arrays, MIPS programming)
- Less credit for more recent ones
- More credit for things I expect you to know because of your experience (labs, exercises)
- VERY ROUGHLY:
  - ~50% MIPS
  - ~25% understanding memory
  - ~25% numeric representation
Kinds of questions
- Few or no multiple choice
- A few “pick n” (but not many)
- Some fill in the blanks
  - mostly for vocabulary
  - or things that I want you to be able to recognize, even if you don’t know the details
- Application questions about numbers and arithmetic (i.e. math problems, basically)
  - Base conversion
  - Interpreting patterns of bits in different ways (signed, unsigned, etc)
  - Unsigned and signed (2’s complement) addition
- Several short answer questions
  - again, read that page above about answering short answer questions!!
- No writing code from scratch, but:
  - tracing (reading code and saying what it does)
  - debugging (spot the mistake)
  - interpreting asm as HLL code (identifying common asm patterns)
  - fill in the blanks (e.g. picking right registers, right branch instructions)
  - identifying loads and stores in HLL code

Things people asked about in the reviews

This is a list of what people asked about. The exam may have other topics not listed, and some of these topics may not appear on the exam.

CISC vs RISC
- CISC: Complex Instruction Set Computer
  - made for humans to write programs directly in assembly
- RISC: Reduced Instruction Set Computer
  - made for compilers to produce assembly/machine code from high-level languages
- The differences are really in the name:
  - CISC has complex, flexible, multi-step instructions that are great for humans (do more stuff with fewer instructions!) but terrible for performance
    - x86 is really the only CISC still in widespread use
  - RISC has reduced, simple, single-step instructions that are great for compilers (so easy to write algorithms to write RISC code!) but more awkward for humans to write
    - MIPS and Berkeley RISC were the first mainstream RISC architectures (the name RISC came from Berkeley RISC)
    - most architectures designed after MIPS work like MIPS (e.g. ARM)
Conversion from decimal to binary
- I presented one way on the slides, the “long division” method:
  - You have to know the binary place values
  - Start with the remainder equal to the number you're converting
  - From MSB to LSB (left to right):
    - If the place value fits into the remainder, put a 1 and subtract it off the remainder
    - Otherwise put a 0
- There’s another method: repeatedly divide by 2 until you get a quotient of 0, writing down every remainder (even if it’s a 0). The binary representation is the remainders read from bottom to top, because the first remainder you get is the LSB.
  - Try doing both methods to see what I mean, if you’re curious.
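If you want to check your work on both methods, here is a small Python sketch of each. The function names are made up for illustration, and the place-value version assumes the number fits in the given number of bits (0 ≤ n < 2^bits).

```python
def to_binary_long_division(n: int, bits: int) -> str:
    """Place-value method: try each place value from MSB to LSB."""
    remainder = n                          # start with the whole number
    digits = []
    for k in range(bits - 1, -1, -1):      # place values 2^(bits-1) down to 2^0
        place = 1 << k
        if place <= remainder:             # the place value fits: write 1, subtract it
            digits.append("1")
            remainder -= place
        else:                              # doesn't fit: write 0
            digits.append("0")
    return "".join(digits)


def to_binary_repeated_division(n: int) -> str:
    """Repeated division by 2: the remainders come out LSB first."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))      # keep every remainder, even a 0
        n //= 2
    return "".join(reversed(remainders))   # read from bottom (last) to top (first)


print(to_binary_long_division(13, 8))      # 00001101
print(to_binary_repeated_division(13))     # 1101
```

Both give the same bits; the place-value version just also writes the leading 0s.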
Conversion between hex and binary
- 4 bits = 1 hex digit (nybble)
- The table is simple - count up in binary from 0000 to 1111, and count up in hex from 0 to F next to it.
- When going from binary to hexadecimal, group the bits into 4 starting from the right (LSB)
  - add 0s to the left side as needed to make a group of 4 bits
  - then each group of 4 bits is 1 hex digit
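The grouping rule can be sketched in a few lines of Python (the helper names are hypothetical). Padding on the left and then grouping from the left is the same as grouping from the right.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    bits = bits.replace(" ", "")
    bits = "0" * (-len(bits) % 4) + bits            # pad on the left to a multiple of 4
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)  # 1 group = 1 hex digit


def hex_to_binary(hex_str: str) -> str:
    # each hex digit becomes exactly 4 bits (leading 0s kept)
    return " ".join(format(HEX_DIGITS.index(d), "04b") for d in hex_str.upper())


print(binary_to_hex("1 1100 0100"))   # 1C4
print(hex_to_binary("C4"))            # 1100 0100
```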
Unsigned integers
- There are no negatives. It’s in the name: unsigned = NO SIGN.
- To convert to decimal, add up the place values for each 1 bit.
- You, the programmer, decide when an integer is unsigned. It’s then up to you to use the appropriate unsigned versions of things like addu, bltu, lbu/lhu, “print unsigned integer” syscall, etc.
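“Add up the place values for each 1 bit” looks like this as a Python sketch (the function name is just for illustration):

```python
def unsigned_value(bits: str) -> int:
    bits = bits.replace(" ", "")
    total = 0
    for position, bit in enumerate(reversed(bits)):   # position 0 is the LSB
        if bit == "1":
            total += 2 ** position                     # add that bit's place value
    return total


print(unsigned_value("1001"))        # 8 + 1 = 9
print(unsigned_value("1111 1111"))   # 255
```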
Sign-magnitude
- Is NOT used for integers, it’s used for floats
- Is also how we write numbers on paper. +123 and -123: same digits, different sign.
- The MSB is the sign, 0 for positive, 1 for negative, and is totally separate from the rest of the bits
- Downsides: two “versions” of 0 (+0 and -0); arithmetic is more complicated (special cases, just like we learn in school)
- To negate: just flip the sign bit. The rest of the digits are unchanged.
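A minimal Python sketch of reading and negating a sign-magnitude pattern (hypothetical helper names, assuming at least 2 bits). Notice that 1000 and 0000 both come out as 0: those are the two “versions” of zero.

```python
def sign_magnitude_value(bits: str) -> int:
    bits = bits.replace(" ", "")
    magnitude = int(bits[1:], 2)                 # every bit except the MSB
    return -magnitude if bits[0] == "1" else magnitude


def sign_magnitude_negate(bits: str) -> str:
    bits = bits.replace(" ", "")
    flipped_sign = "0" if bits[0] == "1" else "1"
    return flipped_sign + bits[1:]               # only the sign bit changes


print(sign_magnitude_value("1011"))    # -3
print(sign_magnitude_negate("1011"))   # 0011
print(sign_magnitude_value("1000"), sign_magnitude_value("0000"))  # 0 0
```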
2’s complement integers
- The one and only system used to represent signed integers on computers today
- It works by making the MSB the negative version of its place value
  - The MSB also represents the sign - 0 for positive, 1 for negative
- This representation is great because it makes arithmetic super simple, no special cases
  - You can add any two numbers of any signs and it will Just Work (unless there’s overflow, i.e. “going off the ends of the number line segment”)
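- Downside: there is one more negative number than there are positive numbers, and that most negative number is A Bit Weird: it has no positive counterpart, so if you negate it, you get the same value back out (e.g. in 4 bits, -8 is 1000; flipping gives 0111, and adding 1 gives 1000 again).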
- To convert to decimal, you still just add the place values up. e.g. 1001 is -8 + 1 = -7.
- To negate: -x == flip(x) + 1, or, “flip the bits, then add 1.”
  - The negative of a number is also called its “2’s complement.”
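Here is a Python sketch of both rules: the MSB's place value counts as negative, and negating is “flip the bits, then add 1” while keeping only n bits. The function names are hypothetical.

```python
def twos_complement_value(bits: str) -> int:
    bits = bits.replace(" ", "")
    n = len(bits)
    total = -(2 ** (n - 1)) if bits[0] == "1" else 0   # the MSB's place value is negative
    for position, bit in enumerate(reversed(bits[1:])):
        if bit == "1":
            total += 2 ** position                      # the rest are ordinary place values
    return total


def twos_complement_negate(bits: str) -> str:
    bits = bits.replace(" ", "")
    n = len(bits)
    mask = (1 << n) - 1
    flipped = int(bits, 2) ^ mask                       # flip the bits...
    return format((flipped + 1) & mask, f"0{n}b")       # ...then add 1, keeping n bits


print(twos_complement_value("1001"))      # -8 + 1 = -7
print(twos_complement_negate("0111"))     # 1001  (7 -> -7)
print(twos_complement_negate("1000"))     # 1000  (-8 has no positive counterpart)
```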
Addition and subtraction
- Binary addition works just like in base 10, but you carry at 2 instead of 10.
- The same addition algorithm is used for both unsigned and signed integers.
- Remember, when adding 2’s complement numbers, nothing special happens. You just add the bits and you will get the correct value/sign at the end.
- Subtraction is defined in terms of addition:
  - x - y == x + (-y)… and because of how 2’s complement works…
  - x + (-y) == x + (flip(y) + 1)
  - amazingly, this works for both signed and unsigned subtraction! the 2’s complement of y (flip(y) + 1) “behaves like” its negative, even in a number system that has no negative numbers. remember: two ways around the number circle.
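To see that one adder handles everything, here is a Python sketch of an n-bit adder and of subtraction built only from it. The helper names are hypothetical, and inputs are assumed to be n-bit patterns (0 ≤ x, y < 2ⁿ). The same result pattern reads as the right answer in both systems: 1110 is -2 as a signed number, and 14 as unsigned, which is 3 - 5 wrapped around the number circle.

```python
def add_bits(x: int, y: int, n: int) -> int:
    """n-bit adder: add the patterns and keep only n bits (carry out is dropped)."""
    return (x + y) & ((1 << n) - 1)


def sub_bits(x: int, y: int, n: int) -> int:
    """x - y computed as x + (flip(y) + 1), using only the adder."""
    mask = (1 << n) - 1
    neg_y = add_bits(y ^ mask, 1, n)       # 2's complement of y
    return add_bits(x, neg_y, n)


def as_signed(pattern: int, n: int) -> int:
    """Read an n-bit pattern as a 2's complement integer."""
    return pattern - (1 << n) if pattern >> (n - 1) else pattern


r = sub_bits(0b0011, 0b0101, 4)               # 3 - 5 in 4 bits
print(format(r, "04b"), r, as_signed(r, 4))   # 1110 14 -2
```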
Extension
- Going from a smaller number of bits to a bigger number of bits while preserving the value
  - e.g. the number 5 can be represented as 0101 binary, or as 0000 0101 binary - same value, but the second one has more bits
- There are two flavors of extension:
  - Zero-extension is for unsigned numbers and puts 0 bits on the left side of the number
  - Sign-extension is for signed numbers and puts copies of the sign bit (MSB) on the left side of the number
    - importantly, this happens in binary, not hex!
    - so if you sign-extend an 8-bit number like 0xC4 to 32 bits…
      - the MSB is 1 (because hex C is binary 1100).
      - it gets sign-extended in binary and looks like 1111 1111 1111 1111 1111 1111 1100 0100
      - in hex that looks like 0xFFFFFFC4
      - it’s not hex 0x111111C4 (that would be binary 0001 0001 0001...).
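Both flavors can be sketched with bit operations in Python (hypothetical helper names). Sign-extension copies the sign bit into every new bit position, in binary, which is why 0xC4 becomes 0xFFFFFFC4.

```python
def zero_extend(value: int, from_bits: int) -> int:
    """Zero-extension: the new upper bits are all 0, so the pattern is unchanged."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extension: copy the sign bit (MSB) into every new upper bit."""
    value &= (1 << from_bits) - 1
    if (value >> (from_bits - 1)) & 1:                         # sign bit is 1
        upper_ones = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        return value | upper_ones                              # fill new bits with 1s
    return value                                               # sign bit 0: fill with 0s


print(f"0x{sign_extend(0xC4, 8, 32):08X}")   # 0xFFFFFFC4
print(f"0x{zero_extend(0xC4, 8):08X}")       # 0x000000C4
```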
Truncation
- Going from a larger number of bits to a smaller number
  - You can also look at it as “erasing bits on the left side of the number”
- There is only one kind of truncation, doesn’t matter if it’s signed or unsigned.
- However, truncating too far can change the value
  - e.g. if you have 0001 0010 (unsigned decimal 18) and truncate it to 5 bits, you get 1 0010 (still unsigned decimal 18)… but if you keep going and truncate to 4 bits, you get 0010 (decimal 2)!
  - This is actually performing modulo: truncating to n bits gives you the value modulo 2ⁿ. If the (unsigned) number is less than 2ⁿ, it’ll be preserved; if it’s bigger, it’ll be changed. For signed numbers, the value is preserved only if it fits in the n-bit signed range (read as a signed 5-bit number, 1 0010 is -14).
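Truncation is just masking off the upper bits. A one-function Python sketch (hypothetical name) reproduces the example and shows the “modulo 2ⁿ” view:

```python
def truncate(value: int, n: int) -> int:
    """Keep only the n least significant bits (erase everything to their left)."""
    return value & ((1 << n) - 1)


print(truncate(0b0001_0010, 5))                 # 18  (1 0010)
print(truncate(0b0001_0010, 4))                 # 2   (0010)
print(truncate(0b0001_0010, 4) == 18 % 2**4)    # True
```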
Control flow
- Conditional branches go to the label if their condition is true (satisfied)
  - e.g. beq t0, 10, _label says “if t0 == 10, then go to _label, else go to the next line”
- So there is a mismatch between the way we write conditions in ifs in Java vs. how they work in asm
  - When writing ifs in asm, you usually have to invert the condition
  - Because in asm, the condition is really testing “when do we skip the contents of the if”
- However, you don’t always invert the condition. E.g. do-while, “simple” for loops
  - These test the condition at the end of the loop, so we do want to go backwards when the condition is true
MEMORY
- Alignment
  - The address of an n-byte value must be a multiple of n.
    - e.g. words are 4 bytes. so, their addresses are multiples of 4 (0x00, 0x04, 0x08, 0x0C, 0x10, 0x14, 0x18, 0x1C,...)
    - BUT THAT WAS JUST AN EXAMPLE. words are 4 bytes, but not everything is.
  - There are underlying hardware design reasons for this, but some architectures (like MIPS) will crash your program if you don’t respect this rule.
- Zero/sign extension
  - when you load a value smaller than 32 bits (byte, half) into a 32-bit register, it has to be extended
    - want to preserve the same value, just represent it with more bits
  - lb/lh does sign extension (copies sign bit (0 OR 1) to left)
  - lbu/lhu does zero extension (fills extra bits with 0s)
  - No extension happens with lw because you’re loading a 32-bit value into a 32-bit register - same size
- Truncation
  - When you store into a half/byte variable, only the least significant bits (rightmost bits) of the register are stored
  - The rest are truncated (cut off)
  - sb stores the 8 least significant bits of the register in memory and leaves 24 behind
  - sh stores the 16 least significant bits of the register in memory and leaves 16 behind
- Endianness, discussed below
- Does the CPU crash if you lw a byte or lb a word?
  - NO! the only thing it will crash for is address misalignment, and that only happens with loads/stores larger than a byte.
  - Otherwise it assumes you know what you’re doing and does exactly what you tell it to do.
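To tie the memory rules together, here is a toy Python model of a few loads and stores. `ToyMemory` is a hypothetical helper, not a real MIPS simulator, and it assumes little-endian byte order for multi-byte values. It crashes (raises) only on misalignment, sign- or zero-extends byte loads, and truncates byte stores.

```python
class ToyMemory:
    """Hypothetical byte-addressed memory (assumes little-endian words)."""

    def __init__(self, size: int):
        self.data = bytearray(size)

    def _check_alignment(self, addr: int, n: int) -> None:
        if addr % n != 0:                       # n-byte values need addr % n == 0
            raise ValueError(f"unaligned {n}-byte access at {addr:#x}")

    def load(self, addr: int, n: int, signed: bool) -> int:
        self._check_alignment(addr, n)
        value = int.from_bytes(self.data[addr:addr + n], "little", signed=signed)
        return value & 0xFFFFFFFF               # as a 32-bit register pattern

    def store(self, addr: int, n: int, reg: int) -> None:
        self._check_alignment(addr, n)
        low_bits = reg & ((1 << (8 * n)) - 1)   # truncate: keep only n bytes of the register
        self.data[addr:addr + n] = low_bits.to_bytes(n, "little")

    def lb(self, addr: int) -> int:
        return self.load(addr, 1, signed=True)      # sign-extends

    def lbu(self, addr: int) -> int:
        return self.load(addr, 1, signed=False)     # zero-extends

    def lw(self, addr: int) -> int:
        return self.load(addr, 4, signed=False)     # 32 bits into 32 bits: no extension

    def sb(self, addr: int, reg: int) -> None:
        self.store(addr, 1, reg)

    def sw(self, addr: int, reg: int) -> None:
        self.store(addr, 4, reg)


mem = ToyMemory(16)
mem.sb(0, 0x123456C4)          # only 0xC4 is stored; the upper 24 bits are left behind
print(f"{mem.lb(0):#010x}")    # 0xffffffc4
print(f"{mem.lbu(0):#010x}")   # 0x000000c4
mem.sw(4, 0xDEADBEEF)
print(f"{mem.lb(5):#x}")       # 0xffffffbe: lb on one byte of a word is allowed
```

Calling `mem.lw(2)` would raise, since 2 is not a multiple of 4.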
Endianness
- it is a rule which is used to decide the order of BYTES
  - when going from things bigger than a byte to bytes
  - or vice versa.
- it comes up in…
  - memory (cause it’s an array of bytes)
  - files (also arrays of bytes)
  - networking
- big endian stores the big end (most significant byte) first.
  - “read it in order”
  - 0xDEADBEEF is stored in memory as 0xDE, 0xAD, 0xBE, 0xEF
- little endian does the opposite, stores the least significant byte first.
  - “swap the order”
  - 0xDEADBEEF is stored in memory as 0xEF, 0xBE, 0xAD, 0xDE
  - Notice that we don’t swap the hex digits or the bits, we swap the order of entire bytes
- but 1-byte values and arrays of 1-byte values are not affected by endianness
  - because they aren’t chopped up when loading or storing
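Python's `int.to_bytes` takes the byte order as an argument, so it makes a quick way to check which bytes go where (the wrapper name is made up):

```python
def word_bytes(value: int, order: str) -> list:
    """Bytes of a 32-bit word in memory order; order is "big" or "little"."""
    return [f"0x{b:02X}" for b in value.to_bytes(4, order)]


print(word_bytes(0xDEADBEEF, "big"))      # ['0xDE', '0xAD', '0xBE', '0xEF']
print(word_bytes(0xDEADBEEF, "little"))   # ['0xEF', '0xBE', '0xAD', '0xDE']

# A 1-byte value (or an array of them, like a string) comes out the same either way
print((0x41).to_bytes(1, "big") == (0x41).to_bytes(1, "little"))   # True
```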
Accessing arrays in MIPS
- An array is multiple variables of the same type and size, equidistantly spaced apart in memory
  - E.g. arr: .word 1, 2, 3 is 3 words/12 bytes of memory; each item of the array is 4 bytes apart because a word is 4 bytes.
- The address of A[i] is A + S×i where:
  - A is the address of the array (in asm, the label is the address)
  - S is the size of one item in bytes (so for .word it’s 4, .byte it’s 1, etc)
  - i is the index you want to access (in asm, typically a register)
- The “long form” of array access looks like:
```
  # ASSUMING that s0 is the index (maybe we're in a for loop and s0 is the loop counter):
  la  t0, arr    # t0 = address of arr
  mul t1, s0, 4  # t1 = s0 * 4
  add t0, t0, t1 # t0 = address of arr + (s0 * 4)

  # Now you can load/store using (t0) as the address
  lw  a0, (t0)   # a0 = arr[s0]
```
- The “short form” folds the la and add into the lw instruction, but you still have to multiply the index:
```
  mul t1, s0, 4   # t1 = s0 * 4
  lw  a0, arr(t1) # a0 = arr[s0]
```
- The stupid arr(t1) syntax means “add the address of arr and t1 together, and use that as the address to load from”
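The A + S×i formula can be checked in Python by packing `arr: .word 1, 2, 3` into 12 bytes and reading each element back at its computed address. `element_address`, `load_word`, and the base address 0 are hypothetical, and little-endian packing is an assumption.

```python
import struct


def element_address(base: int, size: int, index: int) -> int:
    """Address of A[i] is A + S*i."""
    return base + size * index


# arr: .word 1, 2, 3  -> 12 bytes (little-endian assumed)
arr = struct.pack("<3i", 1, 2, 3)
ARR_BASE = 0          # hypothetical: treat the start of `arr` as address 0


def load_word(memory: bytes, addr: int) -> int:
    return struct.unpack_from("<i", memory, addr)[0]   # like lw: read 4 bytes at addr


for i in range(3):
    addr = element_address(ARR_BASE, 4, i)   # the "mul t1, s0, 4" and "add" steps
    print(i, addr, load_word(arr, addr))     # prints 0 0 1, then 1 4 2, then 2 8 3
```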
ATV Rule
- Any function is allowed to change the A, T, V registers at any time for any reason! :))))))
- but the consequence is that a caller cannot assume that the a, t, or v registers have the same values after a jal as they did before it.
- So every time you jal, on the line after jal, you have no clue what is in any of the a, t, or v registers - only the s registers’ values will be the same as they were before the call.
  - you just can’t read the value out of those registers anymore. they’re not like, poisoned or something. you can use the same register before and after a jal for different purposes.
- This is a scary-sounding rule, but it gives you the freedom to:
  - Use any a, t, or v register at any time for any purpose
    - yes! go ahead and use t0 everywhere! everyone is allowed to use it! :DDDDDD
  - Use the a, t, and v registers without having to “ask permission” or “put them back the way they were” by pushing and popping them
    - you never have to push or pop any of them.
Function call mechanism in MIPS
- jal func does two things:
  - sets ra = pc + 4 (pc is pointing at the jal, so pc + 4 is the instruction after it)
  - sets pc = func (whatever its address is)
- jr ra does one thing:
  - sets pc = ra (where ra was the address of the instruction after the jal that jal set up for us)
- this is all these instructions do!
- there is also only one ra register. what this means is: unless you save ra somewhere, you can only go one function call deep.
  - if main calls fork, and fork calls knife…
  - then the jal knife overwrites the value that was in ra
  - meaning we will be able to get back to fork from knife, but we will get stuck in an infinite loop when we try to return to main from fork
- to solve this, we make every function push ra at the beginning, and pop ra at the end
  - this way, every function’s return address goes on the stack, where it’s safe (because there are lots of stack slots)
The stack
- A region of memory that contains information about function calls
- Pushing puts a value on top of the stack, popping removes a value from the top of the stack
- A stack is a perfect match for the way function calls work:
  - Whenever a function is called, it pushes its activation record (AR): saved registers, plus local variables in HLLs
  - Whenever a function is about to return, it pops the AR
  - ARs are removed in the opposite order from when they are created, so a stack is exactly what is needed
  - In-progress functions’ data is safe on the stack (in memory)
- The stack is necessary to make recursive functions work:
  - Every time a recursive function calls itself, a new copy of its local variables is pushed
  - So there can be multiple activation records for the same function on the stack at the same time, each with different values for the local variables
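A Python sketch can make the activation records visible. Here `call_stack` is a hypothetical list standing in for the real stack: each call to `fact` pushes a record holding its own `n` and pops it before returning. At the deepest point there are three records for the same function, each with a different `n`.

```python
call_stack = []   # models the stack: one activation record (dict) per in-progress call
snapshots = []    # what the stack looked like each time a new AR was pushed


def fact(n: int) -> int:
    call_stack.append({"func": "fact", "n": n})    # push this call's AR
    snapshots.append([ar["n"] for ar in call_stack])
    if n <= 1:
        result = 1
    else:
        result = n * fact(n - 1)                    # the caller's AR stays on the stack
    call_stack.pop()                                # pop this call's AR before returning
    return result


print(fact(3))      # 6
print(snapshots)    # [[3], [3, 2], [3, 2, 1]]
print(call_stack)   # []
```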
Calling Convention
- Honor system used to let multiple functions work together
  - Remember that all functions share the registers so this is important!
- Makes them agree on:
  - How arguments are passed from caller to callee
  - How values are returned from callee to caller
  - How control flows from caller to callee, and then back again
  - What goes on the stack
  - Who is allowed to use which registers, and for what purposes
  - Which registers must be preserved across calls, and which can be trashed
- In MIPS, part of this is the s register contract
  - If you want to use some s register sx, you:
    - push sx at the beginning of the function that wants to use it
    - pop sx at the end of the function that wants to use it
  - By following this protocol, it’s as if every function gets its own s registers
    - But everyone has to follow the protocol, or the guarantee is gone!
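The s register contract can be modeled in Python with a dict of registers and a list as the stack (all names here are hypothetical). The callee pushes s0 before using it and pops it at the end, so the caller sees its s0 unchanged after the “jal”; t0 gets no such protection.

```python
registers = {"s0": 0, "t0": 0}   # just two registers, for illustration
stack = []


def callee() -> None:
    stack.append(registers["s0"])    # push s0 at the beginning (we want to use it)
    registers["s0"] = 99             # now s0 is ours to use
    registers["t0"] = 123            # t0 needs no push/pop (ATV rule)
    registers["s0"] = stack.pop()    # pop s0 at the end: put it back


def caller() -> int:
    registers["s0"] = 5
    registers["t0"] = 7
    callee()                         # stands in for "jal callee"
    # s0 is guaranteed to be 5 again; t0 could be anything now
    return registers["s0"]


print(caller())    # 5
print(stack)       # []
```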
