Please read this page about taking my exams!
Exam format
- When/where
- Mon/Wed 1:00 PM section: Fri 12/13 at 8:30 AM in the normal classroom (5401 Posvar)
- yes we’ll start at 8:30 instead of 8:00 to give people a little more time to get here
- Mon/Wed 3:00 PM section: Tue 12/17 at 12:00 PM in the normal classroom (11 Thaw)
- Tue/Thu section: Mon 12/16 at 10:00 AM in 104 Lawrence Hall instead of Sennott
- Closed-note, no calculator
- You may not use any notes, cheat sheets, etc. while taking the exam
- The math on the exam has been designed to be doable either in your head or very quickly on paper (e.g. 2 x 1 digit multiplication); if you find yourself needing a calculator, you did something wrong
- Keep numbers in scientific notation, do not take them out of it until the end
- Avoid division when you can - do reciprocal first, then multiply by that
- I literally design the test questions so the reciprocals are easy to do
- Length
- Very much like the first exam.
- 75 minutes
- Topic point distribution
- It is not cumulative, omg
- More credit for earlier topics (e.g. AND, OR, multiplexers)
- Less credit for more recent ones (e.g. microcode, pipelining)
- More credit for things I expect you to know because of your experience (labs, project)
- VERY ROUGHLY:
- ~30% Logic (combinational and sequential)
- ~40% CPU design
- ~25% Performance
- ~5% Other
- Kinds of questions
- Very much like the first exam.
Things people asked about in the reviews
Remember, these are just the things that people asked about. There may be topics on the exam not on this list; and there may be topics on this list that are not on the exam.
- Combinational logic
- Anything that doesn’t have memory (no latches, flip-flops, registers, or RAM)
- Includes gates (AND, OR, NOT etc), plexers, arithmetic computations
- Boolean expressions/functions
- Boolean inputs, one (or more) boolean output
- (multiple boolean outputs are really separate expressions)
- Basically, if you can represent it as a truth table, it’s a boolean expression
- Turning a truth table into a boolean expression is extremely straightforward:
- find every row of the truth table where the output is 1
- for each of those, write a term that is all the input variables ANDed together, with bars (NOTs) on each variable that is 0 in that row
- OR all those terms together. you will get a “sum-of-products” expression that is like Y = term + term + term... (there's a small code sketch of this recipe just below)
- using the Engineering notation for these is very compact - Y = AB + CD or something
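- A rough software sketch of that recipe (the function, the truth table, and the use of ' to stand in for the NOT bar are all made up for illustration):

```java
// Sketch: build a sum-of-products expression from a truth table.
// The 3-input truth table below (for a hypothetical function Y) is invented for illustration.
public class SumOfProducts {
    public static void main(String[] args) {
        String[] vars = {"A", "B", "C"};
        // outputs[i] is the value of Y for the input combination whose bits spell out i (A is the MSB)
        int[] outputs = {0, 1, 0, 0, 1, 0, 0, 1};

        StringBuilder expr = new StringBuilder();
        for (int row = 0; row < outputs.length; row++) {
            if (outputs[row] != 1) continue;           // only rows where the output is 1
            if (expr.length() > 0) expr.append(" + "); // OR the terms together
            for (int v = 0; v < vars.length; v++) {
                int bit = (row >> (vars.length - 1 - v)) & 1;    // this variable's value in this row
                expr.append(bit == 1 ? vars[v] : vars[v] + "'"); // ' stands in for the NOT bar
            }
        }
        System.out.println("Y = " + expr); // Y = A'B'C + AB'C' + ABC
    }
}
```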
- Propagation delay
- Propagation delay is how long it takes for a signal to pass through some circuit
- Nothing moves infinitely fast in the real world, so there are limits on how quickly we can compute things
- The propagation delay of a sequential circuit’s critical path (longest series of operations that cannot be done in parallel) limits clock speed
- Ripple carry
- Method of implementing multi-bit addition where the carry-out of each bit becomes the carry-in of the next higher bit
- Simple to implement, but linear time in the number of bits
- Double the number of bits? Doubles the time
- When the inputs change, it will produce invalid results for a while, because the carries must “ripple” from LSB to MSB
- The critical path is from the LSB’s input to the MSB’s output - hence why it’s linear time
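- If it helps to see it in software, here's a minimal sketch that adds one bit at a time from LSB to MSB, the same way the hardware carries ripple (the class and method names are made up):

```java
// Sketch: simulate an n-bit ripple-carry adder one bit at a time.
// Each bit's carry-out feeds the next bit's carry-in, just like the hardware.
public class RippleCarry {
    // Returns the n-bit (truncated) sum.
    static int rippleAdd(int a, int b, int n) {
        int sum = 0, carry = 0;
        for (int i = 0; i < n; i++) {               // LSB to MSB, the order the carries ripple
            int ai = (a >> i) & 1, bi = (b >> i) & 1;
            int s = ai ^ bi ^ carry;                             // 1-bit full adder: sum bit
            carry = (ai & bi) | (ai & carry) | (bi & carry);     // 1-bit full adder: carry-out
            sum |= s << i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(rippleAdd(0b1011, 0b0110, 4)); // 11 + 6 = 17, truncated to 4 bits = 1
    }
}
```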
- HOW TO DETECT OVERFLOW
- AN OVERFLOW OCCURRED IF:
- Unsigned: addition overflowed if the MSB carry out is 1; subtraction overflowed if the MSB carry out is 0 (i.e. there is no carry out)
- Signed: addition overflowed if the inputs have the same sign but the output has a different sign; subtraction is the same as addition, but after negating the second input
- For signed addition: you get an overflow only if you add two numbers of the same sign and get the opposite sign out (e.g. add two positives, get a negative); there's a small code sketch of these rules at the end of this topic
- it’s totally possible to add two numbers of the same sign and not have overflow
- also if the inputs are opposite signs, then overflow is impossible.
- Remember that detecting overflow is only the first step.
- Once it has been detected, you can respond to it in 3 ways: store, ignore, fall on the floor (crash)
- in MIPS, add/sub crash on signed overflow, and addu/subu ignore all overflow
- not all architectures are this limited.
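- A minimal sketch of the detection rules above, assuming 32-bit Java ints (the method names are made up):

```java
// Sketch: applying the overflow-detection rules to 32-bit addition.
public class OverflowDetect {
    // Unsigned overflow: there was a carry out of the MSB.
    static boolean unsignedAddOverflow(int a, int b) {
        // compare the true (unsigned) sum against what fits in 32 bits
        long trueSum = (a & 0xFFFFFFFFL) + (b & 0xFFFFFFFFL);
        return trueSum > 0xFFFFFFFFL;
    }

    // Signed overflow: same-sign inputs, different-sign output.
    static boolean signedAddOverflow(int a, int b) {
        int result = a + b; // wraps around silently in Java
        return ((a ^ result) & (b ^ result)) < 0; // result's sign differs from both inputs' signs
    }

    public static void main(String[] args) {
        System.out.println(signedAddOverflow(Integer.MAX_VALUE, 1)); // true: pos + pos gave a negative
        System.out.println(unsignedAddOverflow(-1, 1));              // true: 0xFFFFFFFF + 1 carries out
    }
}
```

- (Java itself takes the "ignore" option below: int arithmetic silently wraps, which is why the detection has to be done manually.)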
- Responding to overflow
- After detecting an overflow occurred (see above), there are three possible ways in which the addition or subtraction instructions can respond:
- Store the extra bit into a special 1-bit carry register
- This can be checked after the addition or subtraction to see what happened
- Or it can be used as an input to a subsequent addition or subtraction to perform arbitrary-precision arithmetic, which lets you add or subtract numbers of any number of bits
- Ignore that an overflow occurred, and use the result truncated back to n bits
- This sucks and is the most popular way to respond because it’s Easy
- Fall on the floor (crash the program) instantly
- This lets the programmer/user know right away that something went wrong
- But in some high-reliability environments (e.g. aerospace) this might be a Bad Idea
- It really depends
- Arbitrary precision arithmetic
- If you have a 32-bit CPU, and you want to add numbers > 32 bits, you are not out of luck
- If you want to add two 64-bit numbers for example, you:
- add the lower 32 bits of both numbers
- save the carry-out from that addition (the carry that comes out of the MSB)
- add the upper 32 bits of both numbers, plus the carry-out from the first addition
- Steps 2/3 can be repeated ad nauseam to add numbers of any arbitrary number of bits.
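- A sketch of those three steps in software, using Java longs to stand in for pairs of 32-bit values (the names are invented for illustration):

```java
// Sketch: adding two 64-bit numbers using only 32-bit additions plus a saved carry,
// following the lower-half / save-carry / upper-half steps described above.
public class WideAdd {
    static long add64With32BitAdds(long x, long y) {
        long xLo = x & 0xFFFFFFFFL, yLo = y & 0xFFFFFFFFL;      // lower 32 bits of each
        long xHi = x >>> 32,        yHi = y >>> 32;             // upper 32 bits of each

        long loSum = xLo + yLo;                                 // step 1: add the lower halves
        long carry = loSum >>> 32;                              // step 2: save the carry out of bit 31 (0 or 1)
        long hiSum = (xHi + yHi + carry) & 0xFFFFFFFFL;         // step 3: add upper halves plus that carry

        return (hiSum << 32) | (loSum & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long a = 0x00000001_FFFFFFFFL, b = 1;
        System.out.println(Long.toHexString(add64With32BitAdds(a, b))); // 200000000
    }
}
```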
- Bitwise AND and its uses
- Two main uses:
- masking: isolating the lowest n bits of a number by ANDing with 2^n - 1
- doing fast modulo by 2^n by ANDing with 2^n - 1
- Notice both of those are the same operation, just different interpretations
- Do not confuse bitwise AND (&, works on ints) with logical AND (&&, works on booleans, is lazy)!
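- A tiny sketch of both interpretations (the values are arbitrary, and the modulo comparison assumes a non-negative number):

```java
// Sketch: the two interpretations of ANDing with 2^n - 1.
public class MaskDemo {
    public static void main(String[] args) {
        int x = 0b1101_0110;          // 214, an arbitrary example value
        int mask = (1 << 4) - 1;      // 2^4 - 1 = 0b1111

        System.out.println(x & mask); // 6: "the lowest 4 bits of x"
        System.out.println(x % 16);   // 6: "x mod 2^4" -- same answer, same operation (for non-negative x)
    }
}
```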
- Bit shifting
- Shifting left by n places is like multiplying by 2^n
- Shifting left writes 0s on the right side of the number and then erases bits on the left side, which means it has a truncation “built in”
- Truncation can give you weird results if you lose meaningful bits!
- Shifting right by n places is like dividing by 2^n
- Shifting right erases bits on the right side of the number, which forces you to add bits on the left side, which means it has an extension “built in”
- Because of that, there are two flavors of right-shift:
- Logical (unsigned) right shift >>> puts 0s to the left of the number
- Arithmetic (signed) right shift >> puts copies of the sign bit to the left of the number
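- A quick sketch of all three shifts in Java (the numbers are arbitrary):

```java
// Sketch: shifts as multiplication/division by powers of 2, and the two flavors of right shift.
public class ShiftDemo {
    public static void main(String[] args) {
        System.out.println(5 << 3);    // 40: same as 5 * 2^3
        System.out.println(40 >> 3);   // 5: same as 40 / 2^3

        int neg = -8;
        System.out.println(neg >> 1);  // -4: arithmetic shift copies the sign bit in
        System.out.println(neg >>> 1); // 2147483644: logical shift puts 0s in, so the sign is lost
    }
}
```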
- Bitsets
- Simplification of bitfields where each field is 1 bit (0 or 1)
- Bits are numbered starting with bit 0 on the right side and increasing to the left
- (this is because bit numbers are the powers of 2 that they represent)
- To turn on bit n: sets |= (1 << n)
- To turn off bit n: sets &= ~(1 << n)
- note the ~ in there
- To test if bit n is 1: if((sets & (1 << n)) != 0)
- do NOT use ~ in there!
- Bitfields
- Given the specification for a bitfield, you can determine these for each field:
- Position: the low bit number (the one on the right)
- this indicates how far to shift left/right for encoding/decoding that field
- Size: high bit + 1 - low bit
- this is how many bits the field is
- Mask: 2^size - 1 (where size is calculated in the previous point)
- another way of thinking of it is writing size 1 bits in binary, and then turning that into hex
- e.g. if size = 6, in binary that’s 11 1111 (6 1s in a row); turn that to hex, it’s 0x3F
- Then, to decode (get a field OUT of an encoded bitfield):
- shift value right by position and AND with mask
- so, field = (encoded >> FIELD_POSITION) & FIELD_MASK
- e.g. with a position of 7 and a mask of 0x3F, field = (encoded >> 7) & 0x3F
- Finally, to encode (put fields together into an encoded bitfield):
- shift each field left by position, and or them all together
- e.g. with 3 fields it might look like encoded = (A << 9) | (B << 7) | (C << 0)
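- Putting it together, a sketch using a completely made-up field layout (the positions, sizes, and names are invented for illustration, not from any real encoding):

```java
// Sketch: encoding and decoding a made-up bitfield with three fields:
//   A = bits 15..12 (position 12, size 4, mask 0xF)
//   B = bits 11..4  (position 4,  size 8, mask 0xFF)
//   C = bits 3..0   (position 0,  size 4, mask 0xF)
public class BitfieldDemo {
    static final int A_POS = 12, A_MASK = 0xF;
    static final int B_POS = 4,  B_MASK = 0xFF;
    static final int C_POS = 0,  C_MASK = 0xF;

    // encode: shift each field left by its position and OR them all together
    static int encode(int a, int b, int c) {
        return (a << A_POS) | (b << B_POS) | (c << C_POS);
    }

    // decode: shift right by the position and AND with the mask
    static int decodeB(int encoded) {
        return (encoded >> B_POS) & B_MASK;
    }

    public static void main(String[] args) {
        int enc = encode(0x5, 0x3C, 0x9);
        System.out.println(Integer.toHexString(enc));           // 53c9
        System.out.println(Integer.toHexString(decodeB(enc)));  // 3c
    }
}
```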
- Floats
- IEEE 754 standard is the only way floating-point numbers are encoded and manipulated on modern computers
- Based on binary scientific notation, e.g. +1.10101 x 2^6
- Represented in sign-magnitude, not 2’s complement
- Three parts of a number: sign, fraction, and exponent
- Sign is the MSB and follows same rule as ints (0 = positive, 1 = negative)
- Fraction is just the bits after the binary point, left-aligned
- e.g. if significand is 1.001, then fraction is 00100000.... (many 0s after it)
- e.g. if significand is
- Exponent: if you have 2^n, n is encoded as an unsigned number n + k, where k is the bias constant
- The bias constant is given to you, e.g. for single-precision floats, k = 127.
- So for a float, an exponent of +6 is encoded as 127 + 6 = 133, as an unsigned integer.
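- If you want to poke at the encoding yourself, here's a sketch that pulls a single-precision float apart with Java's Float.floatToRawIntBits; the value 96.0 = +1.1 x 2^6 was picked so the exponent matches the 127 + 6 = 133 example above:

```java
// Sketch: pulling the sign, exponent field, and fraction out of a single-precision float.
public class FloatParts {
    public static void main(String[] args) {
        float f = 96.0f;                       // 96 = +1.1 x 2^6 in binary scientific notation
        int bits = Float.floatToRawIntBits(f); // the raw 32-bit IEEE 754 encoding

        int sign     = bits >>> 31;            // 0 = positive, 1 = negative
        int expField = (bits >>> 23) & 0xFF;   // biased exponent (8 bits)
        int fraction = bits & 0x7FFFFF;        // 23 fraction bits (the part after the "1.")

        System.out.println(sign);              // 0
        System.out.println(expField);          // 133 = 127 + 6
        System.out.println(Integer.toBinaryString(fraction)); // 10000000000000000000000
    }
}
```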
- When should you not use floats?
- NEVER EVER EVER USE FLOATS TO REPRESENT CURRENCY/MONEY/FINANCIAL TRANSACTIONS. This is because floats use binary (base-2) fractions, and 1/10, 1/100, 1/1000 etc. are infinitely repeating fractions in binary.
- it is not a matter of “not having enough precision” or “the numbers get rounded off.” it’s that they are infinite in size and computers are incapable of representing infinitely sized values, it’s just a mathematical impossibility
- If you need to represent money, do it in base 10 - e.g. in Java, there is the BigDecimal class built in to do so.
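- A tiny demonstration of the problem and of the base-10 alternative (0.30000000000000004 is the actual result Java prints for 0.1 + 0.2):

```java
import java.math.BigDecimal;

// Sketch: why base-2 fractions and money don't mix, and the base-10 alternative.
public class MoneyDemo {
    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);         // 0.30000000000000004 -- 1/10 isn't exact in binary
        System.out.println(0.1 + 0.2 == 0.3);  // false

        BigDecimal a = new BigDecimal("0.10"); // base-10, exact
        BigDecimal b = new BigDecimal("0.20");
        System.out.println(a.add(b));          // 0.30
    }
}
```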
- Sequential logic
- can remember things, unlike combinational logic
- any memory (flip flop, register, RAM), or any circuit that contains any memory
- relies on the clock signal to tell the memory components when to update their contents
- combined with combinational logic to make finite state machines
- Latches, flip-flops, and registers
- a latch is the simplest circuit that can remember 1 bit of information
- (there are actually several kinds of latches, but we only looked at the RS Latch)
- a flip-flop is a latch surrounded by some extra circuitry which:
- makes it more stable and less prone to oscillation
- makes it work with the clock signal
- may also give it a write enable input
- a flip-flop is a 1-bit register
- an n-bit register is n flip-flops
- Multiplication and division
- NO YOU DON’T NEED TO MEMORIZE THE ALGORITHMS
- Multiplication is made of multiple additions
- Addition is commutative and associative,
- This means that the sub-steps of multiplication can be reordered and even done in parallel
- This gives us two practical multiplication algorithms:
- the slow, sequential, linear, grade-school multiplication algorithm is O(n) time (n = number of bits)
- This is the algorithm implemented with the FSM, with 3 registers and an adder
- the fast, combinational, parallel multiplication algorithm is O(log n) time
- This is the algorithm implemented as a tree of adders, no registers at all
- but this is a time-space tradeoff
- the linear time multiplier needs only O(n) 1-bit full adders
- while the logarithmic time multiplier needs O(n^2) 1-bit full adders
- double the number of bits, quadruple the space needed for the circuitry!
- Division is made of multiple subtractions
- and subtraction is neither commutative nor associative
- which means the sub-steps of division must always be done in order
- therefore, division is always O(n)
- yes, even if you guess with the SRT algorithm
- This is the algorithm that looks a lot like the multiplication FSM but remixed
- division is not slower “because of the remainder” or something. the remainder is calculated at the same time as the quotient.
- there are Logisim examples of all three things on the materials page!!
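- You don't need to memorize this, but if seeing it as software helps: a sketch of the slow, sequential shift-and-add multiplier, where each loop iteration corresponds to one step of the FSM version (the names and the 16-bit size are invented for illustration):

```java
// Sketch: the slow, sequential, grade-school multiplication algorithm in software,
// for 16-bit unsigned inputs (so the product comfortably fits in a long).
// Each loop iteration corresponds to one FSM step: look at one multiplier bit,
// conditionally add the (shifted) multiplicand.
public class ShiftAddMultiply {
    static long multiply16(int multiplicand, int multiplier) {
        long product = 0;
        for (int i = 0; i < 16; i++) {                 // one pass per bit: O(n) steps
            if (((multiplier >> i) & 1) == 1) {        // is multiplier bit i set?
                product += (long) multiplicand << i;   // add the multiplicand, lined up under bit i
            }
        }
        return product;
    }

    public static void main(String[] args) {
        System.out.println(multiply16(1234, 5678));    // 7006652
    }
}
```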
- FSMs (Finite State Machines)
- inputs, state, transition logic, outputs
- the transition logic determines the next state based on the current state and inputs
- the output logic determines the outputs from the current state (and optionally, also from the current inputs)
- this is the distinction between Moore and Mealy machines and I forget which is which but it’s not important for this class and you can look it up if you’re curious
- transition logic can be shown as either a state diagram (the nodes with arrows indicating transitions) or as a transition table (for each combination of state + inputs, show the “next” state)
- these two are equivalent - each arrow in the state diagram is a row in the transition table
- but the table is easier to mechanically translate into circuitry to implement the transition logic
- Parts of the CPU
- PC FSM controls the PC and lets it advance to next instruction, do absolute jumps, or do relative branches
- Absolute jumps just set the PC to some value (e.g. PC = 0x80004004)
- Relative branches add a number to the current PC to move forward or backwards by a certain amount (e.g. PC = PC + 12)
- Branches do not have a return address, I don’t know why so many people put this on the exam, branches make a choice and never return, you are somehow confusing them with function calls
- MIPS’s jal instruction is an absolute jump, but it also sets ra to the address of the jal plus 4. totally different thing
- Instruction memory contains the instructions, and is addressed by the PC - corresponds to the .text segment of your program
- This is where instructions are. Instructions are not “in” the PC FSM.
- Control decodes the instruction and produces all the control signals for the rest of the CPU
- Control signals are things like write enables and MUX/DEMUX selects - they control what the other components do.
- Register file is an array of general-purpose registers; typically we can read and write multiple registers simultaneously
- ALU is the Arithmetic and Logic Unit - performs arithmetic and logic (bitwise) operations - add, subtract, AND, OR, NOT, shifts…
- Data memory contains variables that you can load or store - corresponds to the .data segment of your program
- Phases of instruction execution
- Fetch: use PC to get the instruction from memory
- Decode: control decodes instruction and sets control signals
- eXecute: wait for ALU to do its work
- Memory: (only for loads and stores) do the load or store
- Writeback: (only for instructions that have a destination reg) put the result in the register file, not the memory
- How instructions are decoded/control the datapath
- the opcode identifies which instruction it is (add, lw, beq, etc.)
- for example, an add instruction might…
- ALUOp = add (makes the ALU add)
- ALUSrc = register (chooses the second input to the ALU)
- RegDataSrc = ALU (chooses what data to write into the register file)
- RegWrite = 1 (yes, we’re writing a value into the register file)
- MemWrite = 0 (no, we’re not storing a value into memory)
- and the rd, rs, rt signals come from the encoded instruction itself.
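- As a concrete sketch (not something you need for the exam): pulling the register fields out of a real MIPS R-type instruction word in software, using the standard R-type field positions (opcode 31..26, rs 25..21, rt 20..16, rd 15..11, shamt 10..6, funct 5..0):

```java
// Sketch: decoding the fields of a MIPS R-type instruction word.
public class DecodeRType {
    public static void main(String[] args) {
        int instr = 0x01095020;               // encoding of: add $t2, $t0, $t1

        int opcode = (instr >>> 26) & 0x3F;   // bits 31..26
        int rs     = (instr >>> 21) & 0x1F;   // bits 25..21
        int rt     = (instr >>> 16) & 0x1F;   // bits 20..16
        int rd     = (instr >>> 11) & 0x1F;   // bits 15..11
        int funct  = instr & 0x3F;            // bits 5..0

        System.out.println(opcode); // 0  (R-type)
        System.out.println(rs);     // 8  ($t0)
        System.out.println(rt);     // 9  ($t1)
        System.out.println(rd);     // 10 ($t2)
        System.out.println(funct);  // 32 (0x20, the funct code for add)
    }
}
```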
- Critical path + clock speed
- Critical path is the longest possible path through a circuit
- If it’s a sequential “loop-shaped” circuit, it’s the longest path “through the loop”
- Think of a race track with multiple routes
- The critical path is important because it’s the slowest operation that the circuit can perform…
- And therefore the clock cannot tick faster than that without breaking things
- The maximum clock speed is the reciprocal of the time it takes for a signal to propagate through the critical path
- e.g. if the critical path length is 2 ns (= 2 x 10^-9 s), then the maximum clock speed is the reciprocal of that - 500 MHz (= 5 x 10^8 Hz)
- Harvard vs. von Neumann and Single- vs. Multi-cycle
- In a single-cycle machine, every instruction takes one clock cycle.
- In a multi-cycle machine, instructions take 2 or more clock cycles.
- Harvard = 2 memories: one for instructions and one for data
- von Neumann = 1 memory: contains everything!
- We tend to prefer this - it’s just easier to deal with a single address space, a single “flavor” of pointer, a single “flavor” of loads/stores etc.
- There is a fundamental limitation on most memory: you cannot access two different addresses in one piece of memory at the same time.
- This is a practical issue - adding circuitry to do so would make the memory way more expensive and slower, so we just… don’t.
- If you want to make a single cycle machine, you must use a Harvard (2-memory) architecture
- because you cannot do the fetch and memory phases at the same time (within 1 cycle)
- If you want to make a von Neumann (1-memory) machine, you must make it multi-cycle
- that way we can use the same memory for fetch and memory phases, but at different times
- So,
- single-cycle => Harvard (that is, “single cycle implies Harvard” - if you want a simple single-cycle machine, you must accept that you will have two memories)
- von Neumann => multi-cycle (“von Neumann implies multi-cycle” - if you want a von Neumann architecture, you must build the CPU to be multi-cycle)
- single-cycle von Neumann is impossible to build
- (multi-cycle Harvard is useful for pipelined CPUs - separate instruction and data caches so one instruction can fetch at the same time another instruction does a load/store)
- Average CPI calculation
- In a multi-cycle machine, each instruction takes a certain number of cycles
- E.g. ALU = 4 cycles, loads = 10 cycles, stores = 8 cycles, jumps = 5 cycles, branches = 3 cycles
- If we run a test (benchmark) program, we can count how many of each instruction will be executed to come up with proportions for each kind of instruction
- E.g. 40% ALU instructions, 20% loads, 20% stores, 10% jumps, 10% branches
- Then CPI is the weighted average of those instruction classes
- E.g. (4 * 0.4) + (10 * 0.2) + (8 * 0.2) + (5 * 0.1) + (3 * 0.1) = 6.0
- You can then compare the CPI of different CPUs (different numbers of cycles) by using the same program (instruction proportions)
- You can also compare the performance of different programs (different instruction proportions) on the same CPU
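- A sketch of that weighted-average calculation, using the same example numbers as above:

```java
// Sketch: weighted-average CPI from per-class cycle counts and a benchmark's instruction mix.
public class CpiDemo {
    public static void main(String[] args) {
        int[] cycles  = {4, 10, 8, 5, 3};      // ALU, load, store, jump, branch
        int[] percent = {40, 20, 20, 10, 10};  // benchmark's instruction mix, in %

        int weighted = 0;
        for (int i = 0; i < cycles.length; i++) {
            weighted += cycles[i] * percent[i]; // cycles weighted by how often each class runs
        }
        double cpi = weighted / 100.0;
        System.out.println(cpi);                // 6.0
    }
}
```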
- Performance equation
- Calculates how long a program will take to run on a given CPU
- (n instructions) x (CPI cycles per instruction) x (t seconds per cycle); or
- (n instructions) x (CPI cycles per instruction) x (1 / f Hz)
- Be careful about your exponents and SI prefixes here
- nano is negative nine
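- A sketch of plugging into the equation; the instruction count and clock speed below are made up, and the CPI of 6.0 is the example from the previous section:

```java
// Sketch: time = (instructions) x (cycles per instruction) x (seconds per cycle).
public class PerfDemo {
    public static void main(String[] args) {
        double n   = 2_000_000;   // instructions executed (made up)
        double cpi = 6.0;         // cycles per instruction (example from above)
        double f   = 500e6;       // clock speed in Hz (500 MHz, made up)

        double seconds = (n * cpi) / f; // dividing by f is the same as multiplying by (1 / f) seconds per cycle
        System.out.println(seconds);    // 0.024, i.e. 24 ms
    }
}
```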
- Kinds of control
- Hardwired single-cycle
- Entirely combinational: instruction goes in, control signals come out
- Simple to design, terrible performance
- Hardwired multi-cycle (FSM)
- Multiple steps/phases for each instruction
- Have to keep track of what phase we’re on (what FSM state we’re in)
- Number of phases is tailored to each instruction to avoid wasting time
- Each phase is still hardwired though
- Microcoded multi-cycle (FSM, but fancy)
- Like the FSM one, but the states and transition table can be reprogrammed
- Firmware is the “program” that implements the control FSM
- (Details on microcode below)
- Hybrid microcoded and hardwired
- Use hardwired control for really common and simple instructions
- Fall back on microcode for more complex operations
- Microcode!
- What is it?
- A way of designing multi-cycle control so that each ISA instruction is implemented as a sequence of “micro-instructions” that perform the various phases of execution.
- What are the benefits?
- FLEXIBILITY!
- While designing the CPU, you can change the instruction set, add instructions easily, etc. without having to change the circuitry of the CPU itself
- And if the microcode is in a writable ROM, we can update the CPU after it’s already been sold and installed in users’ computers
- What’s the downside?
- slower than a hardwired FSM because of the complexity - accessing the microcode ROM and decoding the microinstructions adds propagation delay.
- Caching is keeping copies of recently-used data in a smaller but faster memory so it can be accessed more quickly in the near future
- Pipelining is partially overlapping instruction execution to improve throughput (more instructions completed per unit time)
- Superscalar CPUs can complete > 1 instruction per cycle by fetching and executing multiple instructions simultaneously (completely overlapping instruction execution)
- Out-of-order CPUs analyze several (a dozen or more) instructions in advance, then dynamically reorder them so they can be executed more quickly than they would as written