• Pros and cons of arrays
• Benefits
• They are simple to understand
• They also keep things in a definite order, if we need that
• They allow random access - “get the nth item” is easy
• When we ran out of space in the array implementation of Bag, what did we have to do?
• Make a new, bigger array
• Copy the old contents into it
• Get rid of the old array (Java does this for us)
• What if the user had added 10,000 items to the Bag, then removed them all?
• The array would stay huge
• Maybe this is good, in case they add 10,000 items again…
• …Or maybe this is a waste of space
• What if, unlike a Bag, we cared about the order of the items?
• We would have to shift all the items down one at a time if we removed one at the beginning
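To make the two costs above concrete, here is a minimal sketch. `ArraySketch`, `grow`, and `removeFirst` are hypothetical names, not the course's actual array-based Bag, and doubling the capacity is just one common choice for "bigger".

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only.
class ArraySketch {
    // Returns a bigger copy of a full array (assumption: double the capacity).
    static int[] grow(int[] old) {
        int[] bigger = new int[Math.max(1, old.length * 2)];
        // Copy the old contents into the new array.
        System.arraycopy(old, 0, bigger, 0, old.length);
        // Once nothing refers to the old array, Java's garbage collector reclaims it.
        return bigger;
    }

    // Removes items[0] while keeping the order; returns the new size.
    static int removeFirst(int[] items, int size) {
        // Every remaining item moves down one slot: size - 1 moves in total.
        for (int i = 1; i < size; i++) {
            items[i - 1] = items[i];
        }
        return size - 1;
    }

    public static void main(String[] args) {
        int[] items = {10, 20, 30};
        items = grow(items);
        System.out.println(items.length); // prints 6
        int size = removeFirst(items, 3);
        System.out.println(Arrays.toString(Arrays.copyOf(items, size))); // prints [20, 30]
    }
}
```

Note that both operations touch every item, so each one costs more the bigger the array gets.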
• Rulers vs. Chains
• You can think of an array as a (really long) ruler
• It has regularly-spaced numbered markings on it
• You can tie an item on at each of these markings
• You know exactly where to go if you want “the item at the nth marking”
• But the ruler has a fixed size
• (Let’s assume you can’t cut the ruler :P)
• What if we instead used a chain?
• The chain can have as few or as many links as you want
• You can tie an item on each link
• Let’s say each link has a hole in it so you can hook and unhook the links
• Adding or removing a link to the ends is really easy
• Adding or removing one in the middle is trickier, but still pretty easy
• You could even split one chain into two or more chains
• But the links are not numbered, so if you want to get the nth item…
• You have to start at one end and count n links
• A linked list is a data structure where each item is held by a link in a chain
• We call each link a node
• A node contains two things:
• The value it holds
• A link to the next node in the chain
• To keep track of a whole linked list…
• We only need to hold onto the first node
• Just like a chain - you only have to hold one end, and the chain hangs from it
• But if each node only links to the next node…
• Can you go backwards?
• No.
• There is a variant of linked lists where each node keeps links in both directions
• But that’s kind of a pain in the ass and not always worth it
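A minimal sketch of the chain idea in Java, assuming String values. `ChainSketch` and `get` are hypothetical names used only for illustration. It shows a node holding a value plus a link to the next node, and how "get the nth item" means counting links from the head.

```java
// Hypothetical names; a sketch of a singly linked chain of nodes.
class ChainSketch {
    static class Node {
        String value; // the value this node holds
        Node next;    // link to the next node, or null at the end of the chain

        Node(String value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Returns the value at position n (0-based) by counting links from the head.
    static String get(Node head, int n) {
        if (n < 0) {
            throw new IndexOutOfBoundsException("negative position " + n);
        }
        Node current = head;
        for (int i = 0; i < n && current != null; i++) {
            current = current.next; // we can only move forward
        }
        if (current == null) {
            throw new IndexOutOfBoundsException("no node at position " + n);
        }
        return current.value;
    }

    public static void main(String[] args) {
        // Build the chain a -> b -> c; we only hold onto the first node.
        Node head = new Node("a", new Node("b", new Node("c", null)));
        System.out.println(get(head, 2)); // prints c
    }
}
```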

## Implementing a Bag using a linked list in Java

• We’re gonna have a class called LinkedBag
• And inside that class, we can declare the Node class!
• This is a nested class
• We can either declare it as a static class or not
• In this case we don’t need the extra features a non-static class has
• So we’ll just use private static class Node
• It works just like any other class, but it can only be seen and used by LinkedBag
• Keeping track of the list and size
• We only need a reference to the head of the list - the first link in the chain.
• If the list is empty, then _head == null.
• We’ll also keep track of the size so we don’t have to count nodes every time.
• Where’s the easiest place to add a node?
• The beginning.
• There are two cases to consider: when _head == null and when _head != null
• But really, the two cases are the same:
1. Make a new node which points to the old head (which might be null)
2. Make that node the new head
3. Increase the size by 1
• Removing an arbitrary item
• Again, where’s the easiest place to remove from? The beginning.
• Again, two cases to consider, but the _head == null case is easy: just return null.
• Otherwise:
1. Get the item from the old head
2. Make the old head’s next node the new head (which might be null)
3. Decrease the size by 1
• Looking for an item
• With the array-based Bags, how did we find an item?
• By iterating over the array one item at a time.
• Either we found it, or we got to the end of the array and didn’t find it.
• We do the same thing here.
• We start at the head…
• And follow the links until we find the item or we run out of nodes.
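Putting the pieces of this section together, one possible sketch of LinkedBag looks like the following. The method names `add`, `remove`, `contains`, and `size`, and the generic parameter, are assumptions; the course's actual Bag interface may differ.

```java
import java.util.Objects;

// A sketch of LinkedBag, not a definitive implementation.
class LinkedBag<E> {
    // Nested class: only LinkedBag can see and use it.
    private static class Node<T> {
        T value;
        Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node<E> _head = null; // null means the list is empty
    private int _size = 0;        // kept up to date so we never count nodes

    public int size() {
        return _size;
    }

    // Adds at the beginning; the same code works whether or not _head is null.
    public void add(E item) {
        _head = new Node<>(item, _head); // new node points to the old head
        _size++;
    }

    // Removes and returns the item at the beginning, or null if the bag is empty.
    public E remove() {
        if (_head == null) {
            return null;
        }
        E item = _head.value;
        _head = _head.next; // might become null
        _size--;
        return item;
    }

    // Starts at the head and follows links until found or out of nodes.
    public boolean contains(E item) {
        for (Node<E> n = _head; n != null; n = n.next) {
            if (Objects.equals(n.value, item)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LinkedBag<String> bag = new LinkedBag<>();
        bag.add("x");
        bag.add("y");
        System.out.println(bag.contains("x")); // prints true
        System.out.println(bag.remove());      // prints y
        System.out.println(bag.size());        // prints 1
    }
}
```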

## Algorithm Analysis

• Every solution to a problem has two costs: time and space
• Time is how many steps we take to get to the end
• Space is how many things we have to remember at once
• (We won’t really get into space analysis in this class)
• (But we can often trade time for space and vice versa)
• Computers get faster every year…
• But algorithm analysis is about something deeper.
• Two solutions to the same problem can take drastically different amounts of time.
• And it’s not because of the computer they’re running on.
• An example: summing integers from 1 to n
• Say sum(n) = 1 + 2 + 3 + ... + (n-1) + n
• So sum(1) == 1, sum(2) == 3, sum(3) == 6 etc.
• Three algorithms to do this:
• A. for(i = 1 to n) sum = sum + i
• B. for(i = 1 to n) { for(j = 1 to i) sum = sum + 1 }
• C. sum = n * (n+1) / 2
• Intuitively, what order do these come in, from fastest to slowest?
• C, A, B.
• But why?
• It has to do with the number of repeated steps.
• Let’s count the number of additions, multiplications, and divisions in each
• A. n additions, nothing else. Total = n operations
• C. 1 addition, 1 multiplication, 1 division. Total = 3 operations.
• B…
• This one is trickier.
• How many times does the outer loop (i) run?
• How many times does the inner loop (j) run for each of those?
• There’s 1 addition inside the inner loop.
• The first outer loop we do 1 iteration of the inner loop (1 addition).
• The second outer loop, we do 2 iterations of the inner loop (2 additions).
• The third, 3; the fourth, 4… hey, what’s this pattern?
• It’s n * (n + 1) / 2 additions!
• Right? We’re adding 1 each time, so to get to sum(n), we have to do sum(n) additions.
• In total, we do (n^2 + n) / 2 operations.
• How well do these perform for various values of n?
• A is a linear function of n.
• C is a constant value (3) - it doesn’t depend on n at all.
• B is a quadratic function of n.
• As n grows larger, C always takes the same amount of time; A gets worse; and B gets way worse
• Like, mathematically provably worse.
• This is algorithm analysis: focusing on the trends of time (or space) as you increase the size of the input.
• Next year’s computers might be twice as fast, but a bad algorithm is always bad.
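The three algorithms A, B, and C can be sketched in Java with a counter for the operations. `SumCounts` and the `ops` field are hypothetical names; as in the counts above, only the arithmetic on the sum is counted, not the loop bookkeeping.

```java
// A sketch that counts operations for the three summing algorithms.
class SumCounts {
    static long ops; // operations performed by the most recent call

    // A: one addition per value of i.
    static long sumA(int n) {
        ops = 0;
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum = sum + i;
            ops++;
        }
        return sum;
    }

    // B: adds 1 at a time, i times for each value of i.
    static long sumB(int n) {
        ops = 0;
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= i; j++) {
                sum = sum + 1;
                ops++;
            }
        }
        return sum;
    }

    // C: the closed-form formula: 1 addition, 1 multiplication, 1 division.
    static long sumC(int n) {
        ops = 3;
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        int n = 100;
        System.out.println(sumA(n) + " in " + ops + " ops"); // prints 5050 in 100 ops
        System.out.println(sumB(n) + " in " + ops + " ops"); // prints 5050 in 5050 ops
        System.out.println(sumC(n) + " in " + ops + " ops"); // prints 5050 in 3 ops
    }
}
```

Try multiplying n by 10: A's count goes up 10 times, B's goes up roughly 100 times, and C's stays at 3.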

## Big-O notation (the letter, not the digit)

• If we have a polynomial…
• What does its graph look like?
• It kinda wiggles around at x = 0
• But in the long run…
• Doesn’t matter what order polynomial it is, it just goes up (as long as the highest-order term is positive).
• The higher the order, the faster it goes up.
• Is there much of a difference between n^2 and n^2 + n?
• Lower-order terms don’t matter.
• How about n^2 and 2n^2?
• Multiplicative constants don’t matter.
• Big-O notation
• If you have an algorithm whose behavior is characterized by f(n)
• Then we say it’s bounded above by a function g(n) if:
• g(n) >= f(n) “in the long run.”
• The proper formal definition of “in the long run” is c*g(n) >= f(n) for all n >= n_0
• c is some positive constant, and
• n_0 is some positive integer.
• That n_0 bit has to do with the sort of “break even” point some functions have
• Consider n vs n^2
• It’s possible to have an n^2 function that is less than n… for a while.
• But at some point, they intersect. That intersection point is n_0.
• For every n >= n_0, n^2 is larger.
• We say that an algorithm is O(something) if “something” bounds its runtime from above.
• O(1) is constant time.
• No matter what the input is, it always takes the same amount of time.
• O(n) is linear time.
• A single loop over the input is usually this.
• O(n^2) is quadratic time.
• A doubly-nested loop is often this.
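As a worked check of the definition, take f(n) = (n^2 + n) / 2 (the operation count for algorithm B) and g(n) = n^2. The constants c = 1 and n_0 = 1 are just one choice that works; many others would too. Since n <= n^2 whenever n >= 1:

$$
f(n) = \frac{n^2 + n}{2} \le \frac{n^2 + n^2}{2} = n^2 = 1 \cdot g(n) \quad \text{for all } n \ge 1
$$

So c*g(n) >= f(n) for all n >= n_0, which is exactly what it means for B to be O(n^2).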

## Analyzing some things

• Sequential search through an array or linked list
• For a list of length n, what is the worst case?
• We look through all n items and don’t find it.
• Therefore, it’s linear time – O(n).
• What about the best case?
• We find the item at the very beginning.
• Therefore, it’s constant time in the best case.
• We could also write this as Ω(1): big-Omega is a lower bound, so it says the search always takes at least constant time.
• What about the average case?
• ??????
• Average what?
• To say what kind of average, we have to have a probability distribution.
• Let’s say it’s equally likely to find the item anywhere in the list.
• So we’d add up all the possibilities, and divide by n to get the average.
• In this case, it’s ((n^2 + n) / 2) / n = (n + 1) / 2. It’s O(n) again.
• Intuitively, this makes sense - chances are, it’s gonna be somewhere in the middle.
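The three cases can be checked by counting how many items a sequential search looks at. This is a sketch using an int array; `SearchCost` and its method names are hypothetical, and the average assumes the item is present and equally likely to be at any position.

```java
// A sketch that counts items examined by sequential search.
class SearchCost {
    // Returns how many items are examined while looking for target.
    static int comparisons(int[] items, int target) {
        int count = 0;
        for (int item : items) {
            count++;
            if (item == target) {
                break; // found it, stop looking
            }
        }
        return count;
    }

    // Averages the cost over the n cases "target is at position 0, 1, ..., n-1".
    static double averageWhenPresent(int n) {
        int[] items = new int[n];
        for (int i = 0; i < n; i++) {
            items[i] = i;
        }
        long total = 0;
        for (int target = 0; target < n; target++) {
            total += comparisons(items, target);
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        int[] items = {0, 1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(comparisons(items, 0));  // best case: prints 1
        System.out.println(comparisons(items, -1)); // worst case (not found): prints 9
        System.out.println(averageWhenPresent(9));  // average: prints 5.0, which is (9 + 1) / 2
    }
}
```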
• Let’s analyze add(), remove(), contains(E), and remove(E) for the Array and Linked Bags.
• As it turns out, both implementations come out the same:
• add() is O(1) (for the array version, only when there is room; growing the array means an O(n) copy)
• remove() is O(1)
• contains(E) is O(n)
• remove(E) is O(n)