## Announcements

• There’s a quiz today!

## Recap

• Selection sort
• Select min from unsorted part
• Swap with first unsorted item
• Repeat until whole array is sorted
• Bubble sort
• Scan for adjacent out-of-order pairs (inversions) and swap them
• Repeat until no more inversions are found
• Insertion sort
• Take first unsorted item
• Insert into sorted part in correct order, sliding things over if needed
• Repeat until whole array is sorted
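As a refresher, the insertion sort steps above might be sketched in Python like this (a sketch; the function name is ours, not from last time):

```python
def insertion_sort(a):
    """Sort the list a in place: O(n^2) worst case, O(n) on sorted input."""
    for i in range(1, len(a)):
        item = a[i]              # take the first unsorted item
        j = i - 1
        while j >= 0 and a[j] > item:
            a[j + 1] = a[j]      # slide larger sorted items over
            j -= 1
        a[j + 1] = item          # insert into the sorted part
    return a
```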
• Last time we couldn’t get any better than $O(n^2)$ worst case for sorting
• Selection and bubble sort were tied
• Insertion sort could be $O(n)$ best case, but was still $O(n^2)$ worst case
• Let’s try to do better!

## Divide-and-conquer sorting

• A divide-and-conquer algorithm…
• Divides a problem into subproblems which are a multiplicative fraction of the original size
• Combines the results of the subproblems into a solution for the larger problem
• They’re typically written recursively, as it’s the most natural way to express this behavior
• How could we divide an array? What’s the most obvious solution?
• Chop it in half!
• And then sort each half, somehow?
• And then combine those halves back together…
• That’s a bit less obvious
• It depends on how we chopped the array in half
• Let’s try doing the subproblems with a simple sort, like selection sort
• If we start with $\{ 7, 5, 2, 1, 0, 3, 6, 4 \}$
• We divide to get $\{ 7, 5, 2, 1 \}$ and $\{ 0, 3, 6, 4 \}$
• We sort each half to get $\{ 1, 2, 5, 7 \}$ and $\{ 0, 3, 4, 6 \}$
• And now we have to combine the two halves…
• Think of it like cars merging onto a highway
• Except the cars have numbers on them
• And whoever has the smaller number goes first
• So this “merging” procedure looks like this:
1. Look at the values at the front of each array.
2. Remove the smaller one and put it at the end of a new “sorted” array.
3. Repeat until one array runs out; then move the rest of the other array over.
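The merging procedure above could be sketched in Python like so (the name `merge` is ours):

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list in O(n) time."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # the smaller value "goes first"
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # one list has run out; move the rest of the other over
    result.extend(left[i:])
    result.extend(right[j:])
    return result
```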

• Did we make anything better?
• How long does this merging procedure take?
• There’s just a single loop with two constant time operations inside…
• So it’s $O(n)$
• How long did sorting each sub-array take?
• We used selection sort, which is $O(n^2)$
• So our procedure took $O(2 \cdot (n/2)^2 + n) = O(n^2)$ time
• GOD
• DAMN IT
• WHY AREN’T THINGS GETTING BETTER??!?

## Merge Sort

• We just did most of merge sort!
• Here’s merge sort:
1. If the array length is 0 or 1, it’s sorted.
2. Else:
• split the array into two halves
• recursively merge sort each half
• merge the two halves back together using the procedure we just talked about
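Those three steps might look like this in Python (a sketch, with the merge inlined; names are ours):

```python
def merge_sort(a):
    """Recursively split, sort each half, then merge them back together."""
    if len(a) <= 1:                  # length 0 or 1: already sorted
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])       # recursively sort each half
    right = merge_sort(a[mid:])
    merged, i, j = [], 0, 0          # merge using the earlier procedure
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```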
• Wait, but how does this make things any better???
• Well… it’s a little tricky, but…
• Let’s use a recursion tree diagram to see
• mergesort({ 7, 5, 2, 1, 0, 3, 6, 4 }) recursively calls…
• mergesort({ 7, 5, 2, 1 }) which recursively calls…
• mergesort({ 7, 5 }) which recursively calls…
• mergesort({ 7 }) which is sorted.
• mergesort({ 5 }) which is sorted.
• mergesort({ 2, 1 }) which recursively calls…
• mergesort({ 2 }) which is sorted.
• mergesort({ 1 }) which is sorted.
• mergesort({ 0, 3, 6, 4 }) which recursively calls…
• mergesort({ 0, 3 }) which recursively calls…
• mergesort({ 0 }) which is sorted.
• mergesort({ 3 }) which is sorted.
• mergesort({ 6, 4 }) which recursively calls…
• mergesort({ 6 }) which is sorted.
• mergesort({ 4 }) which is sorted.
• When analyzing a “branching tree” structure, it’s best to look at it “by levels.”
• So the first “level” has 1 recursive call
• The second “level” has 2 recursive calls
• The third has 4, then the fourth has 8…
• At each level, how many comparisons are done?
• Comparisons are only done during merging, so the base cases can be ignored.
• At the top level, we have to merge two n/2-sized arrays
• What would be the worst possible case there?
• n comparisons - first array A, then B, then A, then B… all the way down
• At the next level, we have to merge two n/4-sized arrays, twice
• so it’s 2n/4, twice… so again, n comparisons
• At the third level, we have to merge two n/8-sized arrays, 4 times
• so it’s 2n/8, four times… again, n comparisons!
• But here’s the kicker: how many levels are there?
• There are $\log_2(n)$ levels, since we halve the array each time.
• So it’s not $O(n^2)$ anymore.
• It’s $O(n \log n)$! 🎺🎉🎊
• We did it!
• We broke the $O(n^2)$ barrier
• What does $O(n \log n)$ look like?
• Well, it grows faster than linear…
• …but not as fast as quadratic.
• We also call this linearithmic (it’s a fun portmanteau)
• It comes up a lot in sorting and tree algorithms so it’s a useful term
• For $n=100$, linear is $100$, linearithmic is $200$ (using $\log_{10}$ to keep the numbers round), quadratic is $10{,}000$
• For $n=1{,}000$, linear is $1{,}000$, linearithmic is $3{,}000$, quadratic is $1{,}000{,}000$
• Now why did this work while the earlier example didn’t do better than $O(n^2)$?
• Cause before, we didn’t keep splitting the problem up.
• We just made the problem size $\frac{n}{2}$.
• It’s the recursive splitting that’s the secret.
• However…
• One of the big downsides of mergesort is that we need $O(n)$ additional space.
• Or in English, to sort an array of n items, we need to allocate a second array of n items.
• So if you don’t have memory to spare…

## Quick Sort

• Despite its name, it’s not really any “quicker” than mergesort…
• Here’s quicksort:
1. If the array length is 0 or 1, it’s sorted.
2. Else:
• pick a value from the array. This is the “pivot”.
• partition the array into two parts: everything less than the pivot and everything greater than or equal to it.
• Now we know where the pivot goes, so put the pivot there.
• recursively quicksort each half of the array.
• It feels almost like binary search, but backwards!
• We could be lazy and just allocate new arrays for the partitions.
• But unlike mergesort, quicksort can be performed without using any extra space!
• Here’s the partitioning algorithm:
1. look at the last value in the array (at length - 1). that is your “pivot.”

2. have two “fingers”, one at each end of the array (at 0 and length - 2). then in a loop:
1. move the left one right until you find something >= pivot (or you cross the right).
2. move the right one left until you find something < pivot (or you cross the left).
3. if they cross (left finger > right finger), then break.
• the > is super important here. If you use >=, it doesn’t work!
4. swap the values at the left and right fingers, and move them inwards by 1.

3. swap the pivot (at length - 1) with the value at the “left” finger (the first value that is >= the pivot).

• Now everything to the left of the pivot is less than it,
• and everything to its right is greater than or equal to it.
• hey, this feels like binary search again!
• and to sort, we just recursively partition the left and right sides.
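Putting the partitioning procedure and the recursion together, a Python sketch might look like this (names and the recursion guards are ours):

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]                    # pivot is the last value
    left, right = lo, hi - 1         # two "fingers", one at each end
    while True:
        # move left finger right until we find something >= pivot
        while left <= right and a[left] < pivot:
            left += 1
        # move right finger left until we find something < pivot
        while left <= right and a[right] >= pivot:
            right -= 1
        if left > right:             # the fingers crossed: done
            break
        a[left], a[right] = a[right], a[left]
        left += 1
        right -= 1
    a[left], a[hi] = a[hi], a[left]  # put the pivot in its final place
    return left

def quicksort(a, lo=0, hi=None):
    """Recursively partition the left and right sides, in place."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)      # everything left of the pivot
        quicksort(a, p + 1, hi)      # everything right of the pivot
    return a
```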
• To analyze it, let’s consider the best and worst cases.
• How efficient the partitioning is really has to do with one important decision:
• What pivot value do we use?
• Above, we used a simple method: use the last value in the array.
• But what if the last value in the array happens to be the biggest?
• Then how many things would be to the left?
• All of them!
• And how many to the right?
• None of them!
• And we’d repeat the process on the left side…
• and maybe its last value is also the biggest…
• and so on and so on…
• So the worst case is an array that is already sorted.
• So each time, we have to look at n values, but each recursion we’re only making it smaller by 1.
• If we look at the recursion tree, it’s not much of a tree at all.
• More of a recursion linked list…
• Since there are n levels, and n steps on each level, it has a worst case performance of $O(n^2)$.
• Unlike mergesort, which is always $O(n \log n)$.
• What is the best case? That is, when will we get the same number of values on either side of the pivot?
• When the pivot is the median.
• That’s “the value with equal numbers of values on either side.”
• If we keep picking the median, then the array keeps getting split evenly…
• And just like mergesort, we end up with $\log n$ levels, so $O(n \log n)$.
• Picking the pivot is an important part of implementing quicksort.
• If you always pick a fixed position (the first, the last, the middle…)
• Then there will always be pathological cases (arrays that take $O(n^2)$ time).
• You could try picking a random pivot
• But then you can’t really predict the performance.
• Maybe you get lucky every time! Maybe you get unlucky every time!
• We can get a sample of values and pick the median from those
• A common technique is median-of-three
• You pick the first, last, and middle items, and whichever is in the middle of the other two becomes your pivot
• It greatly reduces the chances of getting a pathological case, but…
• It is possible to have them
• Consider an array where every item is the same!
• Or we can sample the whole array and get the real median
• It can be found in $O(n)$ time (using the median-of-medians algorithm), so that we always get the “best” split
• And therefore it’s $O(n \log n)$ in all cases!
• But in practice… it ends up not being worth it.
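The median-of-three idea from above could be sketched like this (a sketch; the function name is ours; you’d swap the chosen value into the pivot position before partitioning):

```python
def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], and a[hi]."""
    mid = (lo + hi) // 2
    candidates = [(a[lo], lo), (a[mid], mid), (a[hi], hi)]
    candidates.sort()        # only three items, so this is constant work
    return candidates[1][1]  # whichever is in the middle of the other two
```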

## Stability

• One important property of sorting algorithms is stability.
• If you have two equal items in the input array…
• Let’s call them i and j, where i comes before j in the input array
• In a stable sort, it will keep i and j in the same order as in the input array.
• An unstable sort might swap them so that j comes before i in the output array.
• Why on earth is this important??
• They’re equal, right? Who cares?
• Well it doesn’t matter much for numbers. But for other things…
• Consider a spreadsheet of users.
• Each has a first (given) name, last (family) name, user ID, email, and department
• Let’s say I want to sort them so that they are sorted by department, and then within each department, they are sorted by last name.

• We work backwards: first sort by last name. (This does not have to be done stably.)

• Then, stably sort on the department.
• Notice that the two people in CS will not swap places. Their names will remain in alphabetical order. (Same with the Business people.)
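Python’s built-in sort is stable, so the two-pass trick above can be demonstrated directly (the user data here is made up for illustration):

```python
# Hypothetical users: (last name, department) pairs.
users = [
    ("Smith", "Business"),
    ("Jones", "CS"),
    ("Adams", "CS"),
    ("Baker", "Business"),
]

# Work backwards: first sort by last name.
users.sort(key=lambda u: u[0])

# Then stably sort by department; users with equal departments
# keep their alphabetical-by-name order from the first pass.
users.sort(key=lambda u: u[1])
```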

• Do we need stability?
• I’m kinda talking out of my butt here, but
• Honestly? For the above problem? I’d just use a comparator (compareTo() method) that compares both the name and the department at the same time, rather than doing 2 sorts.
• That would work even if we used an unstable sort.
• I’m sure I’m wrong about something.
• I’m like 80% sure.

## The Best of Both Worlds

• Remember what the graphs of log n, n, and n^2 look like?
• What do you notice about the values of these functions near n = 0?
• They’re all about the same… and in fact, n^2 is smaller in some cases!
• Quicksort and mergesort work great for large values of n
• But their performance for small arrays is not really significantly better than the simple sorts
• And can be worse in some cases!
• So instead of using one sorting algorithm…
• We can use two.
• We modify the recursive versions of quicksort or mergesort by adding this condition:
• If the array size < k, perform an insertion sort.
• (or selection sort, or bubble sort, whatever you want)
• Else, proceed as normal.
• k is some arbitrary constant.
• It decides when we “switch over” from the divide-and-conquer sort to the simpler sort.
• How we pick k is… kind of throwing stuff at the wall and seeing what sticks.
• It’s usually on the order of magnitude of 10 to 100.
• Yep, even for 100 items, an $O(n^2)$ sort is more than fast enough in most cases!
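The hybrid idea might be sketched like this, using merge sort with an insertion sort fallback (names and the cutoff value are ours):

```python
K = 16  # arbitrary cutoff; tune by experiment

def hybrid_merge_sort(a):
    """Merge sort that switches to insertion sort for small arrays."""
    if len(a) < K:
        # insertion sort: fast for tiny inputs despite being O(n^2)
        for i in range(1, len(a)):
            item, j = a[i], i - 1
            while j >= 0 and a[j] > item:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = item
        return a
    # else, proceed as normal: split, recurse, merge
    mid = len(a) // 2
    left = hybrid_merge_sort(a[:mid])
    right = hybrid_merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```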

• One last sort, and we’re not gonna go too deeply into it cause it’s kind of scary to analyze
• But I want you to have a peek at “another kind” of sort.
• All the algorithms we talked about so far are comparison-based algorithms.
• It has been proven that no comparison-based sort can do better than $O(n \log n)$ in the worst case.
• We saw cases where they could be as fast as $O(n)$ in the best case!
• But not in all cases.
• However…
• If we know something about the data to be sorted…
• We can take advantage of that information to do things in a faster way.
• Radix Sort is a non-comparison sort.
• You have probably done a radix sort yourself without knowing it.
• If I were to give you 300 cards with a bunch of different words on them, all shuffled…
• You might go “well, this is too much to sort at once.”
• “I’m going to put all the ‘a’ words in one pile, and all the ‘b’ words in another, and…“
• This is the idea behind radix sort.
• If you have data that can be represented as some sort of string…
• Where each position in the string can have a small number of possible values…
• Then you can do a radix sort.
• Fortunately, many common cases fit these criteria!
• But this is how we get the better performance: we lose generality.
• We can’t apply this sort to ALL kinds of data.
• There are two important variables:
• The maximum length (k) of the values to be sorted
• For integers, that would be how many digits the biggest number has.
• The number of possibilities (d) for each position in the value.
• For integers, each place can be one of 10 digits (0 through 9).
• Here’s the idea:
• Create d buckets.
• These are probably just arrays.
• For each digit position i = 0 to k - 1, starting from the rightmost digit:
• For each of the n values:
1. Look at the digit at position i.
2. Place the value in the matching bucket.
• Now all the values have been distributed into buckets.
• Take the values out of the buckets, in order, and put them back into the original array before the next pass.
• It doesn’t seem obvious that this works, but…
• If we’re sorting 3 digit numbers, the first pass makes sure the rightmost digits are in order.
• Then the second pass makes sure the middle digits are in order, and the rightmost ones will maintain their order. (It’s a stable sort!)
• Then the last pass makes sure the first digits are in order, and the numbers are sorted!
• If we look at the loops, it’s like… $O(kn)$.
• It’s not technically linear.
• But for most practical cases, k is usually pretty small.
• So it’s almost like a constant… meaning we can sort in $O(n)$ in all cases.
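A least-significant-digit radix sort for non-negative integers could be sketched like this (the function name and the digit arithmetic are ours):

```python
def radix_sort(nums, k):
    """LSD radix sort for non-negative integers with at most k digits."""
    for i in range(k):                      # one pass per digit position
        buckets = [[] for _ in range(10)]   # d = 10 possible digits (0-9)
        for x in nums:
            digit = (x // 10 ** i) % 10     # digit at position i, from the right
            buckets[digit].append(x)        # stable: order within a bucket is kept
        # empty the buckets back into the array, in order, after EACH pass
        nums = [x for bucket in buckets for x in bucket]
    return nums
```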