MIT6006 Lec07 Linear-Time Sorting

2018-11-15

Need to digest again!

First lists some claims:

searching among n preprocessed items requires $Ω(lg n)$ time: binary search, AVL tree search optimal
sorting n items requires $Ω(nlg n)$: mergesort, heap sort, AVL sort optimal

Comparison model

All the input items are black boxes (ADTs)
Only support comparisons (>,<, etc.)
Time cost = number of comparisons

Decision tree

Any comparison algorithm can be viewed/specified as a tree of all possible comparison outcomes & resulting output.

internal node = binary decision
leaf = output (algorithm is done)
root-to-leaf path = algorithm execution
path length (depth) = running time
height of tree = worst-case running time

In fact, binary decision tree model is more powerful than comparison model, and lower bounds extend to it

A sorting algorithm is comparison based if it uses comparison operators to find the order between two numbers. Comparison sorts can be viewed abstractly in terms of decision trees. A decision tree is a full binary tree that represents the comparisons between elements that are performed by a particular sorting algorithm operating on an input of a given size. The execution of the sorting algorithm corresponds to tracing a path from the root of the decision tree to a leaf. At each internal node, a comparison ai <= aj is made. The left subtree then dictates subsequent comparisons for ai <= aj, and the right subtree dictates subsequent comparisons for ai > aj. When we come to a leaf, the sorting algorithm has established the ordering.

Lower Bounds

Search Lower Bound

number of leaves >= number of possible answers >= n
decision tree is binary
height >= $lg Θ(n) = lg n ± Θ(1)$

Sorting Lower Bound

leaf specifies answer as permutation: A[3] ≤ A[1] ≤ A[9] ≤ . . .
all n! are possible answers
number of leaves >= n!

$$ \begin{align*}
height & ≥ lg n! \\
& = lg(1·2 /cdots (n − 1)·n) \\
& = lg 1 + lg 2 + /cdots + lg(n − 1) + lg n \\
& = \sum^n_{i=1} lg i \\
& \ge \sum^n_{i=n/2} lg i \\
& \ge \sum^n_{i=n/2} lg \frac n2 \\
& = \frac n2 lg n - \frac n2 \\
& = \Omega (n lg n)
\end{align*}$$

Therefore, the lower bound for Comparison based sorting algorithm (Merge Sort, Heap Sort, Quick-Sort .. etc) is $Ω(nLogn)$, i.e., they cannot do better than $n logn$.

Linear-time Sorting

Radix sort

Counting sort is a linear time sorting algorithm that sort in $O(n+k)$ time when elements are in range from 1 to k. But if the elements are in range from 1 to $n^2$, counting sort will take $O(n^2)$.

How to sort such an array in linear time? Use Radix sort.
The idea of Radix Sort is to do digit by digit sort starting from least significant digit to most significant digit. Radix sort uses counting sort as a subroutine to sort.

Reference:
https://www.geeksforgeeks.org/lower-bound-on-comparison-based-sorting-algorithms/
https://www.geeksforgeeks.org/radix-sort/