Remaining TODOs: 61
Relevant reading:
T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein. Introduction to Algorithms
1. Program cost and asymptotic notation
Definition 1.3: Insertion sort takes each element and compares it with the previously sorted elements, inserting it in the correct place.
In CLRS-style pseudocode (as used in Introduction to Algorithms):
Input: An integer array A
Output: Array A sorted in non-decreasing order
for j = 1 to A.length - 1
    key = A[j + 1]
    // insert A[j + 1] into the sorted sequence A[1..j]
    i = j
    while i > 0 and A[i] > key
        A[i + 1] = A[i]
        i = i - 1
    A[i + 1] = key
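As a concrete sketch, the same procedure in runnable Python (0-indexed, unlike the 1-indexed pseudocode above):

```python
def insertion_sort(a):
    """Sort the list a in place, in non-decreasing order."""
    for j in range(1, len(a)):      # a[0..j-1] is already sorted
        key = a[j]
        i = j - 1
        # shift elements larger than key one place to the right
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a
```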
Definition 1.4: The running time of a CLRS program is defined as:
- Each line takes constant time
- When a loop exits normally, the test is executed one more time than the loop body
Remark: The running time of insertion sort as given is the sum of the per-line costs, where $t_j$ is the number of times the test of the while loop is executed for a given value of $j$.
Then in the worst case, $t_j = j + 1$, so $T(n) = an^2 + bn + c$ for some constants $a, b, c$. Hence $T(n)$ is quadratic in $n$.
In the best case, $t_j = 1$, so $T(n)$ is linear in $n$.
Definition 1.5: Let $f, g : \mathbb{N} \to \mathbb{R}_{\geq 0}$. Then $O(g) = \{ f \mid \exists c > 0, n_0 \in \mathbb{N} \text{ such that } f(n) \leq c \, g(n) \text{ for all } n \geq n_0 \}$.
If $f \in O(g)$ then $g$ is an asymptotic upper bound for $f$.
Proposition 1.6: The insertion sort algorithm is correct.
Proof:
By a loop-invariant argument:
- Initialisation: prove the invariant holds prior to the first iteration
- Maintenance: prove that if the invariant holds just before an iteration, then it holds just before the next iteration
- Termination: prove that, when the loop terminates, the invariant together with the reason the loop terminated implies the correctness of the program
This is similar to mathematical induction, but rather than proving a statement for all natural numbers, we stop when the loop exits.
Invariant: At the start of the $j$th iteration, A[1..j] is sorted.
When $j = 1$, A[1..1] is a singleton so is trivially sorted.
The outer loop terminates when j = A.length. So the loop invariant at termination says that A[1..A.length] = A is sorted.
To prove maintenance, we need to prove that, at the end of the while loop, the sequence A[1], ..., A[i], key, A[i+2], ..., A[j+1] is sorted.
The invariant of the while loop is: TODO
Remark: Some nice properties of insertion sort:
- It is stable (preserves relative order of equal keys)
- In-place
- Online (can sort the list as it is received)
Lemma 1.7: Let . Then:
Definition 1.10: If $f \in \Theta(g)$, then $g$ is an asymptotically tight bound for $f$.
2. Divide and conquer algorithms
Example (merge sort):
- Split the array into 2 halves (each of size $n/2$)
- Carry out merge sort on the subarrays (or for singletons, do nothing)
- Merge the elements from the sorted sublists in order, which is easy as we just pick the smaller first element of the two
Overall $\Theta(n \log n)$ (worst case).
Pseudocode:
MergeSort(A, p, r):
    // sorts the half-open subarray A[p..r)
    if r > p + 1
        q = floor((p + r) / 2)
        MergeSort(A, p, q)
        MergeSort(A, q, r)
        Merge(A, p, q, r)
Pseudocode for merge subroutine: TODO
todo
Remark: Merging should be left-biased so that mergesort is a stable sort.
Merge sort is not in-place (it requires $\Theta(n)$ extra space) and is not online.
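A minimal Python sketch of merge sort (Python is my choice here; the course works in pseudocode), with a left-biased merge so that the sort is stable:

```python
def merge_sort(a):
    """Return a sorted copy of a; stable, O(n log n)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # merge: the <= makes it left-biased, so equal keys keep their order
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]
```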
The running time of a divide-and-conquer algorithm can be analysed as follows:
Let $T(n)$ be the running time on a problem of size $n$.
- If $n$ is small ($n \leq c$ for some constant $c$), use a constant-time brute-force solution: $T(n) = \Theta(1)$
- Otherwise, divide the problem into $a$ subproblems, each $1/b$ the size of the original
- Let the time to divide a size-$n$ problem be $D(n)$
- Let the time to combine solutions be $C(n)$.
Then $T(n) = a \, T(n/b) + D(n) + C(n)$ for $n > c$.
Example (merge sort runtime analysis): We have $a = b = 2$, $D(n) = \Theta(1)$ and $C(n) = \Theta(n)$.
Then $T(n) = 2 \, T(n/2) + \Theta(n)$
where $T(1) = \Theta(1)$, i.e. $T(n) = \Theta(n \log n)$.
TODO guess and check
TODO recursion tree unrolling
Theorem 2.2 (Master Theorem):
Suppose $T(n) = a \, T(n/b) + f(n)$ with $a \geq 1$ and $b > 1$;
then
- if $f(n) = O(n^{\log_b a - \epsilon})$ for some $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$;
- if $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$;
- if $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some $\epsilon > 0$, and $a f(n/b) \leq c f(n)$ for some $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$.
Proof:
TODO
Example (Master theorem for merge sort):
For merge sort, $T(n) = 2 \, T(n/2) + \Theta(n)$, so we have $a = 2$, $b = 2$, so $n^{\log_b a} = n$ and the second case applies: $T(n) = \Theta(n \log n)$.
TODO integer multiplication, matrix multiplication examples
TODO binary search
TODO $i$th-order statistic
Theorem 2.3: The running time of every comparison-based sorting algorithm is $\Omega(n \log n)$, i.e. at best it is $\Theta(n \log n)$.
Proof:
In a decision tree of a comparison-based sorting algorithm on an input of length $n$, there are $n!$ leaves. Every binary tree of depth $d$ has at most $2^d$ leaves; so to get $n!$ leaves we need a depth of at least $\log_2(n!) = \Omega(n \log n)$.
Definition 2.4: If we know our elements come from a certain discrete interval, we can use counting sort: just count how often each element appears, then insert the relevant number of each element. This is $\Theta(n + k)$ ($\Theta(n)$ if $k = O(n)$) but is not in-place; however, we can design it to be stable.
Input: An array A of n elements in the range [0..k]
Output: An array B consisting of a sorted permutation of A
TODO
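A possible Python rendering of counting sort (the backward scan over A is what makes it stable):

```python
def counting_sort(a, k):
    """Stable sort of a list of integers in the range 0..k."""
    count = [0] * (k + 1)
    for x in a:
        count[x] += 1
    # prefix sums: count[x] = number of elements <= x
    for x in range(1, k + 1):
        count[x] += count[x - 1]
    b = [None] * len(a)
    for x in reversed(a):       # scanning backwards keeps equal keys in order
        count[x] -= 1
        b[count[x]] = x
    return b
```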
Definition 2.5: The Fast Fourier Transform (FFT) can be used to multiply two polynomials of degree $n$ in $\Theta(n \log n)$ time.
- Split each polynomial into 2 polynomials, one holding the even-index coefficients and one the odd-index coefficients: $A(x) = A_{[0]}(x^2) + x \, A_{[1]}(x^2)$.
This is nice because $x^2$ takes the same value at $x$ and $-x$, so each evaluation point can be paired with its negation and each half only needs to be evaluated at half as many points. We now have that evaluating at $n$ suitably chosen points takes $\Theta(n \log n)$ by recursion.
- Represent each polynomial by a list of its values at $n + 1$ points, each of which can be computed in $\Theta(n)$ by Horner's rule.
… TODO
Essentially, we split into two polynomials and recurse. Once we are in point-value form, we can multiply pointwise, all in $\Theta(n)$, and then convert back to coefficient form (interpolation).
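The scheme can be sketched in Python. This recursive radix-2 FFT over complex roots of unity is illustrative only (practical implementations are iterative and more careful about floating-point precision); the rounding at the end assumes integer coefficients:

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # root of unity
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def poly_mul(a, b):
    """Multiply coefficient lists a and b via evaluate / pointwise / interpolate."""
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2
    fa = fft(a + [0] * (n - len(a)))
    fb = fft(b + [0] * (n - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [round(c.real / n) for c in prod[:len(a) + len(b) - 1]]
```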
3. Data structures: heaps and queues
Definition 3.2: A heap is a tree-based data structure in which the tree is completely filled on all levels except possibly the lowest, which is filled from left to right.
The height of a tree is the length of the longest simple path from the root to a leaf; a binary heap with $n$ nodes has height $\Theta(\log n)$.
Heaps can be represented with arrays, with each level stored next to each other, read left-to-right and top-to-bottom. This is unambiguous because the tree is complete, so we know how many children each element has.
For a binary tree, if we have the root at A[0], the left child of A[i] is at A[2i + 1] and the right child is at A[2i + 2]. If arrays are 1-indexed, we instead get A[i]‘s children at A[2i] and A[2i + 1].
To restore the max-heap property at a node whose subtrees are already max-heaps, we carry out the MaxHeapify algorithm.
Input: A 1-indexed array A and index i such that the subtrees of the node A[i] are max-heaps.
Output: A max heap with root at A[i].
MaxHeapify(A, i):
    n = A.size
    l = 2i
    r = 2i + 1
    if l <= n and A[l] > A[i]:
        largest = l
    else:
        largest = i
    if r <= n and A[r] > A[largest]:
        largest = r
    if largest != i:
        swap A[i] and A[largest]
        MaxHeapify(A, largest)
This is $O(\log n)$ as we are bounded by the depth of the tree.
We carry out MaxHeapify on every node, starting from the bottom (skipping leaves - this skips about half of the vertices).
MakeMaxHeap(A):
    n = A.size
    for i = ceil((n + 1) / 2) - 1 downto 1
        MaxHeapify(A, i)
TODO
Therefore the full procedure is definitely $O(n \log n)$; but in fact a tighter bound is $O(n)$. TODO proof
To extract the maximum element from a max-heap, we remove the root, place the last vertex into the root, and carry out MaxHeapify on the root. This is $O(\log n)$.
To insert an element, append it to the end of the backing array, and carry out MaxHeapify recursively on the ancestors of the new vertex. This is $O(\log n)$.
Definition 3.5: Heap sort:
- Build a max-heap
- Swap the maximum (the root) with the final element in the backing array, and ignore the final node from now on (it remains in the backing array, but we decrement the heap size), then carry out MaxHeapify on the root (this is fine because the children of the root are still max-heaps)
- Repeat until the heap size is 1
This is $O(n \log n)$ worst case like merge sort, but is in-place like insertion sort.
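The whole pipeline (MaxHeapify, heap construction, heap sort) can be sketched in Python; this version is 0-indexed, so children of i sit at 2i+1 and 2i+2, and the heap size is passed explicitly:

```python
def max_heapify(a, i, n):
    """Restore the max-heap property at index i, given heap size n
    and that both subtrees of i are already max-heaps."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = i
    if l < n and a[l] > a[largest]:
        largest = l
    if r < n and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, n)

def heap_sort(a):
    """In-place heap sort."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):  # build a max-heap, skipping leaves
        max_heapify(a, i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]      # move the current maximum to the back
        max_heapify(a, 0, end)           # heap size shrinks by one
    return a
```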
Definition 3.6: A priority queue maintains a set of elements, each with an associated key. Max-priority queues give priority to elements with larger keys; min-priority queues are defined similarly.
Max-priority queues have the following operations:
- insert(S, x, k) inserts element x with key k into S
- maximum(S) returns the element of S with the largest key
- extractMax(S) removes and returns the element of S with the largest key
- TODO
If we implement the priority queue as a heap, both insertion and extraction are $O(\log n)$.
TODO increasing the key value
4. Dynamic programming
Definition 4.1: Dynamic programming is an optimisation technique in which we define a sequence of subproblems such that:
- The subproblems are ordered from smallest to largest
- The largest is the one we want to solve
- The optimal solution of a subproblem can be constructed from the optimal solutions of smaller subproblems (this property is optimal substructure)
We then solve from smallest to largest and store the solutions.
Example (Change-making): Suppose we have lots of coins of different denominations, $d_1, \ldots, d_k$, and want to give $n$ units of change using as few coins as possible.
Then the subproblems are: for $m = 0, \ldots, n$, let $C(m)$ be the minimum number of coins needed to make change for $m$.
More formally, $C(0) = 0$ and $C(m) = 1 + \min_{i \,:\, d_i \leq m} C(m - d_i)$.
The running time is $O(nk)$. This is polynomial in $n$ and $k$ (i.e. in the number of inputs), but is exponential in the size of the inputs.
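The recurrence above translates directly into a bottom-up table; a minimal Python sketch (using math.inf for unreachable amounts):

```python
import math

def min_coins(denominations, n):
    """C[m] = fewest coins summing to m; math.inf if m cannot be made."""
    C = [0] + [math.inf] * n
    for m in range(1, n + 1):           # solve subproblems smallest to largest
        for d in denominations:
            if d <= m and C[m - d] + 1 < C[m]:
                C[m] = C[m - d] + 1
    return C[n]
```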
Example (Knapsack problem): A burglar has a knapsack with capacity $W$. There are $n$ items to pick from, of size/weight $w_i$ and value $v_i$. We want to find the most valuable configuration of items to steal, assuming that there is either 1 of each item, or an unlimited quantity of each.
Assuming first an unlimited supply of each item: let $K(c)$ be the maximum value achievable with capacity $c$; then $K(0) = 0$ and $K(c) = \max_{i \,:\, w_i \leq c} \big( v_i + K(c - w_i) \big)$ (with the maximum of the empty set taken to be 0).
Total running time is $O(nW)$.
Where there is only one of each item: the natural subproblems are indexed by the set of still-available items, so there are exponentially many of them.
But we can modify the formulation to make it not exponential, by rather than passing a set of indices, passing the largest index that may still be considered - since if we remove item $j$ from an optimal solution for items $1..j$, we should have an optimal solution for items $1..j-1$.
This formulation is stating that for each index $j$, we either pick that item or discard it: $K(c, 0) = 0$ and $K(c, j) = \max\big( K(c, j - 1), \; v_j + K(c - w_j, j - 1) \big)$ when $w_j \leq c$, and $K(c, j) = K(c, j - 1)$ otherwise.
Now the running time is $O(nW)$.
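A Python sketch of the 0-1 (one of each item) formulation as a bottom-up table indexed by capacity and largest usable item index:

```python
def knapsack(weights, values, W):
    """K[c][j] = best value using items 1..j within capacity c."""
    n = len(weights)
    K = [[0] * (n + 1) for _ in range(W + 1)]
    for c in range(W + 1):
        for j in range(1, n + 1):
            K[c][j] = K[c][j - 1]                      # discard item j
            if weights[j - 1] <= c:                    # or pick it
                K[c][j] = max(K[c][j],
                              values[j - 1] + K[c - weights[j - 1]][j - 1])
    return K[W][n]
```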
TODO edit distance
Example (Longest simple path): This is an example of a problem that does not have optimal substructure: the longest simple paths from $u$ to $w$ and from $w$ to $v$ might share a vertex, so cannot be combined. We would therefore need to store not only the length of the longest path of a subproblem, but also the path itself; so dynamic programming cannot be used for this.
However, if the graph is acyclic, the problem does have optimal substructure.
Example (Travelling salesperson problem): Given a complete undirected graph with a weight $w(u, v)$ associated with each edge $(u, v)$, find the Hamiltonian cycle with minimal total distance (sum of edge weights).
This is NP-complete, so we don’t know if there is a polynomial time solution, but we can verify a solution in polynomial time.
Brute force is very bad, but dynamic programming gives a better (but not polynomial) solution.
Subproblems: for every subset $S \subseteq V$ containing $v_1$, and for every $u \in S$, find the shortest path that starts from $v_1$, ends in $u$, and passes only once through all the other nodes in $S$. Define $D(S, u)$ to be the length of that path.
Then $D(\{v_1, u\}, u) = w(v_1, u)$ and $D(S, u) = \min_{t \in S \setminus \{u, v_1\}} \big( D(S \setminus \{u\}, t) + w(t, u) \big)$.
TODO
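This dynamic program (usually attributed to Held and Karp) can be sketched in Python using bitmasks to encode the subsets S, with vertex 0 playing the role of $v_1$; w is assumed to be a symmetric distance matrix:

```python
import itertools

def held_karp(w):
    """Length of the shortest Hamiltonian cycle; w is an n x n distance matrix."""
    n = len(w)
    # D[(mask, u)]: shortest path 0 -> u visiting exactly the vertices in mask
    D = {(1 | (1 << u), u): w[0][u] for u in range(1, n)}
    for size in range(3, n + 1):                       # grow subsets
        for S in itertools.combinations(range(1, n), size - 1):
            mask = 1
            for v in S:
                mask |= 1 << v
            for u in S:
                prev = mask ^ (1 << u)                 # remove u from the set
                D[(mask, u)] = min(D[(prev, t)] + w[t][u]
                                   for t in S if t != u)
    full = (1 << n) - 1
    return min(D[(full, u)] + w[u][0] for u in range(1, n))
```

Both the number of subproblems and the total work are exponential in $n$, but far smaller than the $n!$ of brute force.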
5. Graph decomposition
Definition 5.1: An adjacency matrix of a graph $G = (V, E)$ with $V = \{v_1, \ldots, v_n\}$ is an $n \times n$ matrix $A$ where $A_{ij} = 1$ if $(v_i, v_j) \in E$, and $A_{ij} = 0$ otherwise.
Remark: Given an adjacency matrix $A$, $(A^k)_{ij}$ is the number of walks of length $k$ from $v_i$ to $v_j$.
If $A$ represents an undirected graph, $(A^2)_{ii}$ is the degree of $v_i$.
Remark: An adjacency list for a graph $G = (V, E)$ is a list of $|V|$ linked lists, one per vertex: the list for vertex $u$ holds the indices of the vertices to which $u$ has an outgoing edge.
This has size $\Theta(|V| + |E|)$.
TODO
Definition 5.2: Depth-first search (DFS) is a linear-time algorithm that tells us what parts of a graph are reachable from a given vertex. It works for both digraphs and undirected graphs.
As soon as a new vertex is discovered, explore from it. As DFS progresses, we assign each vertex a colour:
- not discovered yet (e.g. white)
- discovered, but not fully explored yet (e.g. grey)
- finished (e.g. black)
Input: A graph $G = (V, E)$
Output: for each vertex $u$, a backpointer $\pi[u]$ (the predecessor of $u$) and two timestamps, the discovery time $d[u]$ and the finishing time $f[u]$.
DFS(V, E):
    for u in V:
        colour[u] = white
        pi[u] = null
    time = 0
    for u in V:
        if colour[u] = white:
            DFSVisit(u)

DFSVisit(u):
    time = time + 1
    d[u] = time
    colour[u] = grey
    for v in neighbours(u):
        if colour[v] = white:
            pi[v] = u
            DFSVisit(v)
    time = time + 1
    f[u] = time
    colour[u] = black
Note that $1 \leq d[u] < f[u] \leq 2|V|$.
Note that $d$, $f$ and $\pi$ depend upon the order in which the vertices of the graph are visited, and on the order of the vertices in the adjacency/neighbour lists.
This has running time $\Theta(|V| + |E|)$ because each vertex is visited once, and DFSVisit(u) takes $\Theta(\deg(u))$ where $\deg(u)$ is the (out-)degree of $u$; $\sum_u \deg(u) = \Theta(|E|)$.
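A Python sketch of the pseudocode above, taking the graph as a dict from each vertex to its neighbour list (colours are tracked exactly as in the pseudocode):

```python
def dfs(adj):
    """adj: dict vertex -> list of neighbours. Returns (d, f, pi)."""
    colour = {u: "white" for u in adj}
    pi = {u: None for u in adj}
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                 # discovery time
        colour[u] = "grey"
        for v in adj[u]:
            if colour[v] == "white":
                pi[v] = u
                visit(v)
        time += 1
        f[u] = time                 # finishing time
        colour[u] = "black"

    for u in adj:
        if colour[u] == "white":
            visit(u)
    return d, f, pi
```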
If we map out the edges visited by DFS, we get a forest that we call the DFS forest, consisting of DFS trees (note that this is a slight abuse of notation because trees and forests are usually undirected).
Theorem 5.3 (Parenthesis theorem): For all $u, v$, exactly one of the following holds:
- the intervals $[d[u], f[u]]$ and $[d[v], f[v]]$ are disjoint, and neither of $u$ and $v$ is a descendant of the other in the DFS forest
- $[d[u], f[u]] \subset [d[v], f[v]]$ and $u$ is a descendant of $v$ in a DFS tree
- $[d[v], f[v]] \subset [d[u], f[u]]$ and $v$ is a descendant of $u$ in a DFS tree
Using shorthand TODO
Corollary 5.4: Vertex $v$ is a proper descendant of vertex $u$ iff $d[u] < d[v] < f[v] < f[u]$.
TODO
Definition 5.5: We can classify the edges $(u, v)$ of the searched graph:
- Tree edges are edges of the DFS forest; $v$ is white when $(u, v)$ is explored;
- Back edges lead from a node to an ancestor in the DFS tree; $v$ is grey when $(u, v)$ is explored;
- Forward edges lead from a node to a non-child descendant in the DFS tree, i.e. they lead to a descendant but are not edges in the DFS forest; $v$ is black when $(u, v)$ is explored;
- Cross edges do not lead to an ancestor or descendant; this could be between nodes in the same tree or in different trees; $v$ is black when $(u, v)$ is explored.
TODO discovery and finishing times vs edge classification
Theorem 5.6: A directed graph $G$ has a cycle iff a depth-first search of $G$ yields a back edge.
Proof:
$\Leftarrow$: If $(u, v)$ is a back edge, then there is a cycle consisting of the back edge $(u, v)$ together with the path in the DFS tree from $v$ to $u$.
$\Rightarrow$: TODO
Definition 5.7: A topological sort of a DAG $G = (V, E)$ is a total ordering of the vertices, $v_1, \ldots, v_n$, such that if $(v_i, v_j) \in E$ then $i < j$.
TopologicalSort(V, E):
    run DFS(V, E) to compute the finishing times f[v] for all v in V
    return V sorted according to decreasing f[v]
TODO correctness of topological sort
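A Python sketch: rather than sorting by stored finishing times, it is equivalent (and common) to append each vertex as it finishes and reverse the result at the end:

```python
def topological_sort(adj):
    """Return the vertices of a DAG in decreasing order of DFS finishing time."""
    seen = set()
    order = []                      # vertices appended in finishing order

    def visit(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)
        order.append(u)             # u finishes here

    for u in adj:
        if u not in seen:
            visit(u)
    return order[::-1]
```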
Lemma 5.10: Given distinct strongly connected components $C_1$ and $C_2$ of a digraph $G$, if there is a path from $C_1$ to $C_2$ in $G$, then there is not a path from $C_2$ to $C_1$ in $G$.
Definition 5.11: The SCC graph of a digraph $G = (V, E)$ is the graph $G^{SCC} = (V^{SCC}, E^{SCC})$, such that
- $V^{SCC}$ has one vertex for every SCC in $G$
- $(C_1, C_2) \in E^{SCC}$ iff there exist $u \in C_1$ and $v \in C_2$ such that there is a path from $u$ to $v$.
Definition 5.12: We extend discovery and finishing times to sets of vertices $U \subseteq V$:
- $d[U] = \min_{u \in U} d[u]$, the earliest discovery time amongst $U$;
- $f[U] = \max_{u \in U} f[u]$, the latest finishing time amongst $U$.
Lemma 5.14: Let $C_1, C_2$ be distinct SCCs in $G$. If there is an edge from $C_1$ to $C_2$, then $f[C_1] > f[C_2]$.
Lemma 5.15: If $C_1 \neq C_2$ and $f[C_1] > f[C_2]$, then there cannot be an edge from $C_2$ to $C_1$.
Definition 5.16: For a digraph $G = (V, E)$, the transpose of $G$ is $G^T = (V, E^T)$ where $E^T = \{(v, u) \mid (u, v) \in E\}$.
Remark: For distinct SCCs $C_1, C_2$ in $G$, if $f[C_1] > f[C_2]$, then
- in $G$ there cannot be an edge from $C_2$ to $C_1$
- in $G^T$ there cannot be an edge from $C_1$ to $C_2$.
This means that, running DFS on $G^T$ starting from the SCC with the largest finishing time, we will not find edges to any other SCC. TODO why? intuition
Definition 5.18: The strongly connected components algorithm identifies all SCCs in a digraph :
- run DFS on $G$, computing finishing times $f$
- run DFS on $G^T$, exploring the vertices in order of decreasing finishing time from the first DFS
- the trees in the DFS forest of the second DFS correspond to the SCCs of
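A Python sketch of this algorithm (usually called Kosaraju's algorithm). The first pass records vertices in finishing order; the second pass explores the transpose in decreasing finishing time, collecting one SCC per tree (iteratively here, to keep it short):

```python
def scc(adj):
    """adj: dict vertex -> list of successors. Returns a list of SCCs (sets)."""
    order, seen = [], set()

    def visit(u):                       # first DFS, on G
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)
        order.append(u)                 # post-order = increasing finishing time

    for u in adj:
        if u not in seen:
            visit(u)

    radj = {u: [] for u in adj}         # transpose graph G^T
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)

    comps, assigned = [], set()
    for u in reversed(order):           # decreasing finishing time
        if u not in assigned:
            comp, stack = set(), [u]    # second DFS, on G^T
            assigned.add(u)
            while stack:
                x = stack.pop()
                comp.add(x)
                for y in radj[x]:
                    if y not in assigned:
                        assigned.add(y)
                        stack.append(y)
            comps.append(comp)
    return comps
```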
6. Paths in graphs - finding shortest paths
Definition 6.1: Consider a digraph $G = (V, E)$ with weight function $w : E \to \mathbb{R}$.
The weight or length of a path $p = (v_0, v_1, \ldots, v_k)$ is $w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i)$.
Definition 6.3: Consider first the simple case where all weights are equal to 1. Let $\delta(s, v)$ be the minimum number of edges on a path from $s$ to $v$ if such a path exists, and $\delta(s, v) = \infty$ otherwise.
Given a source vertex $s$, breadth-first search (BFS) finds all vertices $v$ reachable from $s$, the shortest path lengths $\delta(s, v)$, and the shortest paths from $s$ to the $v$s.
Idea:
- send out waves of increasing length from the source
- when a vertex $v$ is reached, put it in the (FIFO) queue $Q$
- when all of $v$'s neighbours have been reached, remove $v$ from $Q$
TODO the algorithm
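Pending the pseudocode, a Python sketch of the idea using a deque as the FIFO queue: the first time a vertex is discovered is along a shortest path, which sets both its distance and its predecessor:

```python
from collections import deque

def bfs(adj, s):
    """Shortest edge-counts d and predecessors pi from source s."""
    d = {s: 0}
    pi = {s: None}
    Q = deque([s])
    while Q:
        u = Q.popleft()                 # FIFO: explore in waves
        for v in adj[u]:
            if v not in d:              # first discovery = shortest path
                d[v] = d[u] + 1
                pi[v] = u
                Q.append(v)
    return d, pi
```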
Definition 6.4: The BFS tree consists of the vertices reachable from $s$ and the edges $(\pi[v], v)$, where $\pi[v]$ is the predecessor of $v$ (the vertex from which $v$ was first reached).
More formally, the BFS tree is the graph $G_\pi = (V_\pi, E_\pi)$ where $V_\pi = \{v \mid \pi[v] \neq \text{null}\} \cup \{s\}$ and $E_\pi = \{(\pi[v], v) \mid v \in V_\pi \setminus \{s\}\}$.
Lemma 6.7: If $u$ is enqueued before $v$, then $d[u] \leq d[v]$. Furthermore, if $Q = (v_1, \ldots, v_r)$ is the queue at any given step of BFS, then $d[v_r] \leq d[v_1] + 1$ and $d[v_i] \leq d[v_{i+1}]$ for $i = 1, \ldots, r - 1$.
Theorem 6.9: At termination of BFS, $d[v] = \delta(s, v)$ for every $v \in V$.
Definition 6.11: Dijkstra’s algorithm is essentially BFS for weighted graphs. It solves the “single-source shortest path” problem for non-negative weights.
It uses a min-priority queue rather than a FIFO queue, with keys given by the shortest-path weight estimates $d[v]$.
At termination:
- $d[v]$ is the distance $\delta(s, v)$ from $s$ to $v$
- $\pi[v]$ is the predecessor of $v$ on a shortest path from $s$ to $v$
TODO algorithm
Invariants for the while loop:
TODO initialisation
Then suppose that the first invariant holds before an iteration of the while loop. For any vertex $v$, either $d[v]$ did not change during the iteration, or it did change. If $d[v]$ did not change, then the invariant still holds for $v$ by assumption. Otherwise, let $u$ be the vertex selected by ExtractMin in this iteration of the loop. Then we have that TODO
Now suppose that the second invariant holds before an iteration. Then for any $v$, either:
- $d[v]$ did not change, in which case the claim holds by assumption
- $d[v]$ changed and there is no path from $s$ to $v$, in which case $d[v] = \delta(s, v) = \infty$
- $d[v]$ changed and there is a path from $s$ to $v$. Then there is also a shortest path TODO
TODO finish maintenance of
At termination, $Q = \emptyset$, so $d[v] = \delta(s, v)$ for every $v \in V$.
TODO remark about optimal substructure
Theorem 6.13: For every $v \neq s$ reachable from $s$, $\pi[v]$ is the predecessor of $v$ on a shortest path from $s$ to $v$.
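A Python sketch of Dijkstra's algorithm. Instead of a DecreaseKey operation (which Python's heapq does not provide), this version re-pushes a vertex whenever its estimate improves and skips stale heap entries; the behaviour is the same for non-negative weights:

```python
import heapq

def dijkstra(adj, s):
    """adj: u -> list of (v, w) with w >= 0. Returns (dist, pi)."""
    dist = {s: 0}
    pi = {s: None}
    pq = [(0, s)]                        # (estimate, vertex); lazy deletion
    done = set()
    while pq:
        du, u = heapq.heappop(pq)
        if u in done:                    # stale entry, skip
            continue
        done.add(u)
        for v, w in adj[u]:              # relax each outgoing edge
            if v not in dist or du + w < dist[v]:
                dist[v] = du + w
                pi[v] = u
                heapq.heappush(pq, (dist[v], v))
    return dist, pi
```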
Definition 6.14: The Bellman-Ford algorithm takes a graph $G = (V, E)$ with weight function $w$ (possibly negative) and a source $s$, and returns false if there exists a negative-weight cycle reachable from $s$, and otherwise true, along with $d[v] = \delta(s, v)$ and $\pi[v]$ for each $v \in V$.
TODO algorithm
This is $O(|V| \cdot |E|)$.
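A Python sketch: $|V| - 1$ rounds of relaxing every edge, followed by one extra pass; if any edge can still be relaxed, a negative-weight cycle must be reachable:

```python
import math

def bellman_ford(vertices, edges, s):
    """edges: list of (u, v, w). Returns (ok, d, pi);
    ok is False iff a negative-weight cycle is reachable from s."""
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):   # |V| - 1 rounds of relaxation
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                pi[v] = u
    for u, v, w in edges:                # extra pass detects negative cycles
        if d[u] + w < d[v]:
            return False, d, pi
    return True, d, pi
```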
TODO invariants
TODO full topological sort algorithm for DAGs to find shortest path
TODO all-pairs shortest paths
Definition 6.15: Floyd-Warshall algorithm
Suppose we have vertex set $V = \{1, 2, \ldots, n\}$.
Let $d^{(k)}_{ij}$ be the length of the shortest path from $i$ to $j$, all of whose intermediate nodes are in the interval $[1..k]$.
TODO
TODO algorithm (just dynamic programming as usual) TODO note why the is ok (because we also have a in the last argument so we’ve already calculated it)
Running time is $\Theta(n^3)$.
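A Python sketch, taking the weights as a dict from edge to weight and updating a single distance matrix in place over increasing $k$:

```python
import math

def floyd_warshall(n, w):
    """w: dict (i, j) -> weight, vertices 0..n-1. Returns the distance matrix."""
    d = [[0 if i == j else w.get((i, j), math.inf) for j in range(n)]
         for i in range(n)]
    for k in range(n):                   # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```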
7. Greedy algorithms
We will identify a tree within $G = (V, E)$ by its edge set $A \subseteq E$.
Lemma 7.3:
An undirected graph is a tree iff every pair of vertices is connected by a unique (simple) path.
Lemma 7.4: If a graph $G = (V, E)$ is a tree, then $|E| = |V| - 1$.
Proof:
Root the tree anywhere; every vertex except the root has exactly one edge to its parent, so $|E| = |V| - 1$.
TODO more formally
Remark: A spanning tree is:
- a minimal connected subgraph (removing an edge disconnects it)
- a maximal acyclic subgraph (adding an edge creates a cycle).
The algorithm for building an MST:
- start from $A = \emptyset$; this is a (trivial) subset of an MST
- add edges to $A$, maintaining the invariant that $A$ is a subset of an MST
- stop when no edge can be added to $A$ any more; then $A$ is an MST.
Let $A$ be a subset of an MST. We say that an edge $e$ is safe for $A$ iff $A \cup \{e\}$ is a subset of some MST.
Definition 7.6: A cut $(S, V \setminus S)$ is a partition of the vertex set $V$ into $S$ and $V \setminus S$.
An edge crosses a cut if one endpoint is in $S$ and the other is in $V \setminus S$.
A cut respects $A$ if no edge in $A$ crosses the cut.
An edge crossing a cut is light if its weight is minimum over all edges that cross that cut.
Lemma 7.7: Let $A$ be a subset of some MST. If $(S, V \setminus S)$ is a cut that respects $A$, and $(u, v)$ is a light edge crossing the cut, then $(u, v)$ is safe for $A$.
Proof: Let $T$ be an MST that includes $A$; assume $(u, v) \notin T$ (otherwise we are done). Since $T$ is a tree, it contains a unique path $p$ between $u$ and $v$.
Path $p$ must cross the cut at least once. ( TODO why?) Let $(x, y)$ be an edge of $p$ that crosses the cut. Adding $(u, v)$ and deleting $(x, y)$ creates a tree $T' = (T \setminus \{(x, y)\}) \cup \{(u, v)\}$.
$T'$ is a tree because we create a cycle and then remove one edge from it, keeping the graph connected. $T'$ is minimum because the weight of $(u, v)$ is at most that of $(x, y)$, as $(u, v)$ is light. $T'$ contains $A \cup \{(u, v)\}$ because $A$ was included in $T$, and $A$ did not contain $(x, y)$ (the cut respects $A$).
Definition 7.8: A disjoint-set data structure keeps track of a family of dynamic disjoint sets $S_1, \ldots, S_k$. Each set is identified by some arbitrary representative.
It has three operations:
- MakeSet(x) adds the singleton set $\{x\}$ to the family
- FindSet(u) returns the representative of the set containing $u$
- Union(x, y) removes the sets containing $x$ and $y$ from the family and adds their union.
A good representation is as a disjoint-set forest, where elements are stored in an array with pointers to a parent, leading to a root for each set. Keeping this balanced gives amortised complexity $O(m \, \alpha(n))$, where $m$ is the number of operations, $n$ is the number of MakeSet operations, and $\alpha$ is the inverse Ackermann function, which is extremely slowly-growing and is at most 4 for any feasible input.
Definition 7.9: Kruskal’s algorithm:
Start from $A = \emptyset$. At each step, pick the edge with the smallest weight and add it to $A$ if it does not create a cycle.
To do this, we need to keep track of the connected components, for instance by using a disjoint-set data structure.
Kruskal(V, E, w):
    A = ∅
    for each v in V:
        MakeSet(v)
    sort E into increasing order by weight w
    for each edge (u, v) from the sorted edge list:
        if FindSet(u) != FindSet(v):
            A = A ∪ {(u, v)}
            Union(u, v)
    return A
There are $|V|$ MakeSet operations, and $O(|V| + |E|)$ total disjoint-set operations. So the disjoint-set operations take $O((|V| + |E|) \, \alpha(|V|))$ (for a good disjoint-set implementation). So the sorting of the edges dominates. Therefore Kruskal's algorithm has time complexity $O(|E| \log |E|) = O(|E| \log |V|)$.
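A compact Python sketch with a minimal disjoint-set forest (path halving only; the full amortised bound above also needs union by rank, omitted here for brevity):

```python
def kruskal(vertices, edges):
    """edges: list of (w, u, v) tuples. Returns the set of MST edges."""
    parent = {v: v for v in vertices}       # disjoint-set forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    A = set()
    for w, u, v in sorted(edges):           # increasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # adding (u, v) creates no cycle
            A.add((u, v))
            parent[ru] = rv
    return A
```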
Definition 7.10:
Prim's algorithm picks a vertex $r$ and grows the tree from that vertex.
We set $V_A = \{r\}$ and $A = \emptyset$. Then at every step, find a light edge $(u, v)$ crossing the cut $(V_A, V \setminus V_A)$, with $u \in V_A$. Update $V_A$ to $V_A \cup \{v\}$ and $A$ to $A \cup \{(u, v)\}$.
We use a min-heap/priority queue $Q$ such that, for $v \in V \setminus V_A$, the key of $v$ is the minimum weight of any edge $(u, v)$ where $u \in V_A$; if no such edge exists, we set the key to be $\infty$.
To find a light edge crossing the cut $(V_A, V \setminus V_A)$, take the minimum from the queue: if v = ExtractMin(Q), then $(u, v)$ is a light edge for some $u \in V_A$. The vertex $u$ can be retrieved by a backpointer: when the key of $v$ is set to $w(u, v)$, we define $\pi[v] = u$.
Prim(V, E, w, r):
    for each u in V:
        key[u] = oo
        pi[u] = null
        Insert(Q, u)
    DecreaseKey(Q, r, 0)
    while Q is not empty:
        u = ExtractMin(Q)
        for each v in Adj[u]:
            if v in Q and w(u, v) < key[v]:
                pi[v] = u
                DecreaseKey(Q, v, w(u, v))
Then $\{(\pi[v], v) \mid v \in V \setminus \{r\}\}$ gives us the edges of the MST.
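A Python sketch of Prim's algorithm. As with Dijkstra, Python's heapq has no DecreaseKey, so this version re-pushes vertices when their key improves and skips entries for vertices already in the tree:

```python
import heapq

def prim(adj, r):
    """adj: u -> list of (v, w) pairs, undirected. Returns the backpointer
    map pi; the MST edges are (pi[v], v) for v != r."""
    key = {u: float("inf") for u in adj}
    pi = {u: None for u in adj}
    key[r] = 0
    pq = [(0, r)]
    in_tree = set()
    while pq:
        _, u = heapq.heappop(pq)
        if u in in_tree:                 # stale entry, skip
            continue
        in_tree.add(u)
        for v, w in adj[u]:              # update keys across the cut
            if v not in in_tree and w < key[v]:
                key[v] = w
                pi[v] = u
                heapq.heappush(pq, (w, v))
    return pi
```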