Remaining TODOs: 61
Relevant reading:
T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein. Introduction to Algorithms
1. Program cost and asymptotic notation
Definition 1.3: Insertion sort takes each element and compares it with the previously sorted elements, inserting it in the correct place.
In CLRS-style pseudocode (as used in Introduction to Algorithms):
Input: An integer array A
Output: Array A sorted in non-decreasing order
for j = 1 to A.length - 1
    key = A[j + 1]
    // insert A[j + 1] into the sorted sequence A[1..j]
    i = j
    while i > 0 and A[i] > key
        A[i + 1] = A[i]
        i = i - 1
    A[i + 1] = key
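As a concrete sketch, the same procedure in runnable Python (0-indexed, unlike the 1-indexed pseudocode above):

```python
def insertion_sort(a):
    """Sort the list a in place, in non-decreasing order."""
    for j in range(1, len(a)):      # a[0..j-1] is already sorted
        key = a[j]
        i = j - 1
        # shift elements larger than key one place to the right
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a
```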
Definition 1.4: The running time of a CLRS program is defined as:
- Each line takes constant time
- When a loop exits normally, the test is executed one more time than the loop body
Remark: The running time of insertion sort as given is the sum of the per-line costs, where $t_j$ is the number of times the test of the while loop is executed for a given value of $j$.
Then in the worst case, $t_j = j + 1$, so $T(n) = an^2 + bn + c$ for some constants $a, b, c$. Hence $T(n)$ is quadratic in $n$.
In the best case, $t_j = 1$, so $T(n)$ is linear in $n$.
Definition 1.5: Let $f, g : \mathbb{N} \to \mathbb{R}_{\geq 0}$. Then $O(g) = \{ f \mid \exists c > 0, n_0 \in \mathbb{N} \text{ such that } f(n) \leq c \, g(n) \text{ for all } n \geq n_0 \}$.
If $f \in O(g)$ then $g$ is an asymptotic upper bound for $f$.
Proposition 1.6: The insertion sort algorithm is correct.
Proof:
By a loop-invariant argument:
- Initialisation: prove the invariant holds prior to the first iteration
- Maintenance: prove that if the invariant holds just before an iteration, then it holds just before the next iteration
- Termination: prove that, when the loop terminates, the invariant together with the reason the loop terminated implies the correctness of the program
This is similar to mathematical induction, but rather than proving a statement for all natural numbers, we stop when the loop exits.
Invariant: At the start of the $j$th iteration, A[1..j] is sorted.
When $j = 1$, A[1..1] is a singleton so is trivially sorted.
The outer loop terminates when j = A.length. So the loop invariant at termination says that A[1..A.length] = A is sorted.
To prove maintenance, we need to prove that, at the end of the while loop, the sequence A[1], ..., A[i], key, A[i+2], ..., A[j+1] is sorted.
The invariant of the while loop is: TODO
Remark: Some nice properties of insertion sort:
- It is stable (preserves relative order of equal keys)
- In-place
- Online (can sort the list as it is received)
Lemma 1.7: Let . Then:
Definition 1.10: If $f \in \Theta(g)$, then $g$ is an asymptotically tight bound for $f$.
2. Divide and conquer algorithms
Example (merge sort):
- Split the array into 2 halves (each of size $n/2$)
- Carry out merge sort on the subarrays (or for singletons, do nothing)
- Merge the elements from the sorted sublists in order, which is easy as we just pick the smaller first element of the two
Overall $\Theta(n \log n)$ (worst case).
Pseudocode:
MergeSort(A, p, r):
    // sorts the half-open subarray A[p..r)
    if r > p + 1
        q = floor((p + r) / 2)
        MergeSort(A, p, q)
        MergeSort(A, q, r)
        Merge(A, p, q, r)
Pseudocode for merge subroutine: TODO
todo
Remark: Merging should be left-biased so that mergesort is a stable sort.
Merge sort is not in-place (it requires $\Theta(n)$ extra space) and is not online.
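A minimal Python sketch of merge sort (Python is my choice here; the course works in pseudocode), with a left-biased merge so that the sort is stable:

```python
def merge_sort(a):
    """Return a sorted copy of a; stable, O(n log n)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # merge: the <= makes it left-biased, so equal keys keep their order
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]
```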
The running time of a divide-and-conquer algorithm can be analysed as follows:
Let $T(n)$ be the running time on a problem of size $n$.
- If $n$ is small ($n \leq c$ for some constant $c$), use a constant-time brute-force solution: $T(n) = \Theta(1)$
- Otherwise, divide the problem into $a$ subproblems, each $1/b$ the size of the original
- Let the time to divide a size-$n$ problem be $D(n)$
- Let the time to combine solutions be $C(n)$.
Then $T(n) = a \, T(n/b) + D(n) + C(n)$ for $n > c$.
Example (merge sort runtime analysis): We have $a = b = 2$, $D(n) = \Theta(1)$ and $C(n) = \Theta(n)$.
Then $T(n) = 2 \, T(n/2) + \Theta(n)$
where $T(1) = \Theta(1)$, i.e. $T(n) = \Theta(n \log n)$.
TODO guess and check
TODO recursion tree unrolling
Theorem 2.2 (Master Theorem):
Suppose $T(n) = a \, T(n/b) + f(n)$ with $a \geq 1$ and $b > 1$;
then
- if $f(n) = O(n^{\log_b a - \epsilon})$ for some $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$;
- if $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$;
- if $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some $\epsilon > 0$, and $a f(n/b) \leq c f(n)$ for some $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$.
Proof:
TODO
Example (Master theorem for merge sort):
For merge sort, $T(n) = 2 \, T(n/2) + \Theta(n)$, so we have $a = 2$, $b = 2$, so $n^{\log_b a} = n$ and the second case applies: $T(n) = \Theta(n \log n)$.
TODO integer multiplication, matrix multiplication examples
TODO binary search
TODO $i$th-order statistic
Theorem 2.3: The running time of every comparison-based sorting algorithm is $\Omega(n \log n)$, i.e. at best it is $\Theta(n \log n)$.
Proof:
In a decision tree of a comparison-based sorting algorithm on an input of length $n$, there are $n!$ leaves. Every binary tree of depth $d$ has at most $2^d$ leaves; so to get $n!$ leaves we need a depth of at least $\log_2(n!) = \Omega(n \log n)$.
Definition 2.4: If we know our elements come from a certain discrete interval, we can use counting sort: just count how often each element appears, then insert the relevant number of each element. This is $\Theta(n + k)$ ($\Theta(n)$ if $k = O(n)$) but is not in-place; however, we can design it to be stable.
Input: An array A of n elements in the range [0..k]
Output: An array B consisting of a sorted permutation of A
TODO
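A possible Python rendering of counting sort (the backward scan over A is what makes it stable):

```python
def counting_sort(a, k):
    """Stable sort of a list of integers in the range 0..k."""
    count = [0] * (k + 1)
    for x in a:
        count[x] += 1
    # prefix sums: count[x] = number of elements <= x
    for x in range(1, k + 1):
        count[x] += count[x - 1]
    b = [None] * len(a)
    for x in reversed(a):       # scanning backwards keeps equal keys in order
        count[x] -= 1
        b[count[x]] = x
    return b
```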
Definition 2.5: The Fast Fourier Transform (FFT) can be used to multiply two polynomials of degree $n$ in $\Theta(n \log n)$ time.
- Split each polynomial into 2 polynomials, one holding the even-index coefficients and one the odd-index coefficients: $A(x) = A_{[0]}(x^2) + x \, A_{[1]}(x^2)$.
This is nice because $x^2$ takes the same value at $x$ and $-x$, so each evaluation point can be paired with its negation and each half only needs to be evaluated at half as many points. We now have that evaluating at $n$ suitably chosen points takes $\Theta(n \log n)$ by recursion.
- Represent each polynomial by a list of its values at $n + 1$ points, each of which can be computed in $\Theta(n)$ by Horner's rule.
… TODO
Essentially, we split into two polynomials and recurse. Once we are in point-value form, we can multiply pointwise, all in $\Theta(n)$, and then convert back to coefficient form (interpolation).
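The scheme can be sketched in Python. This recursive radix-2 FFT over complex roots of unity is illustrative only (practical implementations are iterative and more careful about floating-point precision); the rounding at the end assumes integer coefficients:

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # root of unity
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def poly_mul(a, b):
    """Multiply coefficient lists a and b via evaluate / pointwise / interpolate."""
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2
    fa = fft(a + [0] * (n - len(a)))
    fb = fft(b + [0] * (n - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [round(c.real / n) for c in prod[:len(a) + len(b) - 1]]
```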
3. Data structures: heaps and queues
Definition 3.2: A heap is a tree-based data structure in which the tree is completely filled on all levels except possibly the lowest, which is filled from left to right.
The height of a tree is the length of the longest simple path from the root to a leaf; a binary heap with $n$ nodes has height $\Theta(\log n)$.
Heaps can be represented with arrays, with each level stored next to each other, read left-to-right and top-to-bottom. This is unambiguous because the tree is complete, so we know how many children each element has.
For a binary tree, if we have the root at A[0], the left child of A[i] is at A[2i + 1] and the right child is at A[2i + 2]. If arrays are 1-indexed, we instead get A[i]‘s children at A[2i] and A[2i + 1].
To restore the max-heap property at a node whose subtrees are already max-heaps, we carry out the MaxHeapify algorithm.
Input: A 1-indexed array A and index i such that the subtrees of the node A[i] are max-heaps.
Output: A max heap with root at A[i].
MaxHeapify(A, i):
    n = A.size
    l = 2i
    r = 2i + 1
    if l <= n and A[l] > A[i]:
        largest = l
    else:
        largest = i
    if r <= n and A[r] > A[largest]:
        largest = r
    if largest != i:
        swap A[i] and A[largest]
        MaxHeapify(A, largest)
This is $O(\log n)$ as we are bounded by the depth of the tree.
We carry out MaxHeapify on every node, starting from the bottom (skipping leaves - this skips about half of the vertices).
MakeMaxHeap(A):
    n = A.size
    for i = ceil((n + 1) / 2) - 1 downto 1
        MaxHeapify(A, i)
TODO
Therefore the full procedure is definitely $O(n \log n)$; but in fact a tighter bound is $O(n)$. TODO proof
To extract the maximum element from a max-heap, we remove the root, place the last vertex into the root, and carry out MaxHeapify on the root. This is $O(\log n)$.
To insert an element, append it to the end of the backing array, and carry out MaxHeapify recursively on the ancestors of the new vertex. This is $O(\log n)$.
Definition 3.5: Heap sort:
- Build a max-heap
- Swap the maximum (the root) with the final element in the backing array, and ignore the final node from now on (it remains in the backing array, but we decrement the heap size), then carry out MaxHeapify on the root (this is fine because the children of the root are still max-heaps)
- Repeat until the heap size is 1
This is $O(n \log n)$ worst case like merge sort, but is in-place like insertion sort.
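The whole pipeline (MaxHeapify, heap construction, heap sort) can be sketched in Python; this version is 0-indexed, so children of i sit at 2i+1 and 2i+2, and the heap size is passed explicitly:

```python
def max_heapify(a, i, n):
    """Restore the max-heap property at index i, given heap size n
    and that both subtrees of i are already max-heaps."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = i
    if l < n and a[l] > a[largest]:
        largest = l
    if r < n and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, n)

def heap_sort(a):
    """In-place heap sort."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):  # build a max-heap, skipping leaves
        max_heapify(a, i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]      # move the current maximum to the back
        max_heapify(a, 0, end)           # heap size shrinks by one
    return a
```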
Definition 3.6: A priority queue maintains a set of elements, each with an associated key. Max-priority queues give priority to elements with larger keys; min-priority queues are defined similarly.
Max-priority queues have the following operations:
- insert(S, x, k) inserts element x with key k into S
- maximum(S) returns the element of S with the largest key
- extractMax(S) removes and returns the element of S with the largest key
- TODO
If we implement the priority queue as a heap, both insertion and extraction are $O(\log n)$.
TODO increasing the key value
4. Dynamic programming
Definition 4.1: Dynamic programming is an optimisation technique in which we define a sequence of subproblems such that:
- The subproblems are ordered from smallest to largest
- The largest is the one we want to solve
- The optimal solution of a subproblem can be constructed from the optimal solutions of smaller subproblems (this property is optimal substructure)
We then solve from smallest to largest and store the solutions.
Example (Change-making): Suppose we have lots of coins of different denominations, $d_1, \ldots, d_k$, and want to give $n$ units of change using as few coins as possible.
Then the subproblems are: for $m = 0, \ldots, n$, let $C(m)$ be the minimum number of coins needed to make change for $m$.
More formally, $C(0) = 0$ and $C(m) = 1 + \min_{i \,:\, d_i \leq m} C(m - d_i)$.
The running time is $O(nk)$. This is polynomial in $n$ and $k$ (i.e. in the number of inputs), but is exponential in the size of the inputs.
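The recurrence above translates directly into a bottom-up table; a minimal Python sketch (using math.inf for unreachable amounts):

```python
import math

def min_coins(denominations, n):
    """C[m] = fewest coins summing to m; math.inf if m cannot be made."""
    C = [0] + [math.inf] * n
    for m in range(1, n + 1):           # solve subproblems smallest to largest
        for d in denominations:
            if d <= m and C[m - d] + 1 < C[m]:
                C[m] = C[m - d] + 1
    return C[n]
```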
Example (Knapsack problem): A burglar has a knapsack with capacity $W$. There are $n$ items to pick from, of size/weight $w_i$ and value $v_i$. We want to find the most valuable configuration of items to steal, assuming that there is either 1 of each item, or an unlimited quantity of each.
Assuming first an unlimited supply of each item: let $K(c)$ be the maximum value achievable with capacity $c$; then $K(0) = 0$ and $K(c) = \max_{i \,:\, w_i \leq c} \big( v_i + K(c - w_i) \big)$ (with the maximum of the empty set taken to be 0).
Total running time is $O(nW)$.
Where there is only one of each item: the natural subproblems are indexed by the set of still-available items, so there are exponentially many of them.
But we can modify the formulation to make it not exponential, by rather than passing a set of indices, passing the largest index that may still be considered - since if we remove item $j$ from an optimal solution for items $1..j$, we should have an optimal solution for items $1..j-1$.
This formulation is stating that for each index $j$, we either pick that item or discard it: $K(c, 0) = 0$ and $K(c, j) = \max\big( K(c, j - 1), \; v_j + K(c - w_j, j - 1) \big)$ when $w_j \leq c$, and $K(c, j) = K(c, j - 1)$ otherwise.
Now the running time is $O(nW)$.
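A Python sketch of the 0-1 (one of each item) formulation as a bottom-up table indexed by capacity and largest usable item index:

```python
def knapsack(weights, values, W):
    """K[c][j] = best value using items 1..j within capacity c."""
    n = len(weights)
    K = [[0] * (n + 1) for _ in range(W + 1)]
    for c in range(W + 1):
        for j in range(1, n + 1):
            K[c][j] = K[c][j - 1]                      # discard item j
            if weights[j - 1] <= c:                    # or pick it
                K[c][j] = max(K[c][j],
                              values[j - 1] + K[c - weights[j - 1]][j - 1])
    return K[W][n]
```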
TODO edit distance
Example (Longest simple path): This is an example of a problem that does not have optimal substructure: the longest simple paths from $u$ to $w$ and from $w$ to $v$ might share a vertex, so cannot be combined. We would therefore need to store not only the length of the longest path of a subproblem, but also the path itself; so dynamic programming cannot be used for this.
However, if the graph is acyclic, the problem does have optimal substructure.
Example (Travelling salesperson problem): Given a complete undirected graph with a weight $w(u, v)$ associated with each edge $(u, v)$, find the Hamiltonian cycle with minimal total distance (sum of edge weights).
This is NP-complete, so we don’t know if there is a polynomial time solution, but we can verify a solution in polynomial time.
Brute force is very bad, but dynamic programming gives a better (but not polynomial) solution.
Subproblems: for every subset $S \subseteq V$ containing $v_1$, and for every $u \in S$, find the shortest path that starts from $v_1$, ends in $u$, and passes only once through all the other nodes in $S$. Define $D(S, u)$ to be the length of that path.
Then $D(\{v_1, u\}, u) = w(v_1, u)$ and $D(S, u) = \min_{t \in S \setminus \{u, v_1\}} \big( D(S \setminus \{u\}, t) + w(t, u) \big)$.
TODO
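This dynamic program (usually attributed to Held and Karp) can be sketched in Python using bitmasks to encode the subsets S, with vertex 0 playing the role of $v_1$; w is assumed to be a symmetric distance matrix:

```python
import itertools

def held_karp(w):
    """Length of the shortest Hamiltonian cycle; w is an n x n distance matrix."""
    n = len(w)
    # D[(mask, u)]: shortest path 0 -> u visiting exactly the vertices in mask
    D = {(1 | (1 << u), u): w[0][u] for u in range(1, n)}
    for size in range(3, n + 1):                       # grow subsets
        for S in itertools.combinations(range(1, n), size - 1):
            mask = 1
            for v in S:
                mask |= 1 << v
            for u in S:
                prev = mask ^ (1 << u)                 # remove u from the set
                D[(mask, u)] = min(D[(prev, t)] + w[t][u]
                                   for t in S if t != u)
    full = (1 << n) - 1
    return min(D[(full, u)] + w[u][0] for u in range(1, n))
```

Both the number of subproblems and the total work are exponential in $n$, but far smaller than the $n!$ of brute force.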
5. Graph decomposition
Definition 5.1: An adjacency matrix of a graph $G = (V, E)$ with $V = \{v_1, \ldots, v_n\}$ is an $n \times n$ matrix $A$ where $A_{ij} = 1$ if $(v_i, v_j) \in E$, and $A_{ij} = 0$ otherwise.
Remark: Given an adjacency matrix $A$, $(A^k)_{ij}$ is the number of walks of length $k$ from $v_i$ to $v_j$.
If $A$ represents an undirected graph, $(A^2)_{ii}$ is the degree of $v_i$.
Remark: An adjacency list for a graph $G = (V, E)$ is a list of $|V|$ linked lists, one per vertex: the list for vertex $u$ holds the indices of the vertices to which $u$ has an outgoing edge.
This has size $\Theta(|V| + |E|)$.
TODO
Definition 5.2: Depth-first search (DFS) is a linear-time algorithm that tells us what parts of a graph are reachable from a given vertex. It works for both digraphs and undirected graphs.
As soon as a new vertex is discovered, explore from it. As DFS progresses, we assign each vertex a colour:
- not discovered yet (e.g. white)
- discovered, but not fully explored yet (e.g. grey)
- finished (e.g. black)
Input: A graph $G = (V, E)$
Output: for each vertex $u$, a backpointer $\pi[u]$ (the predecessor of $u$) and two timestamps, the discovery time $d[u]$ and the finishing time $f[u]$.
DFS(V, E):
    for u in V:
        colour[u] = white
        pi[u] = null
    time = 0
    for u in V:
        if colour[u] = white:
            DFSVisit(u)

DFSVisit(u):
    time = time + 1
    d[u] = time
    colour[u] = grey
    for v in neighbours(u):
        if colour[v] = white:
            pi[v] = u
            DFSVisit(v)
    time = time + 1
    f[u] = time
    colour[u] = black
Note that $1 \leq d[u] < f[u] \leq 2|V|$.
Note that $d$, $f$ and $\pi$ depend upon the order in which the vertices of the graph are visited, and on the order of the vertices in the adjacency/neighbour lists.
This has running time $\Theta(|V| + |E|)$ because each vertex is visited once, and DFSVisit(u) takes $\Theta(\deg(u))$ where $\deg(u)$ is the (out-)degree of $u$; $\sum_u \deg(u) = \Theta(|E|)$.
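A Python sketch of the pseudocode above, taking the graph as a dict from each vertex to its neighbour list (colours are tracked exactly as in the pseudocode):

```python
def dfs(adj):
    """adj: dict vertex -> list of neighbours. Returns (d, f, pi)."""
    colour = {u: "white" for u in adj}
    pi = {u: None for u in adj}
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                 # discovery time
        colour[u] = "grey"
        for v in adj[u]:
            if colour[v] == "white":
                pi[v] = u
                visit(v)
        time += 1
        f[u] = time                 # finishing time
        colour[u] = "black"

    for u in adj:
        if colour[u] == "white":
            visit(u)
    return d, f, pi
```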
If we map out the edges visited by DFS, we get a forest that we call the DFS forest, consisting of DFS trees (note that this is a slight abuse of notation because trees and forests are usually undirected).
Theorem 5.3 (Parenthesis theorem): For all $u, v$, exactly one of the following holds:
- the intervals $[d[u], f[u]]$ and $[d[v], f[v]]$ are disjoint, and neither of $u$ and $v$ is a descendant of the other in the DFS forest
- $[d[u], f[u]] \subset [d[v], f[v]]$ and $u$ is a descendant of $v$ in a DFS tree
- $[d[v], f[v]] \subset [d[u], f[u]]$ and $v$ is a descendant of $u$ in a DFS tree
Using shorthand TODO
Corollary 5.4: Vertex $v$ is a proper descendant of vertex $u$ iff $d[u] < d[v] < f[v] < f[u]$.
TODO
Definition 5.5: We can classify the edges $(u, v)$ of the searched graph:
- Tree edges are edges of the DFS forest; $v$ is white when $(u, v)$ is explored;
- Back edges lead from a node to an ancestor in the DFS tree; $v$ is grey when $(u, v)$ is explored;
- Forward edges lead from a node to a non-child descendant in the DFS tree, i.e. they lead to a descendant but are not edges in the DFS forest; $v$ is black when $(u, v)$ is explored;
- Cross edges do not lead to an ancestor or descendant; this could be between nodes in the same tree or in different trees; $v$ is black when $(u, v)$ is explored.
TODO discovery and finishing times vs edge classification
Theorem 5.6: A directed graph $G$ has a cycle iff a depth-first search of $G$ yields a back edge.
Proof:
$\Leftarrow$: If $(u, v)$ is a back edge, then there is a cycle consisting of the back edge $(u, v)$ together with the path in the DFS tree from $v$ to $u$.
$\Rightarrow$: TODO
Definition 5.7: A topological sort of a DAG $G = (V, E)$ is a total ordering of the vertices, $v_1, \ldots, v_n$, such that if $(v_i, v_j) \in E$ then $i < j$.
TopologicalSort(V, E):
    run DFS(V, E) to compute the finishing times f[v] for all v in V
    return V sorted according to decreasing f[v]
TODO correctness of topological sort
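A Python sketch: rather than sorting by stored finishing times, it is equivalent (and common) to append each vertex as it finishes and reverse the result at the end:

```python
def topological_sort(adj):
    """Return the vertices of a DAG in decreasing order of DFS finishing time."""
    seen = set()
    order = []                      # vertices appended in finishing order

    def visit(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)
        order.append(u)             # u finishes here

    for u in adj:
        if u not in seen:
            visit(u)
    return order[::-1]
```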
Lemma 5.10: Given distinct strongly connected components $C_1$ and $C_2$ of a digraph $G$, if there is a path from $C_1$ to $C_2$ in $G$, then there is not a path from $C_2$ to $C_1$ in $G$.
Definition 5.11: The SCC graph of a digraph $G = (V, E)$ is the graph $G^{SCC} = (V^{SCC}, E^{SCC})$, such that
- $V^{SCC}$ has one vertex for every SCC in $G$
- $(C_1, C_2) \in E^{SCC}$ iff there exist $u \in C_1$ and $v \in C_2$ such that there is a path from $u$ to $v$.
Definition 5.12: We extend discovery and finishing times to sets of vertices $U \subseteq V$:
- $d[U] = \min_{u \in U} d[u]$, the earliest discovery time amongst $U$;
- $f[U] = \max_{u \in U} f[u]$, the latest finishing time amongst $U$.
Lemma 5.14: Let $C_1, C_2$ be distinct SCCs in $G$. If there is an edge from $C_1$ to $C_2$, then $f[C_1] > f[C_2]$.
Lemma 5.15: If $C_1 \neq C_2$ and $f[C_1] > f[C_2]$, then there cannot be an edge from $C_2$ to $C_1$.
Definition 5.16: For a digraph $G = (V, E)$, the transpose of $G$ is $G^T = (V, E^T)$ where $E^T = \{(v, u) \mid (u, v) \in E\}$.
Remark: For distinct SCCs $C_1, C_2$ in $G$, if $f[C_1] > f[C_2]$, then
- in $G$ there cannot be an edge from $C_2$ to $C_1$
- in $G^T$ there cannot be an edge from $C_1$ to $C_2$.
This means that, running DFS on $G^T$ starting from the SCC with the largest finishing time, we will not find edges to any other SCC. TODO why? intuition
Definition 5.18: The strongly connected components algorithm identifies all SCCs in a digraph :
- run DFS on $G$, computing finishing times $f$
- run DFS on $G^T$, exploring the vertices in order of decreasing finishing time from the first DFS
- the trees in the DFS forest of the second DFS correspond to the SCCs of
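A Python sketch of this algorithm (usually called Kosaraju's algorithm). The first pass records vertices in finishing order; the second pass explores the transpose in decreasing finishing time, collecting one SCC per tree (iteratively here, to keep it short):

```python
def scc(adj):
    """adj: dict vertex -> list of successors. Returns a list of SCCs (sets)."""
    order, seen = [], set()

    def visit(u):                       # first DFS, on G
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)
        order.append(u)                 # post-order = increasing finishing time

    for u in adj:
        if u not in seen:
            visit(u)

    radj = {u: [] for u in adj}         # transpose graph G^T
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)

    comps, assigned = [], set()
    for u in reversed(order):           # decreasing finishing time
        if u not in assigned:
            comp, stack = set(), [u]    # second DFS, on G^T
            assigned.add(u)
            while stack:
                x = stack.pop()
                comp.add(x)
                for y in radj[x]:
                    if y not in assigned:
                        assigned.add(y)
                        stack.append(y)
            comps.append(comp)
    return comps
```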
6. Paths in graphs - finding shortest paths
Definition 6.1: Consider a digraph $G = (V, E)$ with weight function $w : E \to \mathbb{R}$.
The weight or length of a path $p = (v_0, v_1, \ldots, v_k)$ is $w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i)$.
Definition 6.3: Consider first the simple case where all weights are equal to 1. Let $\delta(s, v)$ be the minimum number of edges on a path from $s$ to $v$ if such a path exists, and $\delta(s, v) = \infty$ otherwise.
Given a source vertex $s$, breadth-first search (BFS) finds all vertices $v$ reachable from $s$, the shortest path lengths $\delta(s, v)$, and the shortest paths from $s$ to the $v$s.
Idea:
- send out waves of increasing length from the source
- when a vertex $v$ is reached, put it in the (FIFO) queue $Q$
- when all of $v$'s neighbours have been reached, remove $v$ from $Q$
TODO the algorithm
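Pending the pseudocode, a Python sketch of the idea using a deque as the FIFO queue: the first time a vertex is discovered is along a shortest path, which sets both its distance and its predecessor:

```python
from collections import deque

def bfs(adj, s):
    """Shortest edge-counts d and predecessors pi from source s."""
    d = {s: 0}
    pi = {s: None}
    Q = deque([s])
    while Q:
        u = Q.popleft()                 # FIFO: explore in waves
        for v in adj[u]:
            if v not in d:              # first discovery = shortest path
                d[v] = d[u] + 1
                pi[v] = u
                Q.append(v)
    return d, pi
```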
Definition 6.4: The BFS tree consists of the vertices reachable from $s$ and the edges $(\pi[v], v)$, where $\pi[v]$ is the predecessor of $v$ (the vertex from which $v$ was first reached).
More formally, the BFS tree is the graph $G_\pi = (V_\pi, E_\pi)$ where $V_\pi = \{v \mid \pi[v] \neq \text{null}\} \cup \{s\}$ and $E_\pi = \{(\pi[v], v) \mid v \in V_\pi \setminus \{s\}\}$.
Lemma 6.7: If $u$ is enqueued before $v$, then $d[u] \leq d[v]$. Furthermore, if $Q = (v_1, \ldots, v_r)$ is the queue at any given step of BFS, then $d[v_r] \leq d[v_1] + 1$ and $d[v_i] \leq d[v_{i+1}]$ for $i = 1, \ldots, r - 1$.
Theorem 6.9: At termination of BFS, $d[v] = \delta(s, v)$ for every $v \in V$.
Definition 6.11: Dijkstra’s algorithm is essentially BFS for weighted graphs. It solves the “single-source shortest path” problem for non-negative weights.
It uses a min-priority queue rather than a FIFO queue, with keys given by the shortest-path weight estimates $d[v]$.
At termination:
- $d[v]$ is the distance $\delta(s, v)$ from $s$ to $v$
- $\pi[v]$ is the predecessor of $v$ on a shortest path from $s$ to $v$
TODO algorithm
Invariants for the while loop:
TODO initialisation
Then suppose that the first invariant holds before an iteration of the while loop. For any vertex $v$, either $d[v]$ did not change during the iteration, or it did change. If $d[v]$ did not change, then the invariant still holds for $v$ by assumption. Otherwise, let $u$ be the vertex selected by ExtractMin in this iteration of the loop. Then we have that TODO
Now suppose that the second invariant holds before an iteration. Then for any $v$, either:
- $d[v]$ did not change, in which case the claim holds by assumption
- $d[v]$ changed and there is no path from $s$ to $v$, in which case $d[v] = \delta(s, v) = \infty$
- $d[v]$ changed and there is a path from $s$ to $v$. Then there is also a shortest path TODO
TODO finish maintenance of
At termination, $Q = \emptyset$, so $d[v] = \delta(s, v)$ for every $v \in V$.
TODO remark about optimal substructure
Theorem 6.13: For every $v \neq s$ reachable from $s$, $\pi[v]$ is the predecessor of $v$ on a shortest path from $s$ to $v$.
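A Python sketch of Dijkstra's algorithm. Instead of a DecreaseKey operation (which Python's heapq does not provide), this version re-pushes a vertex whenever its estimate improves and skips stale heap entries; the behaviour is the same for non-negative weights:

```python
import heapq

def dijkstra(adj, s):
    """adj: u -> list of (v, w) with w >= 0. Returns (dist, pi)."""
    dist = {s: 0}
    pi = {s: None}
    pq = [(0, s)]                        # (estimate, vertex); lazy deletion
    done = set()
    while pq:
        du, u = heapq.heappop(pq)
        if u in done:                    # stale entry, skip
            continue
        done.add(u)
        for v, w in adj[u]:              # relax each outgoing edge
            if v not in dist or du + w < dist[v]:
                dist[v] = du + w
                pi[v] = u
                heapq.heappush(pq, (dist[v], v))
    return dist, pi
```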
Definition 6.14: The Bellman-Ford algorithm takes a graph $G = (V, E)$ with weight function $w$ (possibly negative) and a source $s$, and returns false if there exists a negative-weight cycle reachable from $s$, and otherwise true, along with $d[v] = \delta(s, v)$ and $\pi[v]$ for each $v \in V$.
TODO algorithm
This is $O(|V| \cdot |E|)$.
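A Python sketch: $|V| - 1$ rounds of relaxing every edge, followed by one extra pass; if any edge can still be relaxed, a negative-weight cycle must be reachable:

```python
import math

def bellman_ford(vertices, edges, s):
    """edges: list of (u, v, w). Returns (ok, d, pi);
    ok is False iff a negative-weight cycle is reachable from s."""
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):   # |V| - 1 rounds of relaxation
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                pi[v] = u
    for u, v, w in edges:                # extra pass detects negative cycles
        if d[u] + w < d[v]:
            return False, d, pi
    return True, d, pi
```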
TODO invariants
TODO full topological sort algorithm for DAGs to find shortest path
TODO all-pairs shortest paths
Definition 6.15: Floyd-Warshall algorithm
Suppose we have vertex set $V = \{1, 2, \ldots, n\}$.
Let $d^{(k)}_{ij}$ be the length of the shortest path from $i$ to $j$, all of whose intermediate nodes are in the interval $[1..k]$.
TODO
TODO algorithm (just dynamic programming as usual) TODO note why the is ok (because we also have a in the last argument so we’ve already calculated it)
Running time is $\Theta(n^3)$.
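A Python sketch, taking the weights as a dict from edge to weight and updating a single distance matrix in place over increasing $k$:

```python
import math

def floyd_warshall(n, w):
    """w: dict (i, j) -> weight, vertices 0..n-1. Returns the distance matrix."""
    d = [[0 if i == j else w.get((i, j), math.inf) for j in range(n)]
         for i in range(n)]
    for k in range(n):                   # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```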
7. Greedy algorithms
We will identify a tree within $G = (V, E)$ by its edge set $A \subseteq E$.
Lemma 7.3:
An undirected graph is a tree iff every pair of vertices is connected by a unique (simple) path.
Lemma 7.4: If a graph $G = (V, E)$ is a tree, then $|E| = |V| - 1$.
Proof:
Root the tree anywhere; every vertex except the root has exactly one edge to its parent, so $|E| = |V| - 1$.
TODO more formally
Remark: A spanning tree is:
- a minimal connected subgraph (removing an edge disconnects it)
- a maximal acyclic subgraph (adding an edge creates a cycle).
The algorithm for building an MST:
- start from $A = \emptyset$; this is a (trivial) subset of an MST
- add edges to $A$, maintaining the invariant that $A$ is a subset of an MST
- stop when no edge can be added to $A$ any more; then $A$ is an MST.
Let $A$ be a subset of an MST. We say that an edge $e$ is safe for $A$ iff $A \cup \{e\}$ is a subset of some MST.
Definition 7.6: A cut $(S, V \setminus S)$ is a partition of the vertex set $V$ into $S$ and $V \setminus S$.
An edge crosses a cut if one endpoint is in $S$ and the other is in $V \setminus S$.
A cut respects $A$ if no edge in $A$ crosses the cut.
An edge crossing a cut is light if its weight is minimum over all edges that cross that cut.
Lemma 7.7: Let $A$ be a subset of some MST. If $(S, V \setminus S)$ is a cut that respects $A$, and $(u, v)$ is a light edge crossing the cut, then $(u, v)$ is safe for $A$.
Proof: Let $T$ be an MST that includes $A$; assume $(u, v) \notin T$ (otherwise we are done). Since $T$ is a tree, it contains a unique path $p$ between $u$ and $v$.
Path $p$ must cross the cut at least once. ( TODO why?) Let $(x, y)$ be an edge of $p$ that crosses the cut. Adding $(u, v)$ and deleting $(x, y)$ creates a tree $T' = (T \setminus \{(x, y)\}) \cup \{(u, v)\}$.
$T'$ is a tree because we create a cycle and then remove one edge from it, keeping the graph connected. $T'$ is minimum because the weight of $(u, v)$ is at most that of $(x, y)$, as $(u, v)$ is light. $T'$ contains $A \cup \{(u, v)\}$ because $A$ was included in $T$, and $A$ did not contain $(x, y)$ (the cut respects $A$).
Definition 7.8: A disjoint-set data structure keeps track of a family of dynamic disjoint sets $S_1, \ldots, S_k$. Each set is identified by some arbitrary representative.
It has three operations:
- MakeSet(x) adds the singleton set $\{x\}$ to the family
- FindSet(u) returns the representative of the set containing $u$
- Union(x, y) removes the sets containing $x$ and $y$ from the family and adds their union.
A good representation is as a disjoint-set forest, where elements are stored in an array with pointers to a parent, leading to a root for each set. Keeping this balanced gives amortised complexity $O(m \, \alpha(n))$, where $m$ is the number of operations, $n$ is the number of MakeSet operations, and $\alpha$ is the inverse Ackermann function, which is extremely slowly-growing and is at most 4 for any feasible input.
Definition 7.9: Kruskal’s algorithm:
Start from $A = \emptyset$. At each step, pick the edge with the smallest weight and add it to $A$ if it does not create a cycle.
To do this, we need to keep track of the connected components, for instance by using a disjoint-set data structure.
Kruskal(V, E, w):
    A = ∅
    for each v in V:
        MakeSet(v)
    sort E into increasing order by weight w
    for each edge (u, v) from the sorted edge list:
        if FindSet(u) != FindSet(v):
            A = A ∪ {(u, v)}
            Union(u, v)
    return A
There are $|V|$ MakeSet operations, and $O(|V| + |E|)$ total disjoint-set operations. So the disjoint-set operations take $O((|V| + |E|) \, \alpha(|V|))$ (for a good disjoint-set implementation). So the sorting of the edges dominates. Therefore Kruskal's algorithm has time complexity $O(|E| \log |E|) = O(|E| \log |V|)$.
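A compact Python sketch with a minimal disjoint-set forest (path halving only; the full amortised bound above also needs union by rank, omitted here for brevity):

```python
def kruskal(vertices, edges):
    """edges: list of (w, u, v) tuples. Returns the set of MST edges."""
    parent = {v: v for v in vertices}       # disjoint-set forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    A = set()
    for w, u, v in sorted(edges):           # increasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # adding (u, v) creates no cycle
            A.add((u, v))
            parent[ru] = rv
    return A
```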
Definition 7.10:
Prim's algorithm picks a vertex $r$ and grows the tree from that vertex.
We set $V_A = \{r\}$ and $A = \emptyset$. Then at every step, find a light edge $(u, v)$ crossing the cut $(V_A, V \setminus V_A)$, with $u \in V_A$. Update $V_A$ to $V_A \cup \{v\}$ and $A$ to $A \cup \{(u, v)\}$.
We use a min-heap/priority queue $Q$ such that, for $v \in V \setminus V_A$, the key of $v$ is the minimum weight of any edge $(u, v)$ where $u \in V_A$; if no such edge exists, we set the key to be $\infty$.
To find a light edge crossing the cut $(V_A, V \setminus V_A)$, take the minimum from the queue: if v = ExtractMin(Q), then $(u, v)$ is a light edge for some $u \in V_A$. The vertex $u$ can be retrieved by a backpointer: when the key of $v$ is set to $w(u, v)$, we define $\pi[v] = u$.
Prim(V, E, w, r):
    for each u in V:
        key[u] = oo
        pi[u] = null
        Insert(Q, u)
    DecreaseKey(Q, r, 0)
    while Q is not empty:
        u = ExtractMin(Q)
        for each v in Adj[u]:
            if v in Q and w(u, v) < key[v]:
                pi[v] = u
                DecreaseKey(Q, v, w(u, v))
Then $\{(\pi[v], v) \mid v \in V \setminus \{r\}\}$ gives us the edges of the MST.
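A Python sketch of Prim's algorithm. As with Dijkstra, Python's heapq has no DecreaseKey, so this version re-pushes vertices when their key improves and skips entries for vertices already in the tree:

```python
import heapq

def prim(adj, r):
    """adj: u -> list of (v, w) pairs, undirected. Returns the backpointer
    map pi; the MST edges are (pi[v], v) for v != r."""
    key = {u: float("inf") for u in adj}
    pi = {u: None for u in adj}
    key[r] = 0
    pq = [(0, r)]
    in_tree = set()
    while pq:
        _, u = heapq.heappop(pq)
        if u in in_tree:                 # stale entry, skip
            continue
        in_tree.add(u)
        for v, w in adj[u]:              # update keys across the cut
            if v not in in_tree and w < key[v]:
                key[v] = w
                pi[v] = u
                heapq.heappush(pq, (w, v))
    return pi
```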