MT2025 Linear Algebra Lecture Notes


Remaining TODOs: 30

Relevant reading:

For chapters 1-5: Jim Hefferon. Linear Algebra.

For chapter 6: Lloyd N. Trefethen & David Bau. Numerical Linear Algebra.


1 Linear systems

1.i Solving Linear Systems

Definition 1.1.1: A linear combination of variables $x_1, \dots, x_n$ has the form $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$ with $a_1, \dots, a_n \in \mathbb{R}$.
Definition 1.1.2: A linear equation has the form $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = d$. An $n$-tuple $(s_1, s_2, \dots, s_n)$ is a solution of that equation, or satisfies that equation, if substituting $s_1, \dots, s_n$ for $x_1, \dots, x_n$ gives a true statement.

In a system of equations, the $i$th row is denoted with the Greek letter rho, as $\rho_i$.

Definition 1.1.3: Elementary row operations are operations that can be carried out on a system of linear equations without changing the solution set.

These operations, all affecting row $i$, are:

  1. swapping two rows, $\rho_i \leftrightarrow \rho_j$, where $i \neq j$;
  2. scaling a row, $k\rho_i$, where $k \neq 0$;
  3. adding a multiple of one row to another row, $k\rho_i + \rho_j$, where $k \neq 0$ and $i \neq j$.

Lemma 1.1.4: Elementary row operations preserve the solution set of a linear system.

Proof:

Consider some system of linear equations

For the tuple to satisfy this system of equations, we must have that and that , and so on; swapping the order of two of the linear equations in the system does not change the validity of this solution, as logical and is commutative. Hence operation (i) is an elementary row operation.

If we multiply a linear equation by some non-zero scalar , that equation still holds true, and so the solution set to the system is still valid. Hence operation (ii) is an elementary operation.

Any solution of the system of linear equations must satisfy each individual equation. If we replace one equation by the sum of itself and a multiple of another, any solution of the original system still satisfies every equation of the new system; the operation can be undone by subtracting the same multiple, so no new solutions are introduced either. Hence operation (iii) is an elementary row operation.

Definition 1.1.5: A leading variable is the first variable in an equation with a non-zero coefficient.
Definition 1.1.6: A system of equations is in echelon form if each row’s leading variable is to the right of the leading variable in the row immediately above it, with the exception of the first row, and any rows with all-zero coefficients are at the bottom of the system.
Definition 1.1.7: In an echelon form linear system, any variables that do not appear as leading variables are free variables . These can be used as parameters to describe a solution set.

Example: If we have a system of equations

then and are free variables. We can parametrise the solution set using and as parameters to say that

but it is more clear to write and as new variables:

Definition 1.1.8: A linear equation is homogeneous if the constant is zero.

The Gaussian elimination of a system of linear equations follows the same operations as the Gaussian elimination of its associated system of homogeneous linear equations.

It is helpful to study homogeneous systems because these always have a particular solution, the zero vector.

In general, we can say that the general solution to a system of linear equations is the sum of a particular solution and the solution to the associated system of homogeneous equations.

Lemma 1.1.9: The zero vector is always a particular solution of a homogeneous system of linear equations.

Proof:

Substituting the zero vector into each equation of a homogeneous system gives $0 = 0$, which is trivially true, and so the zero vector is a particular solution of any homogeneous system of linear equations.

Lemma 1.1.10: For any homogeneous linear system of equations, there exist vectors such that the solution set of the system is

where is the number of free variables in an echelon form of the system.

This can be rephrased as: every leading variable in a homogeneous system can be expressed as a linear combination of free variables.

Proof:

Consider some homogeneous system that has been transformed into echelon form, ignoring any trivially true equations (). If the system consists entirely of tautologies, the lemma is trivially true as there are no leading variables which need be expressed.

We will denote the index of the leading variable in a row as .

Consider the bottom equation, row ,

where . Here, are free variables as there are no rows below this in which they could be leading. The leading variable can be expressed in terms of the free variables:

Suppose that every leading variable for for arbitrary can be expressed as a linear combination of free variables. Then the leading variable of the th equation can be expressed as a linear combination of free variables, since every other variable in the th equation is either a free variable, or is a leading variable in a lower equation and so can be expressed as a linear combination of free variables itself.

Hence if the th leading variable can be expressed as a linear combination of free variables, so can the th equation. Hence by induction, every leading variable can be expressed as a linear combination of free variables.

Definition 1.1.11: The set of vectors is generated by, or spanned by, the set .

Lemma 1.1.12: For a linear system and a particular solution , the system is satisfied by the solution set

Proof:

Suppose that satisfies the system. Then satisfies the associated homogeneous system since for all

where is the constant of the th equation and and are the th components of and respectively.

Suppose that we have a vector that satisfies the associated homogeneous system, then satisfies the system because, for all ,

Theorem 1.1.13: The solution set of any linear system has the form

where is any particular solution and is the number of free variables in the echelon form of the system.

Proof:

This follows from Lemma 1.1.10 and Lemma 1.1.12.

Corollary 1.1.14: A linear system of equations has either no solutions, one solution, or infinitely many solutions.

1.ii Linear Geometry

Definition 1.2.1: A square matrix is nonsingular if it is the matrix of coefficients of a homogeneous system with a unique solution. Otherwise (i.e. the homogeneous system has infinitely many solutions), it is singular .

Definition 1.2.2: The line (in ), plane (in ), or -dimensional linear surface, -flat or -dimensional hyperplane (in ) is a set of the form

where and . More specifically, it consists of the endpoints (tips) of these vectors when their tails are placed at the origin.

Remark: That the column vectors associated with a parameter lie entirely within the line, not just their tips.

Definition 1.2.3: The length of a vector is defined as

Definition 1.2.4: For any nonzero vector , we normalise to unit length by carrying out .

Definition 1.2.5: The dot product , scalar product or inner product of two vectors and is defined as

Remark: That .

Theorem 1.2.6 (Cauchy-Schwarz inequality): For any ,

with equality only if there exists such that .

Proof:

which is true because is always positive, so is also always positive.

Equality occurs iff .

If , then .

If , then

Hence equality occurs iff .

Theorem 1.2.7 (Triangle inequality): For any , with equality iff .

Proof:

Both sides of the inequality are positive, so squaring both sides does not change the validity of the inequality.

which is true by the Cauchy-Schwarz inequality (Theorem 1.2.6).

Equality occurs iff , by the Cauchy-Schwarz inequality again.

Definition 1.2.8: The angle between two vectors is defined as

Remark: That this is always defined since, following from the Cauchy-Schwarz inequality,

Remark: That this always gives an angle in the range .
Remark: That this corresponds to our intuition about angles.

Remark: The formula in Definition 1.2.8 follows from the Law of Cosines.

Proof:

Consider two vectors , and call the angle between them . Using the Law of Cosines on the triangle with sides and gives that

The left side gives

while the right side gives

Combining the two sides and cancelling like terms gives

which simplifies and rearranges to give Definition 1.2.8.

Remark: That the Law of Cosines is just an extension of Pythagoras’ theorem:


Corollary 1.2.9: Vectors from are orthogonal (perpendicular) iff their dot product is zero (since ). They are parallel iff the absolute value of their dot product is the product of their lengths (since ).

1.iii Reduced echelon form

Definition 1.3.1: In reduced echelon form , every leading variable has a coefficient of 1, and is the only nonzero entry in its column.
Remark: That for free variables, there are no restrictions on coefficients, or other entries in that column.

This is useful because we can just read the solution off.

Example: Consider the following system matrix in reduced echelon form:

The solution set of this system is simply

To get to reduced echelon form, we use Gauss-Jordan reduction, whereby we first carry out Gaussian elimination, then scale each row so that its leading variable has coefficient 1, then eliminate the entries above each leading entry by working upwards.
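Example (Python sketch): a minimal implementation of Gauss-Jordan reduction on an augmented matrix, added here as an illustration rather than taken from the lectures. It assumes the system is given as a list of rows of floats; the partial-pivoting row swap is only for numerical stability and is not required by the method described above.

    # Gauss-Jordan reduction: swap, scale and row-combine until the matrix is in
    # reduced echelon form.  Works on a copy of the input.
    def gauss_jordan(rows):
        m = [row[:] for row in rows]
        n_rows, n_cols = len(m), len(m[0])
        pivot_row = 0
        for col in range(n_cols):
            if pivot_row >= n_rows:
                break
            # choose a usable pivot in this column (a row swap)
            pivot = max(range(pivot_row, n_rows), key=lambda r: abs(m[r][col]))
            if abs(m[pivot][col]) < 1e-12:
                continue                      # no pivot here: free-variable column
            m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
            # scale so the leading entry is 1
            scale = m[pivot_row][col]
            m[pivot_row] = [x / scale for x in m[pivot_row]]
            # eliminate every other entry in this column (row combinations)
            for r in range(n_rows):
                if r != pivot_row and m[r][col] != 0:
                    factor = m[r][col]
                    m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
            pivot_row += 1
        return m

    # x + 2y = 5 and 3x + 4y = 6 give x = -4, y = 4.5 (printed up to rounding)
    reduced = gauss_jordan([[1.0, 2.0, 5.0], [3.0, 4.0, 6.0]])
    print([[round(x, 10) for x in row] for row in reduced])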

Lemma 1.3.2: Elementary row operations are reversible.

Proof:

The swap $\rho_i \leftrightarrow \rho_j$ is its own inverse. The scaling $k\rho_i$ has the inverse $\frac{1}{k}\rho_i$, for $k \neq 0$. The row combination $k\rho_i + \rho_j$ has the inverse $-k\rho_i + \rho_j$, where $i \neq j$.

Definition 1.3.3: Matrices that reduce to each other (using elementary row operations) are interreducible , equivalent with respect to the relationship of row reducibility, or row equivalent . They are said to be in the same row equivalence class .
Remark: Row equivalence classes are disjoint; any matrix belongs to exactly one row equivalence class.

Lemma 1.3.4: Between matrices, ‘reduces to’ is an equivalence relation.

Proof:

To be an equivalence relation, ‘reduces to’ must satisfy reflexivity, symmetry and transitivity.

Reflexivity is trivial, since a matrix reduces to itself by carrying out no operations.

Symmetry follows from Lemma 1.3.2.

The relation is transitive because if one matrix reduces to a second, and the second reduces to a third, we can carry out the two sequences of operations in order to reduce the first matrix to the third.

Lemma 1.3.5 (Linear combination lemma): A linear combination of linear combinations is a linear combination.

Proof:

Given the set of linear combinations , consider the linear combination of these

which is a linear combination.

Corollary 1.3.6: Where one matrix reduces to another, each row of the second is a linear combination of the rows of the first.

Lemma 1.3.7: In an echelon form matrix, no nonzero row is a linear combination of the other nonzero rows.

Proof:

Consider any row of the matrix. The leading entry in that row is one, but the entry in that column in every other row is equal to zero (by the definition of reduced echelon form), so there is no linear combination of other rows that can make this row.

Remark: The thing that Gaussian elimination really eliminates is any linear relationships between rows.
Theorem 1.3.8: Each matrix is row equivalent to a unique (“canonical”) reduced echelon form matrix.

Proof:

We proceed by induction on the number of columns, , for a fixed arbitrary number of rows .

Let .

If is the zero matrix, then it is in reduced echelon form and is not row-equivalent to any other matrix (this follows from Corollary 1.3.6). Otherwise, when reduced to reduced-echelon form, the only nonzero entry must be a 1 in the first row. Regardless, the reduced echelon form matrix is unique.

Now assume that, for some , all matrices for have a unique reduced echelon form matrix, and consider .

Suppose that reduces to two distinct reduced echelon form matrices, and . Let be the matrix consisting of the first columns of . Any sequence of operations that reduces to reduced echelon form must also reduce to reduced echelon form, and by the inductive hypothesis, reduces to a unique reduced echelon form matrix. Therefore, and must differ in the th column.

We write the linear system with matrix of coefficients as

the systems with matrices of coefficients and are written similarly. These systems must all have the same solution sets, since carrying out elementary row operations on linear systems does not change the solution set. With and differing only in the th column, suppose they differ in row . We subtract row of the systems represented by and to get that . Since , .

Therefore, is not a free variable, so the th columns of and must contain the leading entry of some row, since any column that does not have a leading entry is associated with a free variable.

Because the first columns of and are equal, a leading variable in the th column must be in the same position in and , and must be equal to 1; therefore, and are equal, so reduces to a unique reduced echelon form matrix.


Remark: To decide if two matrices are row-equivalent, we reduce both to reduced echelon form and check whether the results are equal (this works because each matrix is row equivalent to a unique reduced echelon form matrix, and row equivalence is transitive).

2 Vector spaces

2.i Definition of vector space

Definition 2.1.1: A vector space (over ) consists of a set along with two operations ‘’ and ‘’ subject to the condition that, for all :

  1. is closed under addition i.e.
  2. addition is commutative i.e.
  3. addition is associative
  4. there is an additive identity, i.e.
  5. every element of has an additive inverse, i.e.
  6. is closed under scalar multiplication i.e.
  7. scalar multiplication distributes over addition of scalars, i.e.
  8. scalar multiplication distributes over vector addition, i.e.
  9. ordinary scalar multiplication associates with scalar multiplication, i.e.
Remark: A vector space is space in which linear combinations can be made.
Example: For all , is a vector space under the natural operations.
Example: For all , the set of polynomials of degree at most , is a vector space under the natural operations.
Example: The set of matrices is a vector space.
Remark: That is not a vector space because it does not have an additive identity, but is a vector space.
Definition 2.1.2: A vector space with one element is a trivial vector space .

Lemma 2.1.3: In any vector space , for any and , we have

Proof:

For (i), note that . Add to both sides the additive inverse such that .

(ii): .

(iii)

Definition 2.1.4: For any vector space, a subspace is a subset that is itself a vector space, under the inherited operations.
Remark: Any vector space has a trivial subspace .
Remark: Any vector space has itself for a subspace.
Definition 2.1.5: A subspace that is not the entire space is a proper subspace.

Lemma 2.1.6: For a nonempty subset of a vector space, under the inherited operations the following are equivalent statements

  1. is a subspace of that vector space
  2. is closed under linear combinations of pairs of vectors
  3. is closed under linear combinations of any number of vectors
Remark: It is often easier to verify closure if the solution set is parametrised, e.g. rather than , use .
Definition 2.1.7: The span (or linear closure ) of a nonempty subset of a vector space is the set of all linear combinations of vectors from . .
Definition 2.1.8: The span of the empty set is defined to be the trivial subspace.

Lemma 2.1.9: In a vector space, the span of any subset is a subspace.

Proof:

If the subset is empty, its span is the trivial subspace by definition. Otherwise, for two elements of , , where and ,

which is a linear combination of elements of and so is a member of , hence is closed under linear combinations of pairs of elements; by Lemma 2.1.6, is a subspace.

Remark: To express a subspace as a span, parametrise then the span is of the set containing the vector coefficients of each parameter.
Remark: Spans are not unique.

2.ii Linear Independence

Definition 2.2.1: In any vector space, a set of vectors is linearly independent if none of its elements is a linear combination of the others from the set. Otherwise the set is linearly dependent .
Definition 2.2.2: If a set is linearly dependent, its vectors are in a linear relationship .

Lemma 2.2.3: A subset of a vector space is linearly independent iff among its elements the only linear relationship is the trivial one, .

Proof:

If is linearly independent, there is no vector that can be expressed as a linear combination of other elements of , so there is no linear relationship between elements of with non-zero coefficients, i.e. the only linear relationship is the trivial one.

If is linearly dependent, there exists a vector that can be represented as a linear combination of other elements of , so there is a linear relationship among the elements of with at least one non-zero coefficient, being that of . Hence by contraposition, if the only linear relationship among elements of is the trivial one, is linearly independent.

Lemma 2.2.4: Suppose is a vector space, is a subset of that space, and , then

Proof:

: Suppose , then since ; the implication follows from contraposition.

: Suppose , then for . Then for any ,

so . Therefore, . Note also that clearly , so by double inclusion .

Corollary 2.2.5: For , iff is dependent on other vectors in .
Corollary 2.2.6: A set is linearly independent iff for any , .

Lemma 2.2.7: Suppose that is linearly independent and that , then is linearly independent iff .

Proof:

Suppose that , so for . Since , it is not equal to any of the s so this is a nontrivial linear dependence among elements of . Therefore, is not linearly independent.

Now suppose that is linearly dependent, so there exists a nontrivial combination . By assumption, is linearly independent, so . Therefore, we can rearrange the equation to find that , so .

Therefore we have that is linearly dependent iff , so is linearly independent iff .

Corollary 2.2.8: In a vector space, any finite set has a linearly independent subset with the same span.

Proof:

For a finite set , if is linearly independent then the statement is trivially true as . Note that if is a singleton set, it is trivially linearly independent.

Otherwise, there exists an which can be expressed as a linear combination of the other elements of ; by Corollary 2.2.5, has the same span as . , and since is finite we can keep removing dependent elements until we either reach a nontrivial independent set with the same span, or we reach a singleton set, which is trivially independent.

Corollary 2.2.9: A subset is linearly dependent iff some can be represented as a linear combination of other elements in listed before it.

Proof:

Let . Then if is linearly dependent, there exists a first such that is linearly dependent, i.e. .

Lemma 2.2.10: Any subset of a linearly independent set is also linearly independent. Any superset of a linearly dependent set is linearly dependent.

2.iii Basis and dimension

Definition 2.3.1: A basis for a vector space is a sequence of vectors that is linearly independent and that spans the space. Denoted .

Definition 2.3.2: For any ,

is the standard basis or natural basis . We denote these vectors . (Sometimes the vectors in are denoted .)

Theorem 2.3.3: In any vector space, a subset is a basis iff each vector in the space can be expressed as a linear combination of elements of the subset in exactly one way.

Proof:

A sequence is a basis iff its elements form a spanning subset of the space, and is linearly independent. A subset is spanning iff each vector in the space can be expressed as a linear combination of elements in the subset, so we need only show that a spanning subset is linearly independent iff every vector in the space has a unique expression as a linear combination of elements of the subset.

Consider two such expressions of a vector , and for a subset . These are equal by transitivity of equality, so we can subtract them to get that . The s equal the s iff for , which holds iff is linearly independent, since the only linear dependence amongst its elements is the trivial one.

Definition 2.3.4: In a vector space with basis , the representation of with respect to is the column vector of the coefficients used to express as a linear combination of the basis vectors:

where and .

are the coordinates of with respect to .

Remark: Any has

Lemma 2.3.5: If is an -element basis of a vector space , then for any set of vectors ,

Proof:

Take a basis and suppose is the th column of a matrix for such that etc.

Then we can substitute these into to give

Because elements of bases are linearly independent, this only holds if . This gives us a linear system which we can re-express with column vectors, giving

Note that not only does the relationship hold, but the coefficients are the same.

Definition 2.3.6: A vector space is finite-dimensional if it has a basis with only finitely many vectors.

Theorem 2.3.7: In any finite-dimensional vector space, all bases have the same number of elements.

Proof:

Consider some finite-dimensional vector space; by definition, it has a finite basis. Pick one finite basis of minimal size, and, for the sake of contradiction, take any other basis , with , noting that since is minimal.

Assume that the elements of are unique (otherwise it would not be a basis), then the only linear relationship amongst the s must be the trivial one. By Lemma 2.3.5, the only linear relationship amongst the representations of the s with respect to is the trivial one.

A linear relationship amongst the representations of the s with respect to can be represented by a homogeneous system of linear equations, with unknowns. Since there are more unknowns than equations, and the system is homogeneous, there are infinitely many solutions and so there must be more than just the trivial linear relationship among the s. Therefore, the s are not linearly independent, so is not a basis; this is a contradiction, so all bases of must be the same size.

Definition 2.3.8: The dimension of a vector space is the number of vectors in any of its bases.
Corollary 2.3.9: No linearly independent set can have a size greater than the dimension of the enclosing space, assuming a finite-dimensional space.

Corollary 2.3.10: Any linearly independent set can be expanded to make a basis.

Proof:

Take a linearly independent set of a vector space .

If is not already a basis for , then it must not span the space. Therefore we can add a vector that is not in to , and retain linear independence by Lemma 2.2.7. Keep adding until the resulting set does span the space, which will occur in a finite number of steps by Corollary 2.3.9.

Corollary 2.3.11: Any spanning set can be shrunk to a basis.

Proof:

Take a spanning set . If then it is already a basis (and the enclosing space must be trivial). If then it can be shrunk to the empty basis without changing its span.

Otherwise, s.t. and we can form a basis . If then we are done. Otherwise, s.t. , so we can append to . By Lemma 2.2.7, is still linearly independent. We can then check if , and keep repeating this process until the spans are equal, which must happen in a finite number of steps.

Corollary 2.3.12: In an -dimensional space, a set composed of vectors is linearly independent iff it spans the space.

Proof:

A set composed of vectors is linearly independent iff it is a basis: the “if” direction is clear, and the other direction holds because any linearly independent set can be expanded to a basis, but it already has elements so must already be a basis.

A subset of vectors spans the space iff it is a basis: again the “if” direction is clear, and the other direction holds because any spanning set can be shrunk to a basis, but it already has elements so must already be a basis.

Definition 2.3.13: The row space of a matrix is the span of the set of its rows. The row rank is the dimension of this space, the number of linearly independent rows.

Lemma 2.3.14: If two matrices and are related by a row operation, , or for , then their row spaces are equal.

Proof:

By Corollary 1.3.6, each row of is a linear combination of the rows of ; that is, each row of is in the rowspace of . It follows by the Linear Combination Lemma that Rowspace() Rowspace(), since each element of the rowspace of is a linear combination of the rows of , so is a linear combination of a linear combination of the rows of , i.e. just a linear combination of the rows of . Because elementary row operations are reversible (by Lemma 1.3.2), the same argument can be applied to show that Rowspace() Rowspace(); by double-inclusion, the rowspaces are equal.

Corollary 2.3.15: Row-equivalent matrices have the same row space and therefore the same row rank.

Lemma 2.3.16: The nonzero rows of an echelon form matrix make up a linearly independent set.

Proof:

This is a restatement of Lemma 1.3.7 using new terminology.

Definition 2.3.17: The column space of a matrix is the span of the set of its columns. The column rank is the dimension of the column space, the number of linearly independent columns.
Definition 2.3.18: The transpose of a matrix is the result of interchanging its rows and columns, so that the column of the matrix is row of and vice-versa.

Lemma 2.3.19: Row operations do not change the column rank.

Proof:

By Lemma 1.1.4, row operations do not change the solution space of a system, so they must preserve linear relationships among columns. Therefore, they do not change the number of linearly independent columns, so they preserve column rank.

Remark: Unlike the row space, the column space might change under row operations; it is only the column rank that is invariant under row operations.

Theorem 2.3.20: For any matrix, the row rank and column rank are equal.

Proof:

Any matrix is row-equivalent to a reduced echelon form matrix with the same row rank and column rank as the original, by Corollary 2.3.15 and Lemma 2.3.19. The row rank is the number of nonzero rows in the reduced echelon form matrix, which is equal to the number of leading entries; the column rank is also equal to the number of leading entries, since the columns containing leading entries form a subset of the standard basis, so all other columns can be expressed as linear combinations of the columns containing leading entries.
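Example (Python sketch): a quick numerical illustration of Theorem 2.3.20, using NumPy (an assumption of this example, not something used in the lectures): a matrix and its transpose have the same rank, since row rank equals column rank.

    import numpy as np

    A = np.array([[1, 2, 3],
                  [2, 4, 6],   # a multiple of the first row
                  [0, 1, 1]])

    print(np.linalg.matrix_rank(A))    # 2
    print(np.linalg.matrix_rank(A.T))  # 2, the same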

Definition 2.3.21: The rank of a matrix is its row rank or column rank.

Theorem 2.3.22: For linear systems with unknowns and with matrix of coefficients , the rank of is iff the vector space of solutions of the associated homogeneous system has dimension .

Proof:

The rank of is iff the reduced echelon form matrix of has leading variables. There must be free variables in the reduced echelon form matrix, so the solution space of the associated homogeneous system has dimension .


Corollary 2.3.23: Where the matrix is , the following statements are equivalent:

  1. the rank of is
  2. is nonsingular
  3. the rows of form a linearly independent set
  4. the columns of form a linearly independent set
  5. any linear system whose matrix of coefficients is has one and only one solution.

Proof:

TODO

Definition 2.3.24: Where are subspaces of a vector space, their sum is the span of their union: .

Definition 2.3.25: The concatenation of the sequences adjoins them into a single sequence.

Definition 2.3.26: A collection of subspaces is independent if no nonzero vector from any is a linear combination of vectors from the other subspaces.

Lemma 2.3.27: let be a vector space that is the sum of some of its subspaces, . Let be bases for these subspaces. Then the following are equivalent:

  1. The expression of any as a combination where is unique;
  2. The concatenation is a basis for ;
  3. Among nonzero vectors from different , every linear relationship is trivial, i.e. the s are independent.
Proof: TODO ? possibly out of scope
Definition 2.3.28: A vector space is the direct sum (or internal direct sum ) of its subspaces if and the collection is independent. We write .

Corollary 2.3.29: The dimension of a direct sum is the sum of the dimensions of its summands.

Proof:

In Lemma 2.3.27, the number of vectors in the concatenated basis is the same as the sum of the number of vectors in each sub-basis.

Definition 2.3.30: When a vector space is the direct sum of two of its subspaces, they are complements .

Example: In , the - and -axes are complements.

Lemma 2.3.31: A vector space is the direct sum of two of its subspaces iff and i.e. their intersection is trivial.

Proof:

If , then by definition of direct sum, ; furthermore, since and are independent, the linear combination for can only have the trivial solution if we want a member of on the left side and a member of on the right, w.l.o.g.

Conversely, if then if we express as a linear combination of , then must also be in , as linear combinations of elements of a vector space are also in that vector space. But the intersection of and is trivial, so and satisfy the definition of independence; since they sum to , they form a direct sum of .

3 Maps between spaces

3.i Isomorphisms

Definition 3.1.1: An isomorphism between two vector spaces and is a map such that

  1. is a bijection/correspondence;
  2. preserves structure : if then

    and if and then

(We write , read “V is isomorphic to W”, when such a map exists.)

To verify that is an isomorphism, show that:

  • is injective/one-to-one, i.e. ;
  • is surjective/onto, i.e. ;
  • preserves addition, i.e. ;
  • preserves scalar multiplication, i.e. .
Remark: Structure preservation is special. Many bijections do not preserve addition and scalar multiplication.
Definition 3.1.2: An automorphism is an isomorphism of a space with itself.
Example: A dilation map that multiplies all vectors by a nonzero scalar is an automorphism of .
Example: A rotation or turning map that rotates all vectors through an angle is an automorphism.
Remark: Automorphisms are relevant to study, despite seeming trivial, because they formalise the intuitive notion that space near the -axis is like space near the -axis.

Lemma 3.1.3: An isomorphism maps the zero vector to the zero vector.

Proof: Where is an isomorphism, fix some . Then

Lemma 3.1.4: For any map between vector spaces, the following statements are equivalent:

  1. preserves structure
  2. preserves linear combinations of two vectors
  3. preserves linear combinations of any finite number of vectors

Proof:

TODO (in Hefferon)

Lemma 3.1.5: The inverse of an isomorphism is also an isomorphism.

Proof:

Suppose that by . Because isomorphisms are bijections, has an inverse .

Suppose that . Because it is an isomorphism, is surjective and . Then

Therefore preserves structure and hence satisfies the necessary conditions to be an isomorphism.

Theorem 3.1.6: Isomorphism is an equivalence relation between vector spaces.

Proof:

Isomorphism is reflexive, since the identity function is an isomorphism, since it is clearly a bijection and preserves structure:

Symmetry holds by Lemma 3.1.5.

For transitivity, suppose that and , and let and be isomorphisms. Then is bijective, and structure preserving:

Lemma 3.1.7: If spaces are isomorphic then they have the same dimension.

Proof:

TODO (in Hefferon)

Lemma 3.1.8: If spaces have the same dimension then they are isomorphic.

Proof:

TODO (in Hefferon)

Theorem 3.1.9: Vector spaces are isomorphic iff they have the same dimension.

Proof:

The proof follows from Lemma 3.1.7 and Lemma 3.1.8.

Corollary 3.1.10: Every finite-dimensional vector space is isomorphic to exactly one of the .

Thus the real spaces form a set of canonical representatives of the isomorphism classes.

3.ii Homomorphisms

Definition 3.2.1: A homomorphism or linear map between vector spaces is a function that preserves addition and scalar multiplication. Such a function is said to satisfy linearity .

Lemma 3.2.2: A linear map maps the zero vector to the zero vector.

Proof:

The proof is the same as the proof to Lemma 3.1.3.

Lemma 3.2.3: The following statements are equivalent for any map between vector spaces:

  1. is a homomorphism;
  2. ;
  3. .

Proof:

TODO (in Hefferon)

Example: The inclusion map is a homomorphism.
Example: The derivative is a homomorphism on polynomial spaces (and in general).

Definition 3.2.4: The trace of a square matrix is the sum down the upper-left to bottom-right diagonal. So is:

Remark: is a linear map.

Theorem 3.2.5: A homomorphism is determined by its action on a basis: if is a vector space basis , if is a vector space, and if (not necessarily distinct), then there exists a homomorphism from to sending each to , and that homomorphism is unique.

Proof:

TODO (in Hefferon)

Definition 3.2.6: Let and be vector spaces and let be a basis for . A function defined on that basis is extended linearly to a function if, for all s.t. , the action of the map is .

Example (Derivation of (rotation)):

Extending linearly:
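Example (Python sketch): an illustration of extending linearly, not taken from the lectures. The map is specified only by the images of the two standard basis vectors of the plane under a rotation, and every other input is handled by linearity; the function names are invented for this example.

    import math

    def rotation_on_basis(theta):
        # images of e1 and e2 under rotation by theta
        return [(math.cos(theta), math.sin(theta)),
                (-math.sin(theta), math.cos(theta))]

    def extend_linearly(images, v):
        # v = v1*e1 + v2*e2  maps to  v1*h(e1) + v2*h(e2)
        return tuple(sum(c * img[i] for c, img in zip(v, images))
                     for i in range(len(images[0])))

    images = rotation_on_basis(math.pi / 2)     # quarter turn
    print(extend_linearly(images, (1.0, 0.0)))  # approximately (0, 1)
    print(extend_linearly(images, (2.0, 3.0)))  # approximately (-3, 2)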

Definition 3.2.7: A linear map from a space into itself is a linear transformation .

Lemma 3.2.8: For vector spaces and , the set of linear functions from to is itself a vector space, a subspace of the space of all functions from to . We denote the space of linear maps from to by .

Alternatively: a linear combination of homomorphisms is also a homomorphism.

Proof:

TODO (in Hefferon)

Lemma 3.2.9: Under a homomorphism, the image of any subspace of the domain is a subspace of the codomain. In particular, the image of the entire space, the range of the homomorphism, is a subspace of the codomain.

That is, given vector spaces and a homomorphism , where and are subspaces of and respectively.

Proof:

TODO (in Hefferon)

Definition 3.2.10: The range space of a homomorphism is

sometimes denoted . The dimension of the range space is the map’s rank .

Definition 3.2.11: For any function , the set of elements of that map to is the inverse image

Lemma 3.2.12: For any homomorphism the inverse image of a subspace of the range is a subspace of the domain. In particular, the inverse image of the trivial subspace of the range is a subspace of the domain.

Proof:

TODO (in Hefferon)

Definition 3.2.13: The null space or kernel of a linear map is the inverse image of .

Definition 3.2.14: The dimension of the null space is the map’s nullity .

Theorem 3.2.15 (rank-nullity theorem): Given a linear map ,

Proof:

TODO (in Hefferon)

Lemma 3.2.16: Under a linear map, the image of a linearly dependent set is linearly dependent.

Proof:

Suppose that with some nonzero. Apply to both sides: and . We still have some nonzero, so the set of the s is linearly dependent.

Theorem 3.2.17: Where is an -dimensional vector space, these are equivalent statements:

  1. is one-to-one/injective;
  2. has an inverse from its range to its domain that is a linear map;
  3. , that is, ;
  4. ;
  5. if is a basis for then is a basis for .

Proof:

TODO (in Hefferon)

Remark: A one-to-one homomorphism is an isomorphism.

3.iii Computing linear maps

Definition 3.3.1: Suppose that and are vector spaces of dimensions and with bases and , and that is a linear map. If

then

is the matrix representation of with respect to .

Theorem 3.3.2: Assume that and are vector spaces of dimensions and , and that is a linear map. If is represented by

and is represented by

then the representation of the image of is

Definition 3.3.3: The matrix-vector product of an matrix and an vector is
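Example (Python sketch): a direct transcription of the matrix-vector product, illustrative only: entry i of the result is the dot product of row i of the matrix with the vector.

    def mat_vec(A, x):
        assert all(len(row) == len(x) for row in A), "need an m x n matrix and an n-vector"
        return [sum(a * b for a, b in zip(row, x)) for row in A]

    # a 2x3 matrix applied to a 3-vector gives a 2-vector
    print(mat_vec([[1, 2, 3], [4, 5, 6]], [1, 0, -1]))   # [-2, -2]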

Theorem 3.3.4: Any matrix represents a homomorphism between vector spaces of appropriate dimensions, with respect to any pair of bases.

Proof:

TODO (in Hefferon)

Theorem 3.3.5: The rank of a matrix equals the rank of any map that it represents.

Proof:

TODO (in Hefferon)

Definition 3.3.7: A linear map that is one-to-one and onto is nonsingular , otherwise it is singular . That is, a linear map is nonsingular iff it is an isomorphism.

Lemma 3.3.8: A nonsingular map is represented by a square matrix. A square matrix represents nonsingular maps iff it is a nonsingular matrix. Thus, a matrix represents an isomorphism iff it is nonsingular.

Proof:

Assume that the homomorphism is nonsingular. Thus by Corollary 3.3.6, for any matrix representing , has full column rank and full row rank. Since a matrix’s row rank is equal to its column rank, the number of rows is equal to the number of columns and so is a square matrix; since it has full rank, it is nonsingular.

Conversely, assume that is a square nonsingular matrix. To be nonsingular, it must have full row and column rank, which is true iff is an isomorphism, by Lemma 3.1.7, since the domain and codomain have the same dimension.

3.iii.i Computing range and null spaces

TODO - rewatch end of lecture

3.iv Matrix operations

Theorem 3.4.1: Let be linear maps represented with respect to bases by the matrices and and let be a scalar. Then with respect to the linear map is represented by and the map is represented by .
Definition 3.4.2: A zero matrix has all entries zero. We write or or or just .

Lemma 3.4.3: The composition of linear maps is linear.

Proof:

Let and be linear.

hence is linear.

Theorem 3.4.4: A composition of linear maps is represented by the matrix product of the representations.

Proof:

TODO (in Hefferon)

( TODO : arrow diagrams)

Remark: Matrix multiplication is not commutative in general.

Theorem 3.4.5: If are matrices, and the matrix products are defined, then the product is associative and distributes over matrix addition.

Proof:

Associativity holds because matrix multiplication represents function composition, which is associative.

Distributivity is similar, coming from the linearity of linear maps.

Lemma 3.4.6: In a product of two matrices and , the columns of are formed by taking times the columns of and the rows of are formed by taking the rows of times .

Definition 3.4.7: A matrix with all zeros except for a in the th entry is an - unit matrix or matrix unit .

Left-multiplying by the -unit matrix copies row of the multiplicand into row of the result; right-multiplying by the -unit copies column of the multiplicand into column of the result.

Scaling a unit matrix scales the result.

Definition 3.4.8: The main diagonal or leading diagonal or principal diagonal goes from the upper left to lower right.

Definition 3.4.9: An identity matrix is square and every entry is 0 except for 1s in the leading diagonal.

Definition 3.4.10: A diagonal matrix is square and has 0s off of the main diagonal.

Definition 3.4.11: A permutation matrix is square and is all 0s except for a single 1 in each row and column.

Left-multiplying by a permutation matrix permutes rows, right-multiplying permutes columns.
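Example (Python sketch): an illustration, not from the lectures, that left-multiplication by a permutation matrix permutes rows while right-multiplication permutes columns; plain nested lists are used so nothing beyond the standard library is assumed.

    def permutation_matrix(perm):
        # row i of the matrix is the standard basis row vector with a 1 in column perm[i]
        n = len(perm)
        return [[1 if j == perm[i] else 0 for j in range(n)] for i in range(n)]

    def mat_mul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    P = permutation_matrix([1, 2, 0])   # row i of P*M is row perm[i] of M
    M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(mat_mul(P, M))   # rows of M permuted
    print(mat_mul(M, P))   # columns of M permuted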

Definition 3.4.12: The elementary reduction matrices result from applying a single Gaussian operation to an identity matrix.

  1. for (think Multiply);

  2. for (think Permute);

  3. for (think Carry over?)

Lemma 3.4.13: Matrix multiplication can do Gaussian reduction.

Proof:

Clear - just left-multiply by the relevant elementary reduction matrix.

Corollary 3.4.14: For any matrix there are elementary reduction matrices such that is in reduced echelon form.
Definition 3.4.15: If we have , we say that is the right-inverse of , and likewise is the left-inverse of .

Definition 3.4.16: A function has a two-sided inverse if is a right-inverse and left-inverse. In this case, the inverse is unique.

Note: this means that is a bijection.

Note: recall that if a linear map has a two-sided inverse, the inverse is also a linear map.

Definition 3.4.17: A matrix is a left-inverse matrix of the matrix if is the identity matrix. It is a right inverse if is the identity matrix. A matrix with a two-sided inverse is an invertible matrix . The two-sided inverse is denoted .

Lemma 3.4.18: If a matrix has both a left inverse and right inverse then the two are equal.

Proof: See Theorem 3.4.19.

Theorem 3.4.19: A matrix is invertible iff it is nonsingular.

Proof:

Given a matrix , fix spaces of appropriate dimension for the domain and codomain and fix bases for these spaces. With respect to these bases, represents a map . The statements are true about the map and therefore they are true about the matrix.

This also proves Lemma 3.4.18.

Lemma 3.4.20: A product of invertible matrices is invertible. That is, if are invertible and is defined then is invertible and .
Lemma 3.4.21: A matrix is invertible iff it can be written as the product of elementary reduction matrices. We can compute the inverse by applying to the identity matrix the same row steps, in the same order, that Gauss-Jordan reduce .
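Example (Python sketch): one way to realise Lemma 3.4.21 in code, not taken from the lectures: augment the matrix with the identity, Gauss-Jordan reduce, and read the inverse off the right-hand half. The pivoting row swap and the tolerance are implementation choices, not part of the lemma.

    def invert(A):
        n = len(A)
        # augmented matrix [A | I]
        m = [list(map(float, A[i])) + [1.0 if j == i else 0.0 for j in range(n)]
             for i in range(n)]
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
            if abs(m[pivot][col]) < 1e-12:
                raise ValueError("matrix is singular")
            m[col], m[pivot] = m[pivot], m[col]
            scale = m[col][col]
            m[col] = [x / scale for x in m[col]]            # leading entry becomes 1
            for r in range(n):
                if r != col:
                    factor = m[r][col]
                    m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
        return [row[n:] for row in m]

    inv = invert([[1, 2], [3, 4]])
    print([[round(x, 10) for x in row] for row in inv])   # [[-2.0, 1.0], [1.5, -0.5]]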

Corollary 3.4.22: The inverse of a matrix exists and equals

iff .

Proof:

3.v Change of basis

Definition 3.5.1: The change of basis matrix for bases is the representation of the identity map with respect to those bases.

Lemma 3.5.2: .

Proof:

Trivial.

Lemma 3.5.3: A matrix changes bases iff it is nonsingular.

Proof:

TODO : in Hefferon

Corollary 3.5.4: A matrix is nonsingular iff it represents the identity map with respect to some pair of bases.

Theorem 3.5.5: To convert from the matrix representing a map with respect to to the matrix representing w.r.t ,

Proof:

TODO by arrow diagram

Definition 3.5.6: Same-sized matrices and are matrix equivalent if there are nonsingular matrices s.t. , that is, we can change bases to get from .
Corollary 3.5.7: Matrix-equivalent matrices represent the same map, with respect to appropriate pairs of bases.

Theorem 3.5.8: Matrix equivalence is an equivalence relation.

Proof:

Matrix equivalence is reflexive, since .

Matrix equivalence is symmetric, since if for nonsingular , then .

Matrix equivalence is transitive, since if and for nonsingular , then , with and nonsingular since the product of nonsingular matrices is nonsingular.

Theorem 3.5.9: Any matrix of rank is matrix equivalent to the matrix that is all zeros except that the first diagonal entries are 1.

Definition 3.5.10: This is a block partial-identity form , so called because you have an identity matrix in the top left of the matrix.
Remark: Block partial-identity matrices are projection maps between the same bases.
Remark: All matrices are projection maps with respect to some bases.

Corollary 3.5.11: Matrix equivalence classes are characterised by rank: two same-sized matrices are matrix equivalent iff they have the same rank.

Proof: Two same-sized matrices with the same rank are equivalent to the same block partial-identity matrix.

3.vi Projection

Definition 3.6.1: The orthogonal projection of a vector into the line spanned by a nonzero vector is

Remark:

is a scalar multiple of whose tip lies directly “underneath” the tip of , where the “up” direction is the vector orthogonal to that intersects ; it is orthogonal to .

Remark:

Another way of thinking about projection is that is the “shadow” of on , when a light source is shone orthogonally on to .

Remark:

We can derive the projection formula from the above remarks.

where . Then by Pythagoras, .

Hence

Therefore .

TODO : picture
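Example (Python sketch): the projection formula applied directly (illustrative, not from the lectures); the last line checks that the residual is orthogonal to the spanning vector.

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def project_onto_line(v, s):
        # proj_s(v) = (v.s / s.s) s for a nonzero spanning vector s
        c = dot(v, s) / dot(s, s)
        return [c * si for si in s]

    v, s = [2.0, 3.0], [1.0, 1.0]
    p = project_onto_line(v, s)
    print(p)                                              # [2.5, 2.5]
    print(dot([vi - pi for vi, pi in zip(v, p)], s))      # 0.0, so v - p is orthogonal to s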

Definition 3.6.2: Vectors are mutually orthogonal when any two are orthogonal: .

Theorem 3.6.3: If the vectors in a set are mutually orthogonal and nonzero then that set is linearly independent.

Proof:

Consider a linear relation among the s: . Taking the dot product of any with both sides of the equation gives . Since, for , due to orthogonality, the left side reduces to , so . Since , it must be that . This holds for all the s, so the only linear relationship among the s is the trivial one, so the set is linearly independent.

Corollary 3.6.4: In a -dimensional vector space, if the vectors in a size set are mutually orthogonal and nonzero then that set is a basis for the space.

Proof:

Any linearly independent subset of size in a dimensional space is a basis.

Definition 3.6.5: An orthogonal basis for a vector space is a basis of mutually orthogonal vectors.

Theorem 3.6.6 (Gram-Schmidt process): If is a basis for a subspace of then the vectors

form an orthogonal basis for the same subspace.

Proof:

We show by induction that each is nonzero, is in the span of , and is orthogonal to all the previous s, and thus by Corollary 3.6.4, is a basis for the same subspace for which is.

For , , so is clearly nonzero, is vacuously orthogonal to all previous s, and is clearly in the span of .

Now assume that is nonzero, orthogonal to the previous s and is in the span of for all less than some fixed .

Then . This sum can be iteratively reduced to be in terms of the s, so or else there would be a nontrivial linear dependence among the s, which would be a contradiction as we know them to be linearly independent as they are a basis. (Note also that the coefficients of the s must be nonzero as, by the inductive hypothesis, all previous s are nonzero.)

Because is a linear combination of s, it is in the span of . Also, it is orthogonal to all the previous s: for all ,

here the first term is zero since the projection is orthogonal, and the remaining terms are 0 by the inductive hypothesis (all previous s are mutually orthogonal).

By induction, all the s are nonzero, in the span of the initial basis, and are mutually orthogonal, so also form a basis of the subspace.

Definition 3.6.7: An orthonormal basis is an orthogonal basis with members normalised to length 1.
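Example (Python sketch): a small implementation of the Gram-Schmidt process, not taken from the lectures, which also normalises the result to give an orthonormal basis as in Definition 3.6.7.

    import math

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def gram_schmidt(basis, normalise=True):
        ortho = []
        for b in basis:
            k = list(b)
            for q in ortho:
                c = dot(b, q) / dot(q, q)      # coefficient of the projection onto q
                k = [ki - c * qi for ki, qi in zip(k, q)]
            ortho.append(k)
        if normalise:
            ortho = [[x / math.sqrt(dot(q, q)) for x in q] for q in ortho]
        return ortho

    B = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    Q = gram_schmidt(B)
    # off-diagonal dot products are ~0, diagonal ones are ~1
    print([round(dot(Q[i], Q[j]), 10) for i in range(3) for j in range(3)])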

4 Determinants

4.i Definition

Definition 4.1.1: An determinant is a function s.t.

  1. for
  2. for
  3. for any
  4. where is an identity matrix.

Common notation is .

Remark: Condition (ii) is redundant because row swaps can be done by a series of row additions and multiplications.

Lemma 4.1.2: A matrix with two identical rows has a determinant of 0.

Proof: Swapping the rows multiplies the determinant by , but gives an identical matrix and so the determinant must be the same; therefore the determinant must be zero.

Lemma 4.1.3: A matrix with a zero row has a determinant of 0.

Proof: Multiplying the zero row by a scalar multiplies the determinant by , but does not change the matrix; therefore the determinant must be invariant under scalar multiplication and must be equal to .

Lemma 4.1.4: A matrix is nonsingular iff its determinant is nonzero.

Proof:

Take a matrix with determinant zero, and carry out Gauss-Jordan reduction to get to . By Conditions (i) - (iii) of the determinant function, also; if were nonsingular then it would reduce to the identity matrix, which has determinant equal to 1; therefore, is singular. If were nonsingular, it must have a nonzero determinant to begin with.

Lemma 4.1.5: The determinant of an echelon form matrix is the product down the leading diagonal.

Proof: If the echelon matrix is singular then it has a zero row, so the determinant is zero which is the product down the leading diagonal.

If the echelon form matrix is nonsingular, then none of its diagonal entries are zero, so we can divide by those entries and use Condition (iii) to get 1s on the diagonal. Then we eliminate the nondiagonal entries, which by Condition (i) does not change the determinant, to get to the identity matrix, which has determinant 1. So, in this case we also get the determinant being equal to the product down the diagonal.

Lemma 4.1.6: For each , if there is an determinant function then it is unique.

Proof:

Suppose there are two functions satisfying the properties of Definition 4.1.1 and its consequences Lemmas 4.1.2 -  4.1.5. Given a square matrix M, fix some way of performing Gauss’s method to bring the matrix to echelon form. By using this fixed reduction, we can compute the value that these two functions must return on , and they must return the same value (by Lemma 4.1.5). Since they give the same output on every input, they are the same function.

Definition 4.1.7: Let be a vector space. A map is multilinear if

Lemma 4.1.8: Determinants are multilinear.

Proof:

Property (ii) here is just Condition (iii) in Definition 4.1.1, so we need only verify property (i).

TODO

Corollary 4.1.9: Determinants can be expressed as a sum of determinants of matrices with only one nonzero value in each row, or as a linear combination of permutation matrices.

Example:

Definition 4.1.10: An -permutation is a bijective function on the first positive integers .

In a permutation, each number is an output associated with exactly one input. We sometimes denote a permutation as the sequence .

Definition 4.1.11: The permutation expansion for determinants is

where are all of the -permutations, and is the permutation matrix corresponding to the permutation .

This is derived from using both conditions of multilinearity to break up the matrix into multiple matrices corresponding to each permutation, and then extracting the relevant coefficients.
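Example (Python sketch): the permutation expansion computed literally, summing sgn(phi) times the product of the selected entries over all n-permutations phi. This is illustrative only; it costs on the order of n times n! operations, so it is usable only for very small n.

    from itertools import permutations

    def signum(phi):
        inversions = sum(1 for i in range(len(phi))
                           for j in range(i + 1, len(phi)) if phi[i] > phi[j])
        return -1 if inversions % 2 else 1

    def det_permutation_expansion(A):
        n = len(A)
        total = 0
        for phi in permutations(range(n)):
            term = signum(phi)
            for i in range(n):
                term *= A[i][phi[i]]
            total += term
        return total

    print(det_permutation_expansion([[1, 2], [3, 4]]))                   # -2
    print(det_permutation_expansion([[2, 0, 1], [1, 1, 0], [0, 3, 1]]))  # 5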

Theorem 4.1.12: For each there is an determinant function.

Proof:

By Lemma 4.1.19.

Theorem 4.1.13: The determinant of a matrix is equal to the determinant of its transpose.

Proof:

TODO (in Hefferon)

Corollary 4.1.14: A matrix with two equal columns is singular. Column swaps change the sign of a determinant. Determinants are multilinear in their columns.

Proof:

By transposition.

Definition 4.1.15: In a permutation , elements such that are in an inversion of their natural order.

Lemma 4.1.16: A row-swap in a permutation matrix changes the number of inversions from even to odd, or from odd to even; it changes the parity of the number of inversions.

Proof:

If we swap adjacent rows, all other row relations remain unchanged, and we either create or remove an inversion between the two rows we changed. Hence the parity changes.

To swap non-adjacent rows and (with ), we can instead swap adjacent rows times (), followed by row swaps back up; this is the sum of two consecutive numbers which is odd, so the parity changes an odd number of times, so the parity changes.

Definition 4.1.17: The signum of a permutation is −1 if the number of inversions in is odd, and +1 if the number of inversions is even.

Corollary 4.1.18: If a permutation matrix has an odd number of inversions then swapping it to the identity takes an odd number of swaps. If it has an even number of inversions then swapping to the identity takes an even number.

Proof:

The identity matrix has zero inversions. To change an odd number to zero requires an odd number of swaps, and to change an even number to zero requires an even number of swaps.

Lemma 4.1.19: The function s.t.

is a determinant.

Proof:

We check the four conditions from Definition 4.1.1.

Condition (iv) holds because in the expansion of , the only nonzero term is the one corresponding to the identity permutation, which has a signum of 1, hence the determinant is 1.

For Condition (iii), suppose . Then

For Condition (i), suppose . Then

since the first sum is the determinant of the matrix with row the same as row .

Condition (ii) follows from Condition (i).

Corollary 4.1.20: There exists a determinant function for every .

4.ii Geometry of determinants

Definition 4.2.1: In the box or parallelepiped formed by is the set .

Then the signed area of the box, denoted , is .

Definition 4.2.2: The sign of the determinant reflects the orientation or sense of the box.

Theorem 4.2.3: A transformation changes the size of all boxes by the same factor, namely the size of the image of a box is times the size of the box where is the matrix representing with respect to the standard basis.

That is, the determinant of a product is the product of the determinants; .

Proof:

Suppose is singular and thus does not have an inverse. Note that if is invertible then there is an s.t. , so , so is invertible. The contrapositive of that observation is that if is not invertible then neither is - if then .

Then suppose that is nonsingular. Any nonsingular matrix factorises into a product of elementary matrices . Then if for all matrices and elementary matrices , then .

TODO : prove that determinants of elementary matrices are multiplicative.

Corollary 4.2.4: If a matrix is invertible then the determinant of its inverse is the reciprocal of its determinant, i.e. .
Definition 4.2.5: The volume of a box is the absolute value of the determinant of a matrix with those vectors as columns.

Cramer’s rule

Think about a linear system as being a set of vectors forming a parallelogram, and a target point being the far point of a larger, stretched parallelogram. Then we want to find the factors by which we scale the parallelogram to get the larger one.

TODO : diagram

Theorem 4.2.6 (Cramer's rule): Let be an matrix with a nonzero determinant, let be an -tall column vector, and consider the linear system . For any let be the matrix obtained by substituting for … TODO

Note: this is not a very efficient method for solving linear systems.
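Example (Python sketch): Cramer's rule implemented directly, not from the lectures; the helper det is a naive recursive Laplace expansion, which is itself expensive, underlining the note above about efficiency.

    def det(A):
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
                   for j in range(len(A)))

    def cramer(A, b):
        d = det(A)
        if d == 0:
            raise ValueError("coefficient matrix is singular")
        xs = []
        for i in range(len(A)):
            # replace column i of A by the right-hand side b
            B = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
            xs.append(det(B) / d)
        return xs

    # x + 2y = 5 and 3x + 4y = 6 give x = -4, y = 4.5
    print(cramer([[1, 2], [3, 4]], [5, 6]))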

4.iii Laplace’s expansion

Definition 4.3.1: For any matrix , the matrix formed by deleting row and column of is the - minor of .

The - cofactor of is times the determinant of the minor of .

Theorem 4.3.2 (Laplace’s formula): Where is an matrix, we can find the determinant by expanding by cofactors on any row or column .

Proof:

TODO (exercise in Hefferon)

Definition 4.3.3: The matrix adjoint (or adjugate ) to the square matrix is

Note the entries are transposed from what we’d expect - .

Theorem 4.3.4: Where is a square matrix, . Thus if has an inverse, i.e. , then .

Proof:

The diagonal elements of are all equal to , as they are Laplace expansions, .

Then an element in column and row , with , is equal to , which is the Laplace expansion for a matrix with row equal to row , which must be equal to zero.

Hence .

Remark: This is also not an efficient method of calculating matrix inverses, unless the matrix is sparse (i.e. mostly zeros).
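Example (Python sketch): an illustration of Theorem 4.3.4, not from the lectures: the inverse of a small matrix computed as the adjugate divided by the determinant, with cofactors from a recursive Laplace expansion.

    def minor(A, i, j):
        # delete row i and column j
        return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

    def det(A):
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

    def adjugate(A):
        n = len(A)
        # entry (i, j) of the adjoint is the (j, i) cofactor -- note the transpose
        return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]

    def inverse(A):
        d = det(A)
        if d == 0:
            raise ValueError("singular matrix")
        return [[x / d for x in row] for row in adjugate(A)]

    print(inverse([[1, 2], [3, 4]]))   # [[-2.0, 1.0], [1.5, -0.5]]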

5 Similarity

5.i Complex vector spaces

From now on, scalars will be complex.

Theorem 5.1.1:

Let be a polynomial. If is a non-zero polynomial then there are quotient and remainder polynomials s.t. where the degree of is strictly less than the degree of .

Corollary 5.1.2: The remainder when is divided by is the constant polynomial .

Proof:

The remainder must be a constant polynomial as it has a degree less than one. Note that ; substituting in gives , so ; but is a constant so for all .

Corollary 5.1.3: If is a root of the polynomial , then is a factor of .

Proof: By Corollary 5.1.2, , but since is a root of , ; hence .
Theorem 5.1.4: Any constant or linear polynomial is irreducible over the reals. A quadratic polynomial is irreducible over the reals iff its discriminant is negative. No cubic or higher-degree polynomial is irreducible over the reals.
Corollary 5.1.5: Any polynomial with real coefficients factors into a product of linear and irreducible quadratic polynomials with real coefficients. That factorisation is unique; any two factorisations have the same factors raised to the same powers.
Theorem 5.1.6 (Fundamental Theorem of Algebra): Polynomials with complex coefficients factor into linear polynomials with complex coefficients. The factorisation is unique.
Remark: The standard basis for is the same as the standard basis for .

5.ii Similarity

Definition 5.2.1: The matrices and are similar if there is a nonsingular s.t. .
Remark: The zero matrix is similar only to itself (since ); the identity matrix is also similar only to itself ().

Theorem 5.2.2: Similarity is an equivalence relation.

Proof:

Similarity is reflexive: .

Similarity is symmetric: if , then .

Similarity is transitive: if and , then .

Remark: Matrix similarity is a special case of matrix equivalence: similar matrices are matrix equivalent, but the converse is not true.
Remark: Some (but not all) similarity classes have a canonical representative in diagonal form.
Definition 5.2.3: A transformation is diagonalisable if it has a diagonal representation with respect to the same basis for the codomain as for the domain. A diagonalisable matrix is one that is similar to a diagonal matrix: is diagonalisable if there exists nonsingular such that is diagonal.

Example:

Proposition: This matrix is not diagonalisable:

Proof:

Suppose that there exists a diagonal matrix . Then consider

But , hence . Hence is the zero matrix, but the only matrix that is similar to the zero matrix is the zero matrix. Contradiction.

Lemma 5.2.4: A transformation is diagonalisable iff there is a basis and scalar s.t. for each .

Proof:

Consider a diagonal representation matrix

Then for a basis element ,

Definition 5.2.5: A transformation has a scalar eigenvalue if there is a nonzero eigenvector such that .
Definition 5.2.6: A square matrix has a scalar eigenvalue associated with the nonzero eigenvector if .
Remark: To find the eigenvalues of a matrix , consider . If is nonsingular, then there is exactly one solution which is the trivial one, . Therefore, we solve for , then for each solution of (the eigenvalues), we can substitute back in to to find an eigenvector .
Remark: If the matrix is upper triangular or lower triangular, the characteristic polynomial is easy to factorise, as the determinant is just the product down the diagonal.
Remark: If there is a repeated root with multiplicity , then that is an eigenvalue with algebraic multiplicity . In general the algebraic multiplicity is greater than or equal to the geometric multiplicity of the eigenvalue, i.e. the dimension of the associated eigenspace.
Remark: Matrices that are similar have the same eigenvalues, but not necessarily the same eigenvectors.
Definition 5.2.7: The characteristic polynomial of a square matrix is the determinant where is a scalar variable. The characteristic equation is . The characteristic polynomial of a transformation is the characteristic polynomial of any matrix representation .
Remark: The characteristic polynomial of an matrix, or of a transformation , is of degree . See problem sheet for proof that the characteristic polynomial is well-defined, i.e. is the same regardless of the bases used to represent the transformation.

Lemma 5.2.8: A linear transformation on a nontrivial vector space over $\mathbb{C}$ has at least one eigenvalue.

Proof: Any root of the characteristic polynomial is an eigenvalue; every polynomial of degree $n \geq 1$ has at least one root in $\mathbb{C}$.
Definition 5.2.9: The eigenspace of a transformation $t$ associated with the eigenvalue $\lambda$ is $V_\lambda = \{\zeta \mid t(\zeta) = \lambda\zeta\}$. The eigenspace of a matrix is analogous.

Lemma 5.2.10: An eigenspace $V_\lambda$ is a nontrivial subspace of $V$.

Proof:

Note that $V_\lambda$ is nonempty; it contains the zero vector since $t(\vec{0}) = \vec{0} = \lambda\vec{0}$.

Also note that $V_\lambda$ contains a nonzero vector because by definition $\lambda$ is an eigenvalue iff $t(\zeta) = \lambda\zeta$ has a nontrivial solution for $\zeta$.

Finally, $V_\lambda$ is closed under linear combinations:

$$t(c_1\zeta_1 + \cdots + c_k\zeta_k) = c_1t(\zeta_1) + \cdots + c_kt(\zeta_k) = c_1\lambda\zeta_1 + \cdots + c_k\lambda\zeta_k = \lambda(c_1\zeta_1 + \cdots + c_k\zeta_k),$$

hence $t$ maps linear combinations of elements of $V_\lambda$ to elements of $V_\lambda$.

Theorem 5.2.11: For any set of distinct eigenvalues of a map or matrix, a set of associated eigenvectors, one per eigenvalue, is linearly independent.

Proof:

By induction on the number of eigenvalues.

For zero eigenvalues, this is trivially true as the set of associated eigenvectors is empty.

Assume true for $k$ distinct eigenvalues.

If there are $k + 1$ distinct eigenvalues $\lambda_1, \ldots, \lambda_{k+1}$, let $\zeta_1, \ldots, \zeta_{k+1}$ be associated eigenvectors. Suppose that $c_1\zeta_1 + \cdots + c_{k+1}\zeta_{k+1} = \vec{0}$. From this we derive two equations: by applying $t$ to both sides, $c_1\lambda_1\zeta_1 + \cdots + c_{k+1}\lambda_{k+1}\zeta_{k+1} = \vec{0}$; and by multiplying by $\lambda_{k+1}$, giving $c_1\lambda_{k+1}\zeta_1 + \cdots + c_{k+1}\lambda_{k+1}\zeta_{k+1} = \vec{0}$. Subtract these equations to get $c_1(\lambda_1 - \lambda_{k+1})\zeta_1 + \cdots + c_k(\lambda_k - \lambda_{k+1})\zeta_k = \vec{0}$. Then the inductive hypothesis tells us that the vectors $\zeta_1, \ldots, \zeta_k$ are linearly independent, so there is only the trivial solution to this equation, i.e. each $c_i(\lambda_i - \lambda_{k+1}) = 0$; since the eigenvalues are distinct, $c_1 = \cdots = c_k = 0$. Hence $c_{k+1}\zeta_{k+1} = \vec{0}$ and, as $\zeta_{k+1} \neq \vec{0}$, $c_{k+1}$ must be zero too, so the vectors are linearly independent.

Since the statement is true for $k + 1$ if true for $k$, and true for $0$, it is true for any number of eigenvalues by the principle of induction.

Corollary 5.2.12: An $n \times n$ matrix with $n$ distinct eigenvalues is diagonalisable.

Proof:

Form a basis of associated eigenvectors (linearly independent by Theorem 5.2.11, and there are $n$ of them) and apply Lemma 5.2.4.
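
As a sanity check of this corollary, a small NumPy sketch (the matrix is an arbitrary example with distinct eigenvalues):

import numpy as np

# Arbitrary example with distinct eigenvalues (5 and 2).
T = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, P = np.linalg.eig(T)   # columns of P are eigenvectors

# With n distinct eigenvalues the eigenvectors are linearly independent
# (Theorem 5.2.11), so P is nonsingular and P^{-1} T P is diagonal.
D = np.linalg.inv(P) @ T @ P
assert np.allclose(D, np.diag(eigenvalues))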

6 Matrix factorisation

6.i LU factorisation

Definition 6.1.1: For an $n \times n$ matrix $A$, we take a series of "macro" transformations $L_1, \ldots, L_{n-1}$ (each made up of the product of some elementary matrices) that reduce $A$ to echelon form, for now assuming that no row swaps are necessary. Let $U = L_{n-1}\cdots L_1A$, $L = L_1^{-1}\cdots L_{n-1}^{-1}$, then the LU factorisation of $A$ is $A = LU$.

We end up with $L$ being lower triangular with $1$s along the diagonal and $U$ being upper triangular.


What follows is the general procedure for LU factorisation.

Let $x_k$ denote the $k$th column of the matrix before (macro-)step $k$. Then $L_k$ should be chosen so that

$$L_kx_k = (x_{1k}, \ldots, x_{kk}, 0, \ldots, 0)^T.$$

To do this, we subtract $\ell_{jk}$ times row $k$ from row $j$, where $\ell_{jk} = x_{jk}/x_{kk}$ for $j = k+1, \ldots, n$.

Then define $\ell_k = (0, \ldots, 0, \ell_{k+1,k}, \ldots, \ell_{n,k})^T$. Then $L_k$ can be written $L_k = I - \ell_ke_k^T$, where $e_k$ is the $k$th standard basis vector.

Remark:

This looks like the dot product of $e_k$ and $\ell_k$, but it is the other way round: $\ell_ke_k^T$ is an outer product. This just defines $L_k$ to be the identity matrix, with the $k$th column being $e_k - \ell_k$.

By the sparsity pattern of $\ell_k$, we have $e_k^T\ell_k = 0$. Therefore

$$(I - \ell_ke_k^T)(I + \ell_ke_k^T) = I - \ell_ke_k^T\ell_ke_k^T = I,$$

so $L_k^{-1} = I + \ell_ke_k^T$ - that is, inverting $L_k$ is done by negating the subdiagonal entries.

Remark:

Negating the subdiagonal entries does not in general compute the inverse of a lower-triangular matrix - but if the only non-zero subdiagonal entries are all in the same column, it will work.

Now consider $L = L_1^{-1}L_2^{-1}\cdots L_{n-1}^{-1}$. Taking into consideration the sparsity patterns of the $\ell_k$, the cross terms $\ell_je_j^T\,\ell_ke_k^T$ (for $j < k$) vanish, and so $L = I + \ell_1e_1^T + \cdots + \ell_{n-1}e_{n-1}^T$, i.e. the subdiagonal entries in $L$ are just the nonzero subdiagonal entries in the $L_k^{-1}$s.

Definition 6.1.2: (the LU algorithm)

U := A, L := I
for k := 1 to n - 1
    for j := (k + 1) to n
        l_jk := u_jk / u_kk
        u_j,k:n := u_j,k:n - l_jk u_k,k:n

where u_j,k:n denotes the subrow (u_jk, ..., u_jn).

Remark: To minimise memory use, use only one array that contains both $L$ and $U$, making the 1s on the diagonal of $L$ implicit.
Remark: There are two floating point operations (flops) per modified entry (one multiplication, one subtraction). So approximately $\frac{2}{3}n^3$ flops in the limit $n \to \infty$ (from $O(n^2)$ flops per each of $n - 1$ macro steps).
Remark: But also note that this lends itself well to parallelism.
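
For concreteness, here is a minimal NumPy sketch of the no-row-swap factorisation loop above (the function name and test matrix are my own, not from the notes):

import numpy as np

def lu_no_pivoting(A):
    # Sketch of the LU algorithm without row swaps; assumes every pivot
    # encountered on the diagonal is nonzero.
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for k in range(n - 1):                   # macro-step k
        for j in range(k + 1, n):
            L[j, k] = U[j, k] / U[k, k]      # multiplier l_jk
            U[j, k:] -= L[j, k] * U[k, k:]   # subtract l_jk times (sub)row k
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
L, U = lu_no_pivoting(A)
assert np.allclose(L @ U, A)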

Corollary 6.1.3 (Solving linear systems via LU factorisation):

Suppose we have a linear system $Ax = b$ and an LU factorisation $A = LU$. Then

$$Ax = b \iff LUx = b,$$

so we can solve the system by first solving $Ly = b$ and then $Ux = y$.

Thanks to the sparsity pattern of $L$, the system $Ly = b$ can be efficiently solved by forward substitution (in the natural way, top-down). Similarly, $Ux = y$ can be solved by backward substitution (bottom-up).

Remark: Each of these requires about $n^2$ flops, much less than the $\frac{2}{3}n^3$ needed for computing the LU factorisation, hence the overall cost is that of the factorisation, in the limit.
Remark: This works well for multiple linear systems with the same matrix $A$ and different right-hand sides - don't invert $A$, just LU-factorise it.
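
The two triangular solves can be written in a few lines; a sketch (function names and example factors are my own):

import numpy as np

def forward_substitution(L, b):
    # Solve L y = b top-down; assumes L is lower triangular with 1s on the diagonal.
    y = np.zeros(len(b))
    for i in range(len(b)):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def backward_substitution(U, y):
    # Solve U x = y bottom-up; assumes U is upper triangular with nonzero diagonal.
    x = np.zeros(len(y))
    for i in reversed(range(len(y))):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Example factors (chosen arbitrarily).
L = np.array([[1.0, 0.0],
              [0.5, 1.0]])
U = np.array([[4.0, 2.0],
              [0.0, 3.0]])
b = np.array([6.0, 9.0])
x = backward_substitution(U, forward_substitution(L, b))
assert np.allclose(L @ U @ x, b)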

Definition 6.1.4: The LU algorithm can give large errors when using floating point arithmetic when we restrict to no row swaps, even when not dividing by 0 (consider if we replaced a zero in the top left with a tiny $\epsilon$; then in exact arithmetic we get something that is pretty close, but the huge multipliers make the floating-point errors unacceptable). So row swaps are necessary for stability. With row swaps (pivoting), we get the PLU factorisation $PA = LU$, where $P$ is a permutation matrix.

Definition 6.1.5: In each macro-step $k$ of an LU factorisation, the diagonal entry $x_{kk}$ that we divide by is called the pivot.

But we might want a different pivot, if $x_{kk}$ is zero or small; so carry out a row swap such that the largest (in absolute value) entry of the subcolumn $x_{k:n,k}$ becomes the pivot. Recall that a row swap is represented by a permutation matrix $P_k$.

Now the multipliers in each macro step are at most $1$ by absolute value.

Then we have $L_{n-1}P_{n-1}\cdots L_2P_2L_1P_1A = U$, but in fact we can separate the $L$s and $P$s and get that $PA = LU$ where $P = P_{n-1}\cdots P_1$, and $L$ is a lower triangular matrix related to the $L_k$s, as described below.

Lemma 6.1.6:

Given $L_{n-1}P_{n-1}\cdots L_1P_1A = U$, we can rearrange the row operations to give $L'_{n-1}\cdots L'_1P_{n-1}\cdots P_1A = U$, where $L'_k = P_{n-1}\cdots P_{k+1}L_kP_{k+1}^{-1}\cdots P_{n-1}^{-1}$. That is, the row operations can be reordered, with the row additions having their subdiagonal entries permuted.

Proof:

Remark: The product of the $L'_k$s is again lower-triangular and can be inverted by negating the subdiagonal entries.
Corollary 6.1.7: With $L = (L'_{n-1}\cdots L'_1)^{-1}$ and $P = P_{n-1}\cdots P_1$, we have that $PA = LU$, which is a PLU factorisation of $A$.
Remark: Row swaps of this kind are known as partial pivoting. Complete pivoting would also involve column swaps.

Definition 6.1.8: (Algorithm for PLU factorisation with partial pivoting)

Remark: The asymptotic number of flops is still $\frac{2}{3}n^3$, but we now get better numerical stability.

Corollary 6.1.9 (Solving linear systems using PLU factorisation): Given $PA = LU$, the system $Ax = b$ is equivalent to $LUx = Pb$, so solve $Ly = Pb$ by forward substitution and then $Ux = y$ by backward substitution.
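
In practice a library routine does the factor-then-substitute work; for example SciPy's lu_factor/lu_solve pair follows this pattern (the matrix below is just an illustrative example):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[0.0, 1.0],
              [1.0, 1.0]])      # zero in the top left forces a row swap
b = np.array([2.0, 3.0])

lu, piv = lu_factor(A)          # PLU factorisation with partial pivoting
x = lu_solve((lu, piv), b)      # forward then backward substitution
assert np.allclose(A @ x, b)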

6.ii QR factorisation

Recall that if vectors in a set are nonzero and pairwise orthogonal, then that set is linearly independent.

Recall also that $u \cdot v = u^Tv$ and $u \cdot u = \|u\|^2$.

We extend orthogonality to matrices.

Definition 6.2.1: A square matrix $Q$ is orthogonal or orthonormal if its columns have length 1 and are pairwise orthogonal. Equivalently, $Q$ is orthogonal iff $Q^TQ = I$, i.e. $Q^{-1} = Q^T$.

Lemma 6.2.2: Multiplication by an orthogonal matrix preserves the length of vectors.

Proof: $\|Qx\|^2 = (Qx)^T(Qx) = x^TQ^TQx = x^Tx = \|x\|^2$, so $\|Qx\| = \|x\|$.

Lemma 6.2.3: Similarly, angles are preserved under multiplication by an orthonormal matrix.

Proof: $(Qx)\cdot(Qy) = x^TQ^TQy = x^Ty = x\cdot y$; since dot products and lengths are preserved, so are angles.

Remark: Geometrically, multiplication by an orthogonal matrix is a rotation/reflection.
Definition 6.2.4: A similar definition of orthogonality can be made for non-square matrices $\hat{Q}$ if the columns are non-zero, pairwise orthogonal, and each have length 1. This definition still satisfies $\hat{Q}^T\hat{Q} = I$ (but $\hat{Q}\hat{Q}^T \neq I$ in general).

Theorem 6.2.5: Every matrix has a QR factorisation.

Proof: Recall the Gram-Schmidt process. We can define a slightly different process to produce an orthonormal basis $q_1, \ldots, q_n$ for vectors $a_1, \ldots, a_n$ (the columns of $A$):

Defining $r_{ij} = q_i^Ta_j$ for $i < j$, and $r_{jj} = \|a_j - \sum_{i=1}^{j-1}r_{ij}q_i\|$, then we have

$$q_j = \frac{a_j - \sum_{i=1}^{j-1}r_{ij}q_i}{r_{jj}}.$$

Rearranging gives

$$a_j = r_{1j}q_1 + r_{2j}q_2 + \cdots + r_{j-1,j}q_{j-1} + r_{jj}q_j,$$

so we have now obtained a QR factorisation of $A$, $A = \hat{Q}\hat{R}$:

$$\begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix} = \begin{pmatrix} q_1 & \cdots & q_n \end{pmatrix}\begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & & \vdots \\ & & \ddots & \\ & & & r_{nn} \end{pmatrix}.$$

Definition 6.2.6: This is a reduced/thin QR factorisation. A full QR factorisation has additional orthonormal columns in $Q$ to make it square, and corresponding rows of zeros in $R$.

Remark: The algorithm can be written as:

for j := 1 to n
    qj := aj
    for i := 1 to (j - 1)
        r_ij := qi^T aj
        qj := qj - r_ij qi
    r_jj := ||qj||
    qj := qj / r_jj
Remark: Each iteration of the innermost loop involves roughly $m$ (scalar) multiplications and $m$ additions/subtractions. Overall about $2mn^2$ flops, where $m \times n$ is the shape of $A$.

Remark: The classical Gram-Schmidt process is numerically unstable! So we can use the modified Gram-Schmidt process instead.

for i := 1 to n
    vi := ai
for i := 1 to n
    r_ii := ||vi||
    qi := vi / r_ii
    for j := (i + 1) to n
        r_ij := qi^T vj
        vj := vj - r_ij qi

This is mathematically equivalent, and has the same flop count, but is more numerically stable: each $v_j$ starts as $a_j$, and then has the $q_i$ components (for $i < j$) removed as soon as they become available, so the $v_j$s aren't "disturbed" by the presence of already-removed components.
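
A NumPy version of the modified Gram-Schmidt loop above, as a sketch (the function name and test matrix are my own):

import numpy as np

def modified_gram_schmidt(A):
    # Thin QR via modified Gram-Schmidt; assumes the columns of A are
    # linearly independent so that no r_ii is zero.
    m, n = A.shape
    V = A.astype(float).copy()
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(V[:, i])
        Q[:, i] = V[:, i] / R[i, i]
        for j in range(i + 1, n):
            R[i, j] = Q[:, i] @ V[:, j]
            V[:, j] -= R[i, j] * Q[:, i]
    return Q, R

A = np.random.default_rng(0).standard_normal((5, 3))
Q, R = modified_gram_schmidt(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(3))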

Remark: Gram-Schmidt (for thin QR factorisation) can be viewed as triangular orthogonalisation: $AR_1R_2\cdots R_n = \hat{Q}$, i.e. creating an orthogonal matrix via multiplication by triangular matrices.

Other methods (for full QR factorisation) proceed by orthogonal triangularisation: $Q_n\cdots Q_2Q_1A = R$, i.e. creating a triangular matrix via multiplication by orthogonal matrices.

Such methods use Householder reflections or Givens rotations. They are numerically stable (because multiplying by orthogonal matrices tends to be numerically stable). The flop count (for Householder, on an $m \times n$ matrix) is about $2mn^2 - \frac{2}{3}n^3$.

Solving linear systems via QR factorisation

Consider a square system $Ax = b$ with $A$ nonsingular, and a QR factorisation $A = QR$. Then

$$Ax = b \iff QRx = b \iff Rx = Q^Tb,$$

which can be solved easily by back-substitution as $R$ is upper triangular.

The overall cost is dominated by the factorisation; this is more expensive than LU, but has better numerical stability.
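
A short sketch of this solve in NumPy/SciPy (the example system is chosen arbitrarily):

import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # nonsingular example
b = np.array([3.0, 5.0])

Q, R = np.linalg.qr(A)              # A = QR, Q orthogonal, R upper triangular
x = solve_triangular(R, Q.T @ b)    # back-substitution on R x = Q^T b
assert np.allclose(A @ x, b)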

Least squares

Consider $Ax = b$ with $A$ an $m \times n$ matrix, $m > n$: more equations than unknowns (overdetermined).

We assume that $A$ has full column rank $n$. In general the system cannot be solved exactly.

The least squares problem is to find $x$ such that $\|b - Ax\|_2$ is minimised. I.e. we want to find $x$ such that the residual $b - Ax$ is orthogonal to the (hyper-)plane spanned by the columns of $A$; if it is not orthogonal, then the residual will be longer than necessary. Alternatively, $Ax$ should run parallel to the plane spanned by the columns of $A$, directly "underneath" $b$. Equivalently:

$$A^TAx = A^Tb;$$

we call this the normal equation.

Lemma 6.2.7: If $A$ has full column rank $n$, then $A^TA$ is nonsingular.

Proof:

We show that the nullspace of $A^TA$ is trivial.

Let $x$ be such that $A^TAx = \vec{0}$. Then $x^TA^TAx = 0$, so $\|Ax\|^2 = 0$ and $Ax = \vec{0}$. Since the columns of $A$ are linearly independent, it follows that $x = \vec{0}$.

Hence the normal equation can be solved uniquely, so there is a unique solution to the least squares problem.

We can do this by QR factorisation.

If $A = \hat{Q}\hat{R}$ is a thin QR factorisation, then

$$A^TAx = A^Tb \iff \hat{R}^T\hat{Q}^T\hat{Q}\hat{R}x = \hat{R}^T\hat{Q}^Tb \iff \hat{R}^T\hat{R}x = \hat{R}^T\hat{Q}^Tb.$$

Since $A$ has full column rank, the matrix $\hat{R}^T$ is nonsingular. Thus, multiplying with the inverse of $\hat{R}^T$ gives the system

$$\hat{R}x = \hat{Q}^Tb,$$

which can be solved easily by back-substitution.

Remark: The least-squares method can be used to find a linear regression - given data points $(x_1, y_1), \ldots, (x_m, y_m)$, we want to minimise the sum of the squares of the errors, where

$$\begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_m \end{pmatrix}\begin{pmatrix} c \\ d \end{pmatrix} \approx \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},$$

to find the best regression line $y = c + dx$.

Although in two dimensions QR factorisation probably isn't the easiest way to solve this, for higher-dimensional regressions it can be useful - for example in machine learning.
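
A small NumPy sketch of fitting a regression line by least squares via QR (the data values are made up for illustration):

import numpy as np

# Made-up data: fit y ~= c + d t by least squares.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.9])
A = np.column_stack([np.ones_like(t), t])   # columns: [1, t]

Q, R = np.linalg.qr(A)                      # thin QR of the tall matrix A
coeffs = np.linalg.solve(R, Q.T @ y)        # solve R (c, d)^T = Q^T y
# np.linalg.lstsq(A, y, rcond=None) returns the same minimiser.
print(coeffs)                               # intercept c and slope d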

6.iii Norms and Singular Value Decomposition

Definition 6.3.1: The 2-norm of a vector $v$ is equal to the length, or Euclidean distance, of the vector: $\|v\|_2 = \sqrt{v_1^2 + \cdots + v_n^2} = \sqrt{v^Tv}$.

Definition 6.3.2: More generally, a vector norm is a function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ that satisfies, for all vectors $u, v$ and scalars $\alpha$:

  1. $\|v\| \geq 0$, with equality iff $v = \vec{0}$;
  2. $\|\alpha v\| = |\alpha|\,\|v\|$;
  3. $\|u + v\| \leq \|u\| + \|v\|$ (the triangle inequality).
Remark: Unlike determinants, norms are not unique.

Definition 6.3.3: A commonly used norm is the $p$-norm: $\|v\|_p = \left(\sum_{i=1}^{n}|v_i|^p\right)^{1/p}$.

Definition 6.3.4: In $\mathbb{R}^n$, a unit ball is the region $\{v : \|v\| \leq 1\}$ for some norm $\|\cdot\|$. In $\mathbb{R}^2$, for $p$-norms: TODO images of unit balls for various $p$.

Definition 6.3.5: A vector norm $\|\cdot\|$ induces an induced matrix norm:

$$\|A\| = \sup_{x \neq \vec{0}}\frac{\|Ax\|}{\|x\|} = \sup_{\|x\| = 1}\|Ax\|,$$

where $\sup$ denotes the supremum of a set; that is, its least upper bound, which is not necessarily contained within the set. For finite sets, this is the same as $\max$.

Intuitively, this is the largest amount by which the norm of a vector can be scaled by multiplying with $A$.

Lemma 6.3.6: For any matrix $A$ and vector $x$, $\|Ax\| \leq \|A\|\,\|x\|$, for a vector norm and its induced matrix norm.

Proof:

By the definition of an induced matrix norm, $\|A\| \geq \|Ax\|/\|x\|$ for any $x \neq \vec{0}$, hence $\|Ax\| \leq \|A\|\,\|x\|$ (and the inequality is trivial for $x = \vec{0}$).
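
A quick numerical check of this inequality for the 2-norm (random matrix and vector, purely illustrative):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

# np.linalg.norm(A, 2) is the induced (spectral) 2-norm of A.
assert np.linalg.norm(A @ x) <= np.linalg.norm(A, 2) * np.linalg.norm(x) + 1e-12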

Remark: Matrix norms need not be induced; since matrices can be thought of as vectors in the vector space of matrices, norms that satisfy the prior definition can be defined on matrices, including the element-wise Frobenius norm $\|A\|_F = \sqrt{\sum_{i,j}a_{ij}^2}$.

Lemma 6.3.7: Let $\|\cdot\|$ denote an induced matrix norm. Then $\|AB\| \leq \|A\|\,\|B\|$.

Proof:

Lemma 6.3.8: Let $Q$ be an orthogonal matrix, and $x$ a vector. Then $\|Qx\|_2 = \|x\|_2$.

Proof:

This is a rephrasing of Lemma 6.2.2.

Definition 6.3.9: The singular value decomposition (SVD) of an $m \times n$ matrix $A$ (where $m \geq n$) is a factorisation $A = U\Sigma V^T$, where $U$ and $V$ are orthogonal, and $\Sigma$ is diagonal with non-negative entries. The diagonal entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$ are called singular values of $A$.

The (pairwise orthogonal) columns of $U$ are the left-singular vectors of the SVD, and the (pairwise orthogonal) columns of $V$ are the right-singular vectors of the factorisation.

The SVD describes the action of A as a series of 3 transformations:

  • rotation/reflection by $V^T$;
  • scaling along the axes by $\Sigma$ (axis $i$ scaled by $\sigma_i$);
  • another rotation/reflection, by $U$.
Corollary 6.3.10: The image of a unit sphere under any matrix is a hyperellipse.
Remark: It can be shown that every matrix has an SVD (Theorem 6.3.13).
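
A quick NumPy check of the definition (example matrix only; note that NumPy returns $V^T$ directly):

import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])             # example matrix

U, s, Vt = np.linalg.svd(A)            # A = U diag(s) V^T
assert np.allclose(U @ np.diag(s) @ Vt, A)
assert np.allclose(U.T @ U, np.eye(2))     # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(2))   # V orthogonal
print(s)                               # singular values, in non-increasing order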

Lemma 6.3.11: The action of $A$ on the right-singular vectors can be described by $Av_i = \sigma_iu_i$ for $i = 1, \ldots, n$.

Proof: $Av_i = U\Sigma V^Tv_i = U\Sigma e_i = \sigma_iUe_i = \sigma_iu_i$, since $V^Tv_i = e_i$.

Theorem 6.3.12: The map defined by a matrix $A$ is the map represented by the diagonal matrix $\Sigma$, with respect to the domain and codomain bases consisting of the right-singular and left-singular vectors, respectively.

Proof:

Let $y = Ax$ for some $x$.

Let $x'$ be the representation of $x$ in the basis of the right-singular vectors, $\langle v_1, \ldots, v_n\rangle$, s.t. $x = Vx'$.

Similarly, let $y'$ be the representation of $y$ in the basis of the left-singular vectors, $\langle u_1, \ldots, u_m\rangle$, s.t. $y = Uy'$.

Then

$$Uy' = y = Ax = U\Sigma V^TVx' = U\Sigma x', \quad\text{so}\quad y' = \Sigma x'.$$

Theorem 6.3.13: Every matrix has an SVD, with uniquely determined singular values $\sigma_i$. Further, if $A$ is square and the $\sigma_i$ are distinct, the left- and right-singular vectors are uniquely determined up to signs.

Proof:

TODO (in Trefethen & Bau)

Remark: An approximate SVD for an $m \times n$ matrix can be computed in $O(mn^2)$ flops. This involves an iterative phase, as the singular values are in general irrational, but that phase is typically not a bottleneck.
Remark: SVD looks similar to eigenvalue decomposition, but is not the same; if a diagonalisable matrix $A$ has an eigenvalue decomposition $A = PDP^{-1}$, then this is an SVD of $A$ iff $P$ is orthogonal, which is not generally the case - eigenvectors need not be orthogonal.

Theorems 6.3.14 - 6.3.20 link properties of a matrix $A$ to properties of its SVD, given that we let $r$ be the number of nonzero singular values of $A$, and that we have an SVD $A = U\Sigma V^T$.

Theorem 6.3.14: The rank of $A$ is $r$.

Proof: The rank of a diagonal matrix is the number of nonzero diagonal entries. Since $U$ and $V$ are nonsingular, the rank of $A = U\Sigma V^T$ is the rank of $\Sigma$, which is $r$ - since by definition all of $\sigma_1, \ldots, \sigma_r$ are greater than 0.

Theorem 6.3.15: $\langle u_1, \ldots, u_r\rangle$ is an orthonormal basis of $A$'s column space. Similarly, $\langle v_{r+1}, \ldots, v_n\rangle$ is an orthonormal basis of $A$'s nullspace.

Proof:

This follows from the fact that the column space of $\Sigma$ is spanned by $e_1, \ldots, e_r$ and the nullspace of $\Sigma$ is spanned by $e_{r+1}, \ldots, e_n$; the left- and right-multiplications by $U$ and $V^T$ respectively give the desired bases, which are orthonormal because $U$ and $V$ are orthogonal.

Remark: Column space and null space can be computed by QR factorisation, but SVD is more accurate.
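
A sketch of reading these spaces off a computed SVD in NumPy (the example matrix and tolerance choice are my own):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])       # a rank-2 example

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]   # a common numerical-rank tolerance
r = int(np.sum(s > tol))
print(r)                              # 2 for this example

column_space_basis = U[:, :r]         # first r left-singular vectors
nullspace_basis = Vt[r:].T            # last n - r right-singular vectors
assert np.allclose(A @ nullspace_basis, 0.0)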

Theorem 6.3.16: $\|A\|_2 = \sigma_1$, where $\sigma_1$ is the largest singular value of $A$.

Proof:

Theorem 6.3.17: The nonzero singular values of $A$ are the square roots of the nonzero eigenvalues of $A^TA$.

Proof:

$A^TA = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma^TU^TU\Sigma V^T = V(\Sigma^T\Sigma)V^{-1}$, hence $A^TA$ is similar to $\Sigma^T\Sigma$ so has the same eigenvalues as $\Sigma^T\Sigma$, which are $\sigma_1^2, \ldots, \sigma_n^2$.

Lemma 6.3.18: The determinant of any orthogonal matrix is $\pm 1$, i.e. $|\det Q| = 1$.

Proof: $1 = \det I = \det(Q^TQ) = \det(Q^T)\det(Q) = (\det Q)^2$, so $\det Q = \pm 1$.

Theorem 6.3.19: Let $A$ be a square matrix. Then $|\det A| = \sigma_1\sigma_2\cdots\sigma_n$.

Proof: $|\det A| = |\det U|\,|\det\Sigma|\,|\det V^T| = |\det\Sigma| = \sigma_1\cdots\sigma_n$, using Lemma 6.3.18.

Theorem 6.3.20 (Low-rank approximation): For any $0 \leq k \leq r$, define

$$A_k = \sum_{i=1}^{k}\sigma_iu_iv_i^T.$$

Then

$$\|A - A_k\|_2 = \inf_{\operatorname{rank}(B) \leq k}\|A - B\|_2 = \sigma_{k+1},$$

where $\inf$ denotes the infimum (equivalent to minimum for finite sets); that is, $A_k$ is a low-rank approximation of $A$, where the (2-norm) distance between $A$ and $A_k$ is $\sigma_{k+1}$.

Proof:

TODO

Remark: $A_k = A$ if $k = r$.
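
A minimal NumPy sketch of this low-rank approximation via a truncated SVD (random example; the helper name is my own):

import numpy as np

def best_rank_k(A, k):
    # Keep only the k largest singular values and vectors (Theorem 6.3.20).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
A2 = best_rank_k(A, 2)

s = np.linalg.svd(A, compute_uv=False)
# The 2-norm error equals the first discarded singular value, sigma_3.
assert np.isclose(np.linalg.norm(A - A2, 2), s[2])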