Remaining TODOs: 30
Relevant reading:
For chapters 1-5: Jim Hefferon. Linear Algebra.
For chapter 6: Lloyd N. Trefethen & David Bau. Numerical Linear Algebra.
1 Linear systems
1.i Solving Linear Systems
In a system of equations, the $i$th row is denoted with the Greek letter rho, as $\rho_i$.
Definition 1.1.3: Elementary row operations are operations that can be carried out on a system of linear equations without changing the solution set.
These operations, all affecting row , are:
- swapping two rows, , where ;
- scaling a row, where ;
- adding one row to another row, , where and .
Lemma 1.1.4: Elementary row operations preserve the solution set of a linear system.
Proof:
Consider some system of linear equations
For the tuple to satisfy this system of equations, we must have that and that , and so on; swapping the order of two of the linear equations in the system does not change the validity of this solution, as logical and is commutative. Hence operation (i) is an elementary row operation.
If we multiply a linear equation by some non-zero scalar , that equation still holds true, and so the solution set to the system is still valid. Hence operation (ii) is an elementary operation.
Any solution set that satisfies the system of linear equations must satisfy each individual equation. If we add two equations together, the solutions will still hold, so operation (iii) is an elementary row operation.
Example: If we have a system of equations
then and are free variables. We can parametrise the solution set using and as parameters to say that
but it is more clear to write and as new variables:
The Gaussian elimination of a system of linear equations follows the same operations as the Gaussian elimination of its associated system of homogeneous linear equations.
It is helpful to study homogeneous systems because these always have a particular solution, the zero vector.
In general, we can say that the general solution to a system of linear equations is the sum of a particular solution and the solution to the associated system of homogeneous equations.
Lemma 1.1.9: The zero vector is always a particular solution of a homogeneous system of linear equations.
Proof:
Take the zero vector; substituting $0$ for each variable in every equation of a homogeneous system gives $0 = 0$, which is trivially true, and so the zero vector is a particular solution of any homogeneous system of linear equations.
Lemma 1.1.10: For any homogeneous linear system of equations, there exist vectors such that the solution set of the system is
where is the number of free variables in an echelon form of the system.
This can be rephrased as: every leading variable in a homogeneous system can be expressed as a linear combination of free variables.
Proof:
Consider some homogeneous system that has been transformed into echelon form, ignoring any trivially true equations (). If the system consists entirely of tautologies, the lemma is trivially true as there are no leading variables which need be expressed.
We will denote the index of the leading variable in a row as .
Consider the bottom equation, row ,
where . Here, are free variables as there are no rows below this in which they could be leading. The leading variable can be expressed in terms of the free variables:
Suppose that every leading variable for for arbitrary can be expressed as a linear combination of free variables. Then the leading variable of the th equation can be expressed as a linear combination of free variables, since every other variable in the th equation is either a free variable, or is a leading variable in a lower equation and so can be expressed as a linear combination of free variables itself.
Hence if the th leading variable can be expressed as a linear combination of free variables, so can the th equation. Hence by induction, every leading variable can be expressed as a linear combination of free variables.
Lemma 1.1.12: For a linear system and a particular solution , the system is satisfied by the solution set
Proof:
Suppose that satisfies the system. Then satisfies the associated homogeneous system since for all
where is the constant of the th equation and and are the th components of and respectively.
Suppose that we have a vector that satisfies the associated homogeneous system, then satisfies the system because, for all ,
Theorem 1.1.13: The solution set of any linear system has the form
where is any particular solution and is the number of free variables in the echelon form of the system.
Proof:
This follows from Lemma 1.1.10 and Lemma 1.1.12.
1.ii Linear Geometry
Definition 1.2.2: A line (in $\mathbb{R}^2$), plane (in $\mathbb{R}^3$), or $k$-dimensional linear surface ($k$-flat, or hyperplane, in $\mathbb{R}^n$) is a set of the form
where and . More specifically, it consists of the endpoints of these vectors when their tails are at the origin.
Definition 1.2.3: The length of a vector is defined as
Definition 1.2.5: The dot product , scalar product or inner product of two vectors and is defined as
Theorem 1.2.6 (Cauchy-Schwarz inequality): For any ,
with equality only if there exists such that .
Proof:
which is true because is always positive, so is also always positive.
Equality occurs iff .
If , then .
If , then
Hence equality occurs iff .
Theorem 1.2.7 (Triangle inequality): For any , with equality iff .
Proof:
Both sides of the inequality are positive, so squaring both sides does not change the validity of the inequality.
which is true by the Cauchy-Schwarz inequality (Theorem 1.2.6).
Equality occurs iff , by the Cauchy-Schwarz inequality again.
Definition 1.2.8: The angle between two vectors is defined as
Remark: This is always defined since, by the Cauchy-Schwarz inequality,
Remark: The formula in Definition 1.2.8 follows from the Law of Cosines.
Proof:
Consider two vectors , and call the angle between them . Using the Law of Cosines on the triangle with sides and gives that
The left side gives
while the right side gives
Combining the two sides and cancelling like terms gives
which simplifies and rearranges to give Definition 1.2.8.
Remark: The Law of Cosines is just an extension of Pythagoras’ theorem:
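As a quick numerical illustration of Definition 1.2.8 (not from the notes; NumPy and the example vectors are my own assumptions):
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 0.0, 1.0])
# cos(theta) = (u . v) / (|u| |v|); the quotient lies in [-1, 1] by Cauchy-Schwarz
cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.arccos(cos_theta))   # about 0.93 radians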
1.iii Reduced echelon form
This is useful because we can just read the solution off.
Example: Consider the following system matrix in reduced echelon form:
The solution set of this system is simply
To get to reduced echelon form, we use Gauss-Jordan reduction, whereby we first carry out Gaussian elimination, then scale each row so that its leading entry has coefficient 1, and then, working upwards, eliminate the entries above each leading entry.
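The following is a minimal sketch of Gauss-Jordan reduction in NumPy (my own illustration, not code from the notes; it picks the largest available pivot in each column only to avoid dividing by tiny numbers):
import numpy as np

def rref(M, tol=1e-12):
    """Reduce the matrix M to reduced echelon form by Gauss-Jordan reduction."""
    A = M.astype(float)
    rows, cols = A.shape
    r = 0                                     # next pivot row
    for c in range(cols):
        pivot = r + np.argmax(np.abs(A[r:, c]))
        if abs(A[pivot, c]) < tol:
            continue                          # no pivot in this column; move on
        A[[r, pivot]] = A[[pivot, r]]         # swap rows
        A[r] = A[r] / A[r, c]                 # scale so the leading entry is 1
        for i in range(rows):                 # eliminate the rest of the column
            if i != r:
                A[i] -= A[i, c] * A[r]
        r += 1
        if r == rows:
            break
    return A

print(rref(np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 7.0]])))   # [[1. 2. 0.] [0. 0. 1.]]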
Lemma 1.3.2: Elementary row operations are reversible.
Proof:
The swap $\rho_i \leftrightarrow \rho_j$ is its own inverse. The scaling $k\rho_i$ has the inverse $\frac{1}{k}\rho_i$, for $k \neq 0$. The addition $k\rho_i + \rho_j$ has the inverse $-k\rho_i + \rho_j$, where $i \neq j$.
Lemma 1.3.4: Between matrices, ‘reduces to’ is an equivalence relation.
Proof:
To be an equivalence relation, ‘reduces to’ must satisfy reflexivity, symmetry and transitivity.
Reflexivity is trivial since any matrix reduces to itself by carrying out no operations.
Symmetry follows from Lemma 1.3.2.
The relation is transitive because if one matrix reduces to a second, and the second reduces to a third, we can carry out the two sequences of operations one after the other to reduce the first matrix to the third.
Lemma 1.3.5 (Linear combination lemma): A linear combination of linear combinations is a linear combination.
Proof:
Given the set of linear combinations , consider the linear combination of these
which is a linear combination.
Lemma 1.3.7: In an echelon form matrix, no nonzero row is a linear combination of the other nonzero rows.
Proof:
Consider any row of the matrix. The leading entry in that row is one, but the entry in that column in every other row is equal to zero (by the definition of reduced echelon form), so there is no linear combination of other rows that can make this row.
Proof:
We proceed by induction on the number of columns, , for a fixed arbitrary number of rows .
Let .
If is the zero matrix, then it is in reduced echelon form and is not row-equivalent to any other matrix (this follows from Corollary 1.3.6). Otherwise, when reduced to reduced-echelon form, the only nonzero entry must be a 1 in the first row. Regardless, the reduced echelon form matrix is unique.
Now assume that, for some , all matrices for have a unique reduced echelon form matrix, and consider .
Suppose that reduces to two distinct reduced echelon form matrices, and . Let be the matrix consisting of the first columns of . Any sequence of operations that reduces to reduced echelon form must also reduce to reduced echelon form, and by the inductive hypothesis, reduces to a unique reduced echelon form matrix. Therefore, and must differ in the th column.
We write the linear system with matrix of coefficients as
the systems with matrices of coefficients and are written similarly. These systems must all have the same solution sets, since carrying out elementary row operations on linear systems does not change the solution set. With and differing only in the th column, suppose they differ in row . We subtract row of the systems represented by and to get that . Since , .
Therefore, is not a free variable, so the th columns of and must contain the leading entry of some row, since any column that does not have a leading entry is associated with a free variable.
Because the first columns of and are equal, a leading variable in the th column must be in the same position in and , and must be equal to 1; therefore, and are equal, so reduces to a unique reduced echelon form matrix.
2 Vector spaces
2.i Definition of vector space
Definition 2.1.1: A vector space (over ) consists of a set along with two operations ‘’ and ‘’ subject to the condition that, for all :
- is closed under addition i.e.
- addition is commutative i.e.
- addition is associative
- there is an additive identity, i.e.
- every element of has an additive inverse, i.e.
- is closed under scalar multiplication i.e.
- scalar multiplication distributes over addition of scalars, i.e.
- scalar multiplication distributes over vector addition, i.e.
- ordinary scalar multiplication associates with scalar multiplication, i.e.
Lemma 2.1.3: In any vector space , for any and , we have
Proof:
For (i), note that . Add to both sides the additive inverse such that .
(ii): .
(iii)
Lemma 2.1.6: For a nonempty subset of a vector space, under the inherited operations the following are equivalent statements
- is a subspace of that vector space
- is closed under linear combinations of pairs of vectors
- is closed under linear combinations of any number of vectors
Lemma 2.1.9: In a vector space, the span of any subset is a subspace.
Proof:
If the subset is empty, its span is the trivial subspace by definition. Otherwise, for two elements of , , where and ,
which is a linear combination of elements of and so is a member of , hence is closed under linear combinations of pairs of elements; by Lemma 2.1.6, is a subspace.
2.ii Linear Independence
Lemma 2.2.3: A subset of a vector space is linearly independent iff among its elements the only linear relationship is the trivial one, .
Proof:
If is linearly independent, there is no vector that can be expressed as a linear combination of other elements of , so there is no linear relationship between elements of with non-zero coefficients, i.e. the only linear relationship is the trivial one.
If is linearly dependent, there exists a vector that can be represented as a linear combination of other elements of , so there is a linear relationship among the elements of with at least one non-zero coefficient, being that of . Hence by contraposition, if the only linear relationship among elements of is the trivial one, is linearly independent.
Lemma 2.2.4: Suppose is a vector space, is a subset of that space, and , then
Proof:
: Suppose , then since ; the implication follows from contraposition.
: Suppose , then for . Then for any ,
so . Therefore, . Note also that clearly , so by double inclusion .
Lemma 2.2.7: Suppose that is linearly independent and that , then is linearly independent iff .
Proof:
Suppose that , so for . Since , it is not equal to any of the s so this is a nontrivial linear dependence among elements of . Therefore, is not linearly independent.
Now suppose that is linearly dependent, so there exists a nontrivial combination . By assumption, is linearly independent, so . Therefore, we can rearrange the equation to find that , so .
Therefore we have that is linearly dependent iff , so is linearly independent iff .
Corollary 2.2.8: In a vector space, any finite set has a linearly independent subset with the same span.
Proof:
For a finite set , if is linearly independent then the statement is trivially true as . Note that if is a singleton set, it is trivially linearly independent.
Otherwise, there exists an which can be expressed as a linear combination of the other elements of ; by Corollary 2.2.5, has the same span as . , and since is finite we can keep removing dependent elements until we either reach a nontrivial independent set with the same span, or we reach a singleton set, which is trivially independent.
Corollary 2.2.9: A subset is linearly dependent iff some can be represented as a linear combination of other elements in listed before it.
Proof:
Let . Then if is linearly dependent, there exists a first such that is linearly dependent, i.e. .
2.iii Basis and dimension
Definition 2.3.2: For any ,
is the standard basis or natural basis . We denote these vectors . (Sometimes the vectors in are denoted .)
Theorem 2.3.3: In any vector space, a subset is a basis iff each vector in the space can be expressed as a linear combination of elements of the subset in exactly one way.
Proof:
A sequence is a basis iff its elements form a spanning subset of the space, and is linearly independent. A subset is spanning iff each vector in the space can be expressed as a linear combination of elements in the subset, so we need only show that a spanning subset is linearly independent iff every vector in the space has a unique expression as a linear combination of elements of the subset.
Consider two such expressions of a vector , and for a subset . These are equal by transitivity of equality, so we can subtract them to get that . The s equal the s iff for , which holds iff is linearly independent, since the only linear dependence amongst its elements is the trivial one.
Definition 2.3.4: In a vector space with bases the representation of with respect to is the column vector of the coefficients used to express as a linear combination of the basis vectors:
where and .
are the coordinates of with respect to .
Lemma 2.3.5: If is an -element basis of a vector space , then for any set of vectors ,
Proof:
Take a basis and suppose is the th column of a matrix for such that etc.
Then we can substitute these into to give
Because elements of bases are linearly independent, this only holds if . This gives us a linear system which we can re-express with column vectors, giving
Note that not only does the relationship hold, but the coefficients are the same.
Theorem 2.3.7: In any finite-dimensional vector space, all bases have the same number of elements.
Proof:
Consider some finite-dimensional vector space; by definition, it has a finite basis. Pick one finite basis of minimal size, and, for the sake of contradiction, take any other basis , with , noting that since is minimal.
Assume that the elements of are unique (otherwise it would not be a basis), then the only linear relationship amongst the s must be the trivial one. By Lemma 2.3.5, the only linear relationship amongst the representations of the s with respect to is the trivial one.
A linear relationship amongst the representations of the s with respect to can be represented by a homogeneous system of linear equations, with unknowns. Since there are more unknowns than equations, and the system is homogeneous, there are infinitely many solutions and so there must be more than just the trivial linear relationship among the s. Therefore, the s are not linearly independent, so is not a basis; this is a contradiction, so all bases of must be the same size.
Corollary 2.3.10: Any linearly independent set can be expanded to make a basis.
Proof:
Take a linearly independent set of a vector space .
If is not already a basis for , then it must not span the space. Therefore we can add a vector that is not in to , and retain linear independence by Lemma 2.2.7. Keep adding until the resulting set does span the space, which will occur in a finite number of steps by Corollary 2.3.9.
Corollary 2.3.11: Any spanning set can be shrunk to a basis.
Proof:
Take a spanning set . If then it is already a basis (and the enclosing space must be trivial). If then it can be shrunk to the empty basis without changing its span.
Otherwise, s.t. and we can form a basis . If then we are done. Otherwise, s.t. , so we can append to . By Lemma 2.2.7, is still linearly independent. We can then check if , and keep repeating this process until the spans are equal, which must happen in a finite number of steps.
Corollary 2.3.12: In an -dimensional space, a set composed of vectors is linearly independent iff it spans the space.
Proof:
A set composed of vectors is linearly independent iff it is a basis: the “if” direction is clear, and the other direction holds because any linearly independent set can be expanded to a basis, but it already has elements so must already be a basis.
A subset of vectors spans the space iff it is a basis: again the “if” direction is clear, and the other direction holds because any spanning set can be shrunk to a basis, but it already has elements so must already be a basis.
Lemma 2.3.14: If two matrices and are related by a row operation, , or for , then their row spaces are equal.
Proof:
By Corollary 1.3.6, each row of is a linear combination of the rows of ; that is, each row of is in the rowspace of . It follows by the Linear Combination Lemma that Rowspace() Rowspace(), since each element of the rowspace of is a linear combination of the rows of , so is a linear combination of a linear combination of the rows of , i.e. just a linear combination of the rows of . Because elementary row operations are reversible (by Lemma 1.3.2), the same argument can be applied to show that Rowspace() Rowspace(); by double-inclusion, the rowspaces are equal.
Lemma 2.3.16: The nonzero rows of an echelon form matrix make up a linearly independent set.
Proof:
This is a restatement of Lemma 1.3.7 using new terminology.
Lemma 2.3.19: Row operations do not change the column rank.
Proof:
By Lemma 1.1.4, row operations do not change the solution space of a system, so they must preserve linear relationships among columns. Therefore, they do not change the number of linearly independent columns, so they preserve column rank.
Theorem 2.3.20: For any matrix, the row rank and column rank are equal.
Proof:
Any matrix is row-equivalent to a reduced echelon form matrix, with the same row and column rank as by Lemmas 2.3.15 and 2.3.19. The row rank of is the number of nonzero rows in the reduced echelon form matrix, which is equal to the number of leading entries in the reduced echelon form matrix; the column rank of is also equal to the number of leading entries, since the columns containing leading entries form a subset of the standard basis, so all other columns can be expressed as a linear combination of the columns containing leading entries.
Theorem 2.3.22: For linear systems with unknowns and with matrix of coefficients , the rank of is iff the vector space of solutions of the associated homogeneous system has dimension .
Proof:
The rank of is iff the reduced echelon form matrix of has leading variables. There must be free variables in the reduced echelon form matrix, so the solution space of the associated homogeneous system has dimension .
Corollary 2.3.23: Where the matrix is , the following statements are equivalent:
- the rank of is
- is nonsingular
- the rows of form a linearly independent set
- the columns of form a linearly independent set
- any linear system whose matrix of coefficients is has one and only one solution.
Proof:
TODO
Definition 2.3.25: The concatenation of the sequences adjoins them into a single sequence.
Lemma 2.3.27: Let be a vector space that is the sum of some of its subspaces, . Let be bases for these subspaces. Then the following are equivalent:
- The expression of any as a combination where is unique;
- The concatenation is a basis for ;
- Among nonzero vectors from different , every linear relationship is trivial, i.e. the s are independent.
Corollary 2.3.29: The dimension of a direct sum is the sum of the dimensions of its summands.
Proof:
In Lemma 2.3.27, the number of vectors in the concatenated basis is the same as the sum of the number of vectors in each sub-basis.
Definition 2.3.30: When a vector space is the direct sum of two of its subspaces, they are complements .
Lemma 2.3.31: A vector space is the direct sum of two of its subspaces iff and i.e. their intersection is trivial.
Proof:
If , then by definition of direct sum, ; furthermore, since and are independent, the linear combination for can only have the trivial solution if we want a member of on the left side and a member of on the right, w.l.o.g.
Conversely, if then if we express as a linear combination of , then must also be in , as linear combinations of elements of a vector space are also in that vector space. But the intersection of and is trivial, so and satisfy the definition of independence; since they sum to , they form a direct sum of .
3 Maps between spaces
3.i Isomorphisms
Definition 3.1.1: An isomorphism between two vector spaces $V$ and $W$ is a map $f: V \to W$ such that
- $f$ is a bijection/correspondence;
- $f$ preserves structure: if $\vec{v}_1, \vec{v}_2 \in V$ then $f(\vec{v}_1 + \vec{v}_2) = f(\vec{v}_1) + f(\vec{v}_2)$, and if $\vec{v} \in V$ and $r \in \mathbb{R}$ then $f(r\vec{v}) = r\,f(\vec{v})$.
(We write $V \cong W$, read “$V$ is isomorphic to $W$”, when such a map exists.)
To verify that is an isomorphism, show that:
- is injective/one-to-one, i.e. ;
- is surjective/onto, i.e. ;
- preserves addition, i.e. ;
- preserves scalar multiplication, i.e. .
Lemma 3.1.3: An isomorphism maps the zero vector to the zero vector.
Proof: Where is an isomorphism, fix some . Then
Lemma 3.1.4: For any map between vector spaces, the following statements are equivalent:
- preserves structure
- preserves linear combinations of two vectors
- preserves linear combinations of any finite number of vectors
Proof:
TODO (in Hefferon)
Lemma 3.1.5: The inverse of an isomorphism is also an isomorphism.
Proof:
Suppose that by . Because isomorphisms are bijections, has an inverse .
Suppose that . Because it is an isomorphism, is surjective and . Then
Therefore preserves structure and hence satisfies the necessary conditions to be an isomorphism.
Theorem 3.1.6: Isomorphism is an equivalence relation between vector spaces.
Proof:
Isomorphism is reflexive, since the identity function is an isomorphism, since it is clearly a bijection and preserves structure:
Symmetry holds by Lemma 3.1.5.
For transitivity, suppose that and , and let and be isomorphisms. Then is bijective, and structure preserving:
Lemma 3.1.7: If spaces are isomorphic then they have the same dimension.
Proof:
TODO (in Hefferon)
Lemma 3.1.8: If spaces have the same dimension then they are isomorphic.
Proof:
TODO (in Hefferon)
Theorem 3.1.9: Vector spaces are isomorphic iff they have the same dimension.
Proof:
The proof follows from Lemma 3.1.7 and Lemma 3.1.8.
Corollary 3.1.10: Every finite-dimensional vector space is isomorphic to exactly one of the .
Thus the real spaces $\mathbb{R}^n$ form a set of canonical representatives of the isomorphism classes.
3.ii Homomorphisms
Lemma 3.2.2: A linear map maps the zero vector to the zero vector.
Proof:
The proof is the same as the proof to Lemma 3.1.3.
Lemma 3.2.3: The following statements are equivalent for any map between vector spaces:
- is a homomorphism;
- ;
- .
Proof:
TODO (in Hefferon)
Definition 3.2.4: The trace of a square matrix is the sum down the upper-left to bottom-right diagonal. So is:
Theorem 3.2.5: A homomorphism is determined by its action on a basis: if is a vector space basis , if is a vector space, and if (not necessarily distinct), then there exists a homomorphism from to sending each to , and that homomorphism is unique.
Proof:
TODO (in Hefferon)
Example (Derivation of (rotation)):
Extending linearly:
Lemma 3.2.8: For vector spaces and , the set of linear functions from to is itself a vector space, a subspace of the space of all functions from to . We denote the space of linear maps from to by .
Alternatively: a linear combination of homomorphisms is also a homomorphism.
Proof:
TODO (in Hefferon)
Lemma 3.2.9: Under a homomorphism, the image of any subspace of the domain is a subspace of the codomain. In particular, the image of the entire space, the range of the homomorphism, is a subspace of the codomain.
That is, given vector spaces and a homomorphism , where and are subspaces of and respectively.
Proof:
TODO (in Hefferon)
Definition 3.2.10: The range space of a homomorphism is
sometimes denoted . The dimension of the range space is the map’s rank .
Lemma 3.2.12: For any homomorphism the inverse image of a subspace of the range is a subspace of the domain. In particular, the inverse image of the trivial subspace of the range is a subspace of the domain.
Proof:
TODO (in Hefferon)
Definition 3.2.13: The null space or kernel of a linear map is the inverse image of .
Theorem 3.2.15 (rank-nullity theorem): Given a linear map ,
Proof:
TODO (in Hefferon)
Lemma 3.2.16: Under a linear map, the image of a linearly dependent set is linearly dependent.
Proof:
Suppose that with some nonzero. Apply to both sides: and . We still have some nonzero, so the set of the s is linearly dependent.
Theorem 3.2.17: Where is an -dimensional vector space, these are equivalent statements:
- is one-to-one/injective;
- has an inverse from its range to its domain that is a linear map;
- , that is, ;
- ;
- if is a basis for then is a basis for .
Proof:
TODO (in Hefferon)
3.iii Computing linear maps
Definition 3.3.1: Suppose that and are vector spaces of dimensions and with bases and , and that is a linear map. If
then
is the matrix representation of with respect to .
Theorem 3.3.2: Assume that and are vector spaces of dimensions and , and that is a linear map. If is represented by
and is represented by
then the representation of the image of is
Definition 3.3.3: The matrix-vector product of an matrix and an vector is
Theorem 3.3.4: Any matrix represents a homomorphism between vector spaces of appropriate dimensions, with respect to any pair of bases.
Proof:
TODO (in Hefferon)
Theorem 3.3.5: The rank of a matrix equals the rank of any map that it represents.
Proof:
TODO (in Hefferon)
Corollary 3.3.6: Let be a linear map represented by a matrix . Then is onto/surjective iff the rank of equals the number of its rows, and is one-to-one/injective iff the rank of equals the number of its columns.
Proof:
TODO (in Hefferon)
Lemma 3.3.8: A nonsingular map is represented by a square matrix. A square matrix represents nonsingular maps iff it is a nonsingular matrix. Thus, a matrix represents an isomorphism iff it is nonsingular.
Proof:
Assume that the homomorphism is nonsingular. Thus by Corollary 3.3.6, for any matrix representing , has full column rank and full row rank. Since a matrix’s row rank is equal to its column rank, the number of rows is equal to the number of columns and so is a square matrix; since it has full rank, it is nonsingular.
Conversely, assume that is a square nonsingular matrix. To be nonsingular, it must have full row and column rank, which is true iff is an isomorphism, by Lemma 3.1.7, since the domain and codomain have the same dimension.
3.iii.i Computing range and null spaces
TODO - rewatch end of lecture
3.iv Matrix operations
Lemma 3.4.3: The composition of linear maps is linear.
Proof:
Let and be linear.
hence is linear.
Theorem 3.4.4: A composition of linear maps is represented by the matrix product of the representations.
Proof:
TODO (in Hefferon)
( TODO : arrow diagrams)
Theorem 3.4.5: If are matrices, and the matrix products are defined, then the product is associative and distributes over matrix addition.
Proof:
Associativity holds because matrix multiplication represents function composition, which is associative.
Distributivity is similar, coming from the linearity of linear maps.
Definition 3.4.7: A matrix with all zeros except for a in the th entry is an - unit matrix or matrix unit .
Left-multiplying the -unit matrix copies row of the multiplicand into row of the result; right-multiplying by the -unit copies column of the multiplicand into column of the result.
Scaling a unit matrix scales the result.
Definition 3.4.9: An identity matrix is square and every entry is 0 except for 1s in the leading diagonal.
Definition 3.4.11: A permutation matrix is square and is all 0s except for a single 1 in each row and column.
Left-multiplying by a permutation matrix permutes rows, right-multiplying permutes columns.
Definition 3.4.12: The elementary reduction matrices result from applying a single Gaussian operation to an identity matrix.
-
for (think Multiply);
-
for (think Permute);
-
for (think Carry over?)
Lemma 3.4.13: Matrix multiplication can do Gaussian reduction.
Proof:
Clear - just left-multiply by the relevant elementary reduction matrix.
Definition 3.4.16: A function has a two-sided inverse if is both a right-inverse and a left-inverse. In this case, the inverse is unique.
Note: this means that is a bijection.
Note: recall that if a linear map has a two-sided inverse, the inverse is also a linear map.
Lemma 3.4.18: If a matrix has both a left inverse and right inverse then the two are equal.
Theorem 3.4.19: A matrix is invertible iff it is nonsingular.
Proof:
Given a matrix , fix spaces of appropriate dimension for the domain and codomain and fix bases for these spaces. With respect to these bases, represents a map . The statements are true about the map and therefore they are true about the matrix.
This also proves Lemma 3.4.18.
Corollary 3.4.22: The inverse of a matrix exists and equals
iff .
Proof:
3.v Change of basis
Definition 3.5.1: The change in basis matrix for bases is the representation of the identity map with respect to those bases.
Lemma 3.5.2: .
Proof:
Trivial.
Lemma 3.5.3: A matrix changes bases iff it is nonsingular.
Proof:
TODO : in Hefferon
Theorem 3.5.5: To convert from the matrix representing a map with respect to to the matrix representing w.r.t ,
Proof:
TODO by arrow diagram
Theorem 3.5.8: Matrix equivalence is an equivalence relation.
Proof:
Matrix equivalence is reflexive, since $H = IHI$ and identity matrices are nonsingular.
Matrix equivalence is symmetric, since if for nonsingular , then .
Matrix equivalence is transitive, since if and for nonsingular , then , with and nonsingular since the product of nonsingular matrices is nonsingular.
Theorem 3.5.9: Any matrix of rank is matrix equivalent to the matrix that is all zeros except that the first diagonal entries are 1.
Corollary 3.5.11: Matrix equivalence classes are characterised by rank: two same-sized matrices are matrix equivalent iff they have the same rank.
3.vi Projection
Definition 3.6.1: The orthogonal projection of a vector into the line spanned by a nonzero vector is
Remark:
is a scalar multiple of whose tip lies directly “underneath” the tip of , where the “up” direction is the vector orthogonal to that intersects ; it is orthogonal to .
Remark:
Another way of thinking about projection is that is the “shadow” of on , when a light source is shone orthogonally on to .
Remark:
We can derive the projection formula from the above remarks.
where . Then by Pythagoras, .
Hence
Therefore .
TODO : picture
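A small NumPy sketch of the projection formula (illustrative only; the vectors are made up):
import numpy as np

v = np.array([2.0, 3.0])
s = np.array([1.0, 1.0])           # spans the line
proj = (v @ s) / (s @ s) * s       # orthogonal projection of v onto the line
print(proj)                        # [2.5 2.5]
print((v - proj) @ s)              # the residual is orthogonal to s: 0.0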
Theorem 3.6.3: If the vectors in a set are mutually orthogonal and nonzero then that set is linearly independent.
Proof:
Consider a linear relation among the s: . Taking the dot product of any with both sides of the equation gives . Since, for , due to orthogonality, the left side reduces to , so . Since , it must be that . This holds for all the s, so the only linear relationship among the s is the trivial one, so the set is linearly independent.
Corollary 3.6.4: In a -dimensional vector space, if the vectors in a size set are mutually orthogonal and nonzero then that set is a basis for the space.
Proof:
Any linearly independent subset of size in a dimensional space is a basis.
Theorem 3.6.6 (Gram-Schmidt process): If is a basis for a subspace of then the vectors
form an orthogonal basis for the same subspace.
Proof:
We show by induction that each is nonzero, is in the span of , and is orthogonal to all the previous s; thus by Corollary 3.6.4, is a basis for the same subspace as is.
For , , so is clearly nonzero, is vacuously orthogonal to all previous s, and is clearly in the span of .
Now assume that is nonzero, orthogonal to the previous s and is in the span of for all less than some fixed .
Then . This sum can be iteratively reduced to be in terms of the s, so or else there would be a nontrivial linear dependence among the s, which would be a contradiction as we know them to be linearly independent as they are a basis. (Note also that the coefficients of the s must be nonzero as, by the inductive hypothesis, all previous s are nonzero.)
Because is a linear combination of s, it is in the span of . Also, it is orthogonal to all the previous s: for all ,
here the first term is zero since the projection is orthogonal, and the remaining terms are 0 by the inductive hypothesis (all previous s are mutually orthogonal).
By induction, all the s are nonzero, in the span of the initial basis, and are mutually orthogonal, so also form a basis of the subspace.
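A sketch of the process in NumPy (an illustration under the assumption that the input vectors are linearly independent; the function name is mine):
import numpy as np

def gram_schmidt(vectors):
    """Produce mutually orthogonal vectors with the same span (not normalised)."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        for b in basis:
            w -= (w @ b) / (b @ b) * b   # subtract the projection onto each earlier vector
        basis.append(w)
    return basis

print(gram_schmidt([np.array([1.0, 1.0, 0.0]),
                    np.array([1.0, 0.0, 1.0])]))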
4 Determinants
4.i Definition
Definition 4.1.1: An determinant is a function s.t.
- for
- for
- for any
- where is an identity matrix.
Common notation is .
Lemma 4.1.2: A matrix with two identical rows has a determinant of 0.
Lemma 4.1.3: A matrix with a zero row has a determinant of 0.
Lemma 4.1.4: A matrix is nonsingular iff its determinant is nonzero.
Proof:
Take a matrix with determinant zero, and carry out Gauss-Jordan reduction to get to . By Conditions (i) - (iii) of the determinant function, also; if were nonsingular then it would reduce to the identity matrix, which has determinant equal to 1; therefore, is singular. If were nonsingular, it must have a nonzero determinant to begin with.
Lemma 4.1.5: The determinant of an echelon form matrix is the product down the leading diagonal.
Proof: If the echelon matrix is singular then it has a zero row, so the determinant is zero which is the product down the leading diagonal.
If the echelon form matrix is nonsingular, then none of its diagonal entries are zero, so we can divide by those entries and use Condition (iii) to get 1s on the diagonal. Then we eliminate the nondiagonal entries, which by Condition (i) does not change the determinant, to get to the identity matrix, which has determinant 1. So, in this case we also get the determinant being equal to the product down the diagonal.
Lemma 4.1.6: For each , if there is an determinant function then it is unique.
Proof:
Suppose there are two functions satisfying the properties of Definition 4.1.1 and its consequences Lemmas 4.1.2 - 4.1.5. Given a square matrix M, fix some way of performing Gauss’s method to bring the matrix to echelon form. By using this fixed reduction, we can compute the value that these two functions must return on , and they must return the same value (by Lemma 4.1.5). Since they give the same output on every input, they are the same function.
Definition 4.1.7: Let be a vector space. A map is multilinear if
Lemma 4.1.8: Determinants are multilinear.
Proof:
Property (ii) here is just Condition (iii) in Definition 4.1.1, so we need only verify property (i).
TODO
Corollary 4.1.9: Determinants can be expressed as a sum of determinants of matrices with only one nonzero value in each row, or as a linear combination of permutation matrices.
Example:
Definition 4.1.10: An -permutation is a bijective function on the first positive integers .
In a permutation, each number is an output associated with exactly one input. We sometimes denote a permutation as the sequence .
Definition 4.1.11: The permutation expansion for determinants is
where are all of the -permutations, and is the permutation matrix corresponding to the permutation .
This is derived from using both conditions of multilinearity to break up the matrix into multiple matrices corresponding to each permutation, and then extracting the relevant coefficients.
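A brute-force sketch of the permutation expansion in Python (purely illustrative; it enumerates all n! permutations, so it is only practical for tiny matrices):
import numpy as np
from itertools import permutations

def det_permutation_expansion(A):
    """Sum over all n-permutations of sgn(phi) * a[0, phi(0)] * ... * a[n-1, phi(n-1)]."""
    n = A.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        # signum: +1 for an even number of inversions, -1 for odd
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        sign = -1.0 if inversions % 2 else 1.0
        total += sign * np.prod([A[i, perm[i]] for i in range(n)])
    return total

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(det_permutation_expansion(A), np.linalg.det(A))   # both -2.0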
Theorem 4.1.12: For each there is an determinant function.
Proof:
By Lemma 4.1.19.
Theorem 4.1.13: The determinant of a matrix is equal to the determinant of its transpose.
Proof:
TODO (in Hefferon)
Corollary 4.1.14: A matrix with two equal columns is singular. Column swaps change the sign of a determinant. Determinants are multilinear in their columns.
Proof:
By transposition.
Lemma 4.1.16: A row-swap in a permutation matrix changes the number of inversions from even to odd, or from odd to even; it changes the parity of the number of inversions.
Proof:
If we swap adjacent rows, all other row relations remain unchanged, and we either create or remove an inversion between the two rows we changed. Hence the parity changes.
To swap non-adjacent rows and (with ), we can instead move one row down with adjacent swaps, then move the other row back up with adjacent swaps; the total number of swaps is the sum of two consecutive numbers, which is odd, so the parity changes an odd number of times and hence changes overall.
Corollary 4.1.18: If a permutation matrix has an odd number of inversions then swapping it to the identity takes an odd number of swaps. If it has an even number of inversions then swapping to the identity takes an even number.
Proof:
The identity matrix has zero inversions. To change an odd number to zero requires an odd number of swaps, and to change an even number to zero requires an even number of swaps.
Lemma 4.1.19: The function s.t.
is a determinant.
Proof:
We check the four conditions from Definition 4.1.1.
Condition (iv) holds because in the expansion of , the only nonzero term is the one corresponding to the identity permutation, which has a signum of 1, hence the determinant is 1.
For Condition (iii), suppose . Then
For Condition (i), suppose . Then
since the first sum is the determinant of the matrix with row the same as row .
Condition (ii) follows from Condition (i).
4.ii Geometry of determinants
Definition 4.2.1: The box or parallelepiped formed by is the set .
Then the signed area of the box, denoted , is .
Theorem 4.2.3: A transformation changes the size of all boxes by the same factor, namely the size of the image of a box is times the size of the box where is the matrix representing with respect to the standard basis.
That is, the determinant of a product is the product of the determinants; .
Proof:
Suppose is singular and thus does not have an inverse. Note that if is invertible then there is an s.t. , so , so is invertible. The contrapositive of that observation is that if is not invertible then neither is - if then .
Then suppose that is nonsingular. Any nonsingular matrix factorises into a product of elementary matrices . Then if for all matrices and elementary matrices , then .
TODO : prove that determinants of elementary matrices are multiplicative.
Cramer’s rule: Think of a linear system as a set of vectors forming a parallelogram, with the target point being the far corner of a larger, stretched parallelogram. Then we want to find the factors by which we scale the sides of the parallelogram to get the larger one. TODO : diagram. Note: this is not very efficient.
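A minimal sketch of Cramer's rule in NumPy (my illustration; it assumes the coefficient matrix is square and nonsingular):
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b via Cramer's rule: x_i = det(B_i) / det(A), where B_i is A with column i replaced by b."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        B = A.copy()
        B[:, i] = b
        x[i] = np.linalg.det(B) / d
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(cramer_solve(A, b))   # [0.8 1.4]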
4.iii Laplace’s expansion
Definition 4.3.1: For any matrix , the matrix formed by deleting row and column of is the - minor of .
The - cofactor of is times the determinant of the minor of .
Theorem 4.3.2 (Laplace’s formula): Where is an matrix, we can find the determinant by expanding by cofactors on any row or column .
Proof:
TODO (exercise in Hefferon)
Definition 4.3.3: The matrix adjoint (or adjugate ) to the square matrix is
Note the entries are transposed from what we’d expect - .
Theorem 4.3.4: Where is a square matrix, . Thus if has an inverse, i.e. , then .
Proof:
The diagonal elements of are all equal to , as they are Laplace expansions, .
Then an element in column and row , with , is equal to , which is the Laplace expansion for a matrix with row equal to row , which must be equal to zero.
Hence .
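A short NumPy sketch of the adjugate and the resulting inverse formula (illustrative only; the indexing follows the transposed-cofactor convention above):
import numpy as np

def adjugate(A):
    """adj(A)[j, i] = (-1)**(i+j) * det(minor_ij): the transposed matrix of cofactors."""
    n = A.shape[0]
    adj = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            adj[j, i] = (-1) ** (i + j) * np.linalg.det(minor)   # note the transposed indices
    return adj

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(A @ adjugate(A))                  # det(A) * I = -2 * I
print(adjugate(A) / np.linalg.det(A))   # the inverse of A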
5 Similarity
5.i Complex vector spaces
From now on, scalars will be complex.
Theorem 5.1.1:
Let be a polynomial. If is a non-zero polynomial then there are quotient and remainder polynomials s.t. where the degree of is strictly less than the degree of .
Corollary 5.1.2: The remainder when is divided by is the constant polynomial .
Proof:
The remainder must be a constant polynomial as it has a degree less than one. Note that ; substituting in gives , so ; but is a constant so for all .
Corollary 5.1.3: If is a root of the polynomial , then is a factor of .
5.ii Similarity
Theorem 5.2.2: Similarity is an equivalence relation.
Proof:
Similarity is reflexive: .
Similarity is symmetric: if , then .
Similarity is transitive: if and , then .
Example: Proposition: This matrix is not diagonalisable: Proof: Suppose that there exists a diagonal matrix . Then consider But , hence . Hence is the zero matrix, but the only matrix that is similar to the zero matrix is the zero matrix. Contradiction.
Lemma 5.2.4: A transformation is diagonalisable iff there is a basis and scalars s.t. for each .
Proof:
Consider a diagonal representation matrix
Then for a basis element ,
Lemma 5.2.8: A linear transformation on a nontrivial vector space has at least one eigenvalue.
Lemma 5.2.10: An eigenspace is a nontrivial subspace of .
Proof:
Note that is nonempty; it contains the zero vector since .
Also note that contains a nonzero vector because by definition is an eigenvalue iff has a nontrivial solution for .
Finally, is closed under linear combinations:
hence maps elements of to elements of .
Theorem 5.2.11: For any set of distinct eigenvalues of a map or matrix, a set of associated eigenvectors, one per eigenvalue, is linearly independent.
Proof:
By induction on the number of eigenvalues.
For zero eigenvalues, this is trivially true as the set of associated eigenvectors is empty.
Assume true for distinct eigenvalues.
Suppose there are distinct eigenvalues, , and let be associated eigenvectors. Suppose that . From this we derive two equations: by applying to both sides, ; and by multiplying by , giving . Subtract these equations to get . Then the inductive hypothesis tells us that these vectors are all linearly independent, so there is only a trivial solution to this equation, i.e. , hence must be zero, so the vectors are linearly independent.
Since the statement holds for one more eigenvalue whenever it holds for a given number, and it holds for zero eigenvalues, it holds for any number of eigenvalues by the principle of induction.
Corollary 5.2.12: An matrix with distinct eigenvalues is diagonalisable.
Proof:
Form a basis of associated eigenvectors (linearly independent by Theorem 5.2.11) and apply Lemma 5.2.4.
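A NumPy sketch illustrating Corollary 5.2.12 (illustrative; the example matrix is made up and happens to have distinct eigenvalues 2 and 5):
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
eigenvalues, P = np.linalg.eig(A)      # the columns of P are eigenvectors
D = np.diag(eigenvalues)
# With distinct eigenvalues the eigenvectors form a basis, so P is invertible and A = P D P^{-1}
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True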
6 Matrix factorisation
6.i LU factorisation
Definition 6.1.1: For an matrix, we take a series of “macro” transformations (made up of the product of some elementary matrices) that reduce to echelon form, for now assuming that no row swaps are necessary. Let , , then the LU factorisation of is .
We end up with being lower triangular with along the diagonal and being upper triangular.
What follows is the general procedure for LU factorisation.
Let denote the th column of the matrix before (macro-)step . Then should be chosen so that
To do this, we subtract times row from row , where .
Then define . Then can be written
Remark:
This looks like the dot product of and , but is the other way round! This just defines to be the identity matrix, with the th column being .
By the sparsity pattern of , we have . Therefore
so - that is, inverting is done by negating the subdiagonal entries.
Remark:
Negating the subdiagonal entries does not in general compute the inverse of a lower-triangular matrix - but if the only non-zero subdiagonal entries are all in the same column, it will work.
Now considering and taking into consideration the sparsity patterns of the , and so , i.e. the entries in are just the nonzero subdiagonal entries in the s.
Definition 6.1.2:(the LU algorithm)
where denotes the subrow .
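Since the algorithm itself is only sketched above, here is a minimal NumPy rendering (my own; it assumes every pivot is nonzero, i.e. no row swaps are needed):
import numpy as np

def lu(A):
    """LU factorisation without pivoting: L is unit lower triangular, U is upper triangular."""
    n = A.shape[0]
    U = A.astype(float)
    L = np.eye(n)
    for k in range(n - 1):                     # macro-step k: clear column k below the pivot
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]        # multiplier
            U[i, k:] -= L[i, k] * U[k, k:]     # subtract that multiple of row k from row i
    return L, U

A = np.array([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]])
L, U = lu(A)
print(np.allclose(A, L @ U))   # True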
Corollary 6.1.3 (Solving linear systems via LU factorisation):
Suppose we have a linear system and an LU factorisation . Then
so we can solve the system by first solving and then .
Thanks to the sparsity pattern of , the system can be solved efficiently by forward substitution (working top-down in the natural way). Similarly, can be solved by backward substitution (working bottom-up).
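A sketch of the two triangular solves (illustrative; L is assumed unit lower triangular and U upper triangular with nonzero diagonal):
import numpy as np

def forward_substitution(L, b):
    """Solve Ly = b top-down (L unit lower triangular)."""
    y = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def backward_substitution(U, y):
    """Solve Ux = y bottom-up (U upper triangular)."""
    x = np.zeros_like(y, dtype=float)
    for i in reversed(range(len(y))):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1.0, 0.0], [2.0, 1.0]])
U = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([3.0, 12.0])
print(backward_substitution(U, forward_substitution(L, b)))   # solves LUx = b: [0.5 2.]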
Definition 6.1.4: The LU algorithm can give large errors in floating-point arithmetic when we forbid row swaps, even when not dividing by 0 (consider replacing a zero in the top-left entry with a tiny value: the exact factorisation is close to the original one, but the floating-point errors become unacceptable). So row swaps are necessary for stability. With row swaps (pivoting), we get PLU factorisation .
But we might want a different pivot, if is zero or small; so carry out a row swap such that the largest value in is the pivot. Recall that a row swap is represented by a permutation matrix.
Now the multipliers in each macro step are at most 1 in absolute value.
Then we have , but in fact we can separate the s and s and get that where , and is a lower triangular matrix related to the s, as described below.
Lemma 6.1.6:
Given , we can rearrange the row operations to give where . That is, the row operations can be reordered, with the row additions having their subdiagonal entries permuted.
Proof:
Definition 6.1.8:(Algorithm for PLU factorisation with partial pivoting)
Corollary 6.1.9 (Solving linear systems using PLU factorisation):
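A sketch of LU factorisation with partial pivoting in NumPy (my illustration of the idea in Definition 6.1.8, using the convention A = PLU with P a permutation matrix; it assumes the matrix is nonsingular):
import numpy as np

def plu(A):
    """LU factorisation with partial pivoting; returns P, L, U with A = P @ L @ U."""
    n = A.shape[0]
    U = A.astype(float)
    L = np.eye(n)
    perm = np.arange(n)                        # records the row permutation
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))    # pivot: largest entry in column k
        if p != k:                             # swap rows of U, the permutation record,
            U[[k, p]] = U[[p, k]]              # and the already-computed part of L
            perm[[k, p]] = perm[[p, k]]
            L[[k, p], :k] = L[[p, k], :k]
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    P = np.eye(n)[:, perm]                     # equivalently, P.T @ A = L @ U
    return P, L, U

A = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 9.0]])
P, L, U = plu(A)
print(np.allclose(A, P @ L @ U))   # True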
6.ii QR factorisation
- factorises into , where has orthonormal columns and is upper triangular (the “R” stands for right, i.e. upper, triangular)
Recall that if vectors in a set are nonzero and pairwise orthogonal, then that set is linearly independent.
Recall also that and .
We extend orthogonality to matrices.
Lemma 6.2.2: Multiplication by an orthogonal matrix preserves the length of vectors.
Proof:
Lemma 6.2.3: Similarly, angles are preserved under multiplication by an orthogonal matrix.
Proof:
Theorem 6.2.5: Every matrix has a QR factorisation.
Proof: Recall the Gram-Schmidt process. We can define a slightly different process to produce an orthonormal basis for vectors :
Defining for , and , then we have
Rearranging gives
so we have now obtained a QR factorisation of , :
Remark: The algorithm can be written as follows (here rendered as a runnable NumPy sketch, with the input vectors aj as the columns of the array A):
import numpy as np

def classical_gram_schmidt(A):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        qj = A[:, j].astype(float)        # qj := aj
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # r_ij := qi^T aj
            qj -= R[i, j] * Q[:, i]       # qj := qj - r_ij qi
        R[j, j] = np.linalg.norm(qj)      # r_jj := ||qj||
        Q[:, j] = qj / R[j, j]
    return Q, R
Remark: The classical Gram-Schmidt process is numerically unstable! So we can use the modified Gram-Schmidt process instead.
import numpy as np

def modified_gram_schmidt(A):
    m, n = A.shape
    V = A.astype(float)                   # vi := ai
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(V[:, i])
        Q[:, i] = V[:, i] / R[i, i]
        for j in range(i + 1, n):         # remove the qi component from every later vj immediately
            R[i, j] = Q[:, i] @ V[:, j]
            V[:, j] -= R[i, j] * Q[:, i]
    return Q, R
This is mathematically equivalent, and has the same flop count, but is more numerically stable: each starts as , and then has the (for ) components removed as soon as they become available, so the s aren’t “disturbed” by the presence of already removed components.
Remark: Gram-Schmidt (for thin QR factorisation) can be viewed as triangular orthogonalisation: , i.e. creating an orthogonal matrix via multiplication by triangular matrices.
Other methods (for full QR factorisation) proceed by orthogonal triangularisation: , i.e. creating a triangular matrix via multiplication by orthogonal matrices.
Such methods use Householder reflections or Givens rotations . They are numerically stable (because multiplying by orthogonal matrices tends to be numerically stable). Flop count (Householder and ) is about .
Solving linear systems via QR factorisation
Consider a square system with nonsingular, and a QR factorisation . Then
which can be solved easily by substitution as is upper triangular.
The overall cost is dominated by the factorisation; this is more expensive than LU, but has better numerical stability.
Least squares
Consider with : more equations than unknowns ( overdetermined ).
We assume that has full column rank . In general the system cannot be solved exactly.
The least squares problem is to find such that is minimised. I.e. we want to find such that is orthogonal to the (hyper-)plane spanned by the columns of ; if it is not orthogonal, then the residual will be longer than necessary. Alternatively, should run parallel to the plane spanned by the columns of A, directly “underneath” . Equivalently:
we call this the normal equation .
Lemma 6.2.7: If has full column rank , then is nonsingular.
Proof:
We show that the nullspace of is trivial.
Let with . Then , so . Since the columns of are linearly independent, it follows that .
Hence the normal equation can be solved uniquely, so there is a unique solution to the least squares problem.
We can do this by QR factorisation.
If , then
Since has full column rank, the matrix is nonsingular. Thus, multiplying with the inverse of gives the system
which can be solved easily by back-substitution.
Remark: The least-squares method can be used to find a linear regression - we want to minimise the sum of the squares of the errors, where
to find the best regression line .
Although in two dimensions QR factorisation probably isn’t the easiest way to solve this, it can be useful for higher-dimensional regressions. Useful for machine learning!
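A sketch of least-squares line fitting via QR in NumPy (illustrative; the data points are made up, and np.linalg.solve stands in for back-substitution since R is small):
import numpy as np

# Fit y ~ c0 + c1 * x by minimising the 2-norm of (A c - y)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.9, 5.1, 7.0])
A = np.column_stack([np.ones_like(x), x])    # full column rank
Q, R = np.linalg.qr(A)                       # thin QR: Q is 4x2, R is 2x2
c = np.linalg.solve(R, Q.T @ y)              # solve R c = Q^T y
print(c)                                     # approximately [0.97, 2.02]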
6.iii Norms and Singular Value Decomposition
Definition 6.3.1: The 2-norm of a vector is equal to the length, or Euclidean distance, of the vector:
Definition 6.3.2: More generally, a vector norm is a function that satisfies :
- with equality iff ;
- ;
- .
Definition 6.3.3: A commonly used norm is the -norm :
Definition 6.3.5: A vector norm induces an induced matrix norm :
where denotes the supremum of a set, i.e. its least upper bound, which is not necessarily contained within the set. For finite sets, this is the same as the maximum.
Intuitively, this is the largest amount by which the norm of a vector can be scaled by multiplying with .
Lemma 6.3.6: For any , for a vector norm and its induced matrix norm .
Proof:
By the definition of an induced matrix norm, , hence .
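A quick numerical illustration of Lemma 6.3.6 for the 2-norm (NumPy; the matrix and vector are made up):
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, -1.0])
A_norm = np.linalg.norm(A, 2)   # induced 2-norm, i.e. the largest singular value
print(np.linalg.norm(A @ x) <= A_norm * np.linalg.norm(x))   # True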
Lemma 6.3.7: Let denote an induced matrix norm. Then .
Proof:
Definition 6.3.9: The singular value decomposition (SVD) of a matrix (where ) is a factorisation , where and are orthogonal, and is diagonal. The diagonal entries are called singular values of .
The (pairwise orthogonal) columns of are the left-singular vectors of the SVD, and the (pairwise orthogonal) columns of are the right-singular vectors of the factorisation.
The SVD describes the action of A as a series of 3 transformations:
- rotation/reflection by ;
- scaling along the axes by ;
- another rotation/reflection, by .
Lemma 6.3.11: The action of on the right-singular vectors can be described by for .
Proof:
Theorem 6.3.12: The map defined by a matrix is the map represented by the diagonal matrix , with respect to the domain and codomain bases consisting of the left-singular and right-singular vectors, respectively.
Proof:
Let for some .
Let be the representation of in the basis of the right singular vectors, , s.t. .
Similarly, let be the representation of in the basis of the left-singular vectors, , s.t. .
Then
Theorem 6.3.13: Every matrix has an SVD, with uniquely determined singular values s. Further, if is square and the s are distinct, the left- and right-singular vectors are uniquely determined up to signs.
Proof:
TODO (in Trefethen & Bau)
Theorems 6.3.14 - 6.3.20 link properties of a matrix to properties of its SVD, given that we let be the number of nonzero singular values of , and that we have .
Theorem 6.3.14: The rank of is .
Theorem 6.3.15: is an orthonormal basis of ‘s column space. Similarly, is an orthonormal basis of ‘s nullspace.
Proof:
This follows from the fact that and ; the left- and right-multiplications of and respectively give the desired bases, which are orthonormal because are orthogonal.
Theorem 6.3.16: where is the largest singular value of .
Proof:
Theorem 6.3.17: The nonzero singular values of are the square roots of the nonzero eigenvalues of .
Proof:
hence is similar to so has the same eigenvalues as , which are .
Lemma 6.3.18: The determinant of any orthogonal matrix is ±1.
Proof:
Theorem 6.3.19: Let be a square matrix. Then
Proof:
Theorem 6.3.20 (Low-rank approximation): For any , define
Then
where denotes the infimum, the greatest lower bound (equal to the minimum when it is attained); that is, is a low-rank approximation of , where the distance between and in the induced 2-norm is .
Proof:
TODO
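A sketch of the rank truncation in NumPy (illustrative; np.linalg.svd returns the singular values in decreasing order, which the slicing relies on):
import numpy as np

def low_rank_approximation(A, nu):
    """Keep only the nu largest singular values: sum of the first nu rank-one terms of the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :nu] @ np.diag(s[:nu]) @ Vt[:nu, :]

A = np.random.rand(6, 4)
A2 = low_rank_approximation(A, 2)
print(np.linalg.matrix_rank(A2))               # 2
print(np.linalg.norm(A - A2, 2))               # equals the third singular value...
print(np.linalg.svd(A, compute_uv=False)[2])   # ...as Theorem 6.3.20 predicts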