Large Linear Systems¶
This is the age of Big Data. Every second of every day, data is being recorded in countless systems all over the world: our shopping habits, book and movie preferences, key words typed into our email messages, medical records, NSA recordings of our telephone calls, genomic data. None of it is of any use without analysis.
Enormous data sets carry with them enormous challenges in data processing. Solving a system of $n$ equations in $n$ unknowns is easy when $n$ is small, and one need not be terribly careful about methodology. But as the size of the system grows, algorithmic complexity and efficiency become critical.
Example: Netflix Competition (circa 2006-2009)¶
For a more complete description:
http://en.wikipedia.org/wiki/Netflix_Prize
The whole technical story:
http://www.stat.osu.edu/~dmsl/GrandPrize2009_BPC_BigChaos.pdf
In 2006, Netflix opened a competition where it provided ratings from hundreds of thousands of users for thousands of movies. The goal was to predict a user's rating of a movie, based on previous ratings and ratings of 'similar' users. The task amounted to analysis of an enormous (users $\times$ movies) matrix! The Wikipedia link above describes the contest, and the second link is a very detailed description of the method (which took into account important characteristics such as how tastes may change over time). Part of the analysis is related to matrix decomposition; we won't go into the details of the winning algorithm, but we will spend some time on basic matrix decompositions.
Matrix Decompositions¶
Matrix decompositions are an important step in solving linear systems in a computationally efficient manner.
LU Decomposition and Gaussian Elimination¶
LU stands for 'Lower Upper', and so an LU decomposition of a matrix $A$ is a decomposition so that
$$A = LU$$
where $L$ is lower triangular and $U$ is upper triangular.
Now, LU decomposition is essentially Gaussian elimination, but we work only with the matrix $A$ (as opposed to the augmented matrix).
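Let's review how Gaussian elimination (ge) works. We will deal with a $3\times 3$ system of equations for conciseness, but everything here generalizes to the $n\times n$ case. Consider the following equation:
$$\left(\begin{matrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{matrix}\right)\left(\begin{matrix}x_1\\ x_2\\ x_3\end{matrix}\right) = \left(\begin{matrix}b_1\\ b_2\\ b_3\end{matrix}\right)$$
For simplicity, let us assume that the leftmost matrix $A$ is non-singular. To solve the system using ge, we start with the 'augmented matrix':
$$\left(\begin{array}{ccc|c}a_{11} & a_{12} & a_{13} & b_1\\ a_{21} & a_{22} & a_{23} & b_2\\ a_{31} & a_{32} & a_{33} & b_3\end{array}\right)$$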
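We begin at the first entry, $a_{11}$. If $a_{11} \neq 0$, then we divide the first row by $a_{11}$ and then subtract the appropriate multiple of the first row from each of the other rows, zeroing out the first entry of every other row. (If $a_{11}$ is zero, we need to permute rows. We will not go into detail of that here.) The result is as follows:
$$\left(\begin{array}{ccc|c}1 & \frac{a_{12}}{a_{11}} & \frac{a_{13}}{a_{11}} & \frac{b_1}{a_{11}}\\ 0 & a_{22} - a_{21}\frac{a_{12}}{a_{11}} & a_{23} - a_{21}\frac{a_{13}}{a_{11}} & b_2 - a_{21}\frac{b_1}{a_{11}}\\ 0 & a_{32} - a_{31}\frac{a_{12}}{a_{11}} & a_{33} - a_{31}\frac{a_{13}}{a_{11}} & b_3 - a_{31}\frac{b_1}{a_{11}}\end{array}\right)$$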
We repeat the procedure for the second row: first divide by its leading entry, then subtract the appropriate multiple of the resulting row from each of the first and third rows, so that the second entries in rows 1 and 3 are zero. We could continue until the matrix on the left is the identity. In that case, we can just 'read off' the solution: the vector $x$ is the resulting column vector on the right. Usually, it is more efficient to stop at row echelon form (upper triangular, with ones on the diagonal, eliminating only the entries below each pivot), and then use back substitution to obtain the final answer.
Note that in some cases (for example, when a pivot is zero) it is necessary to permute rows to obtain row echelon form. Choosing at each step the row whose entry in the pivot column has the largest magnitude, and swapping it into place, is called partial pivoting. If we also swap columns, that is called full (or complete) pivoting.
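As a rough illustration of the procedure just described, here is a minimal sketch of ge with partial pivoting that stops at row echelon form and finishes with back substitution. The function name `ge_solve` and the vector `b` are our own choices for illustration; in practice one would call a library routine such as `np.linalg.solve`.

```python
import numpy as np

def ge_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting,
    stopping at row echelon form and finishing with back substitution."""
    # Build the augmented matrix [A | b] in floating point
    M = np.hstack([np.asarray(A, dtype=float),
                   np.asarray(b, dtype=float).reshape(-1, 1)])
    n = M.shape[0]
    for k in range(n):
        # Partial pivoting: bring the row with the largest |entry| in column k to row k
        p = k + np.argmax(np.abs(M[k:, k]))
        if np.isclose(M[p, k], 0.0):
            raise ValueError("matrix is singular")
        M[[k, p]] = M[[p, k]]
        # Divide the pivot row by its leading entry
        M[k] = M[k] / M[k, k]
        # Zero out the entries below the pivot
        for i in range(k + 1, n):
            M[i] = M[i] - M[i, k] * M[k]
    # Back substitution on the upper-triangular system with unit diagonal
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = M[i, -1] - M[i, i + 1:n] @ x[i + 1:]
    return x

A = np.array([[1, 3, 4], [2, 1, 3], [4, 1, 2]])
b = np.array([1, 2, 3])
print(ge_solve(A, b))
print(np.linalg.solve(A, b))
```

Both lines should print the same solution vector.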
It should be mentioned that we may obtain the inverse of a matrix $A$ using ge, by augmenting $A$ with the identity matrix and reducing $A$ to the identity: once $[A \mid I]$ has been reduced to $[I \mid B]$, the right-hand block is $B = A^{-1}$.
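A minimal sketch of this inverse computation, assuming a square, non-singular input. `gauss_jordan_inverse` is a hypothetical helper name, and partial pivoting is included so that a zero leading entry does not stop the reduction.

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by row-reducing the augmented matrix [A | I] to [I | A^{-1}]."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])
    for k in range(n):
        # Partial pivoting to avoid dividing by a zero (or tiny) pivot
        p = k + np.argmax(np.abs(M[k:, k]))
        if np.isclose(M[p, k], 0.0):
            raise ValueError("matrix is singular")
        M[[k, p]] = M[[p, k]]
        M[k] = M[k] / M[k, k]
        # Eliminate column k from every other row, above and below the pivot
        for i in range(n):
            if i != k:
                M[i] = M[i] - M[i, k] * M[k]
    # The left block is now the identity; the right block is the inverse
    return M[:, n:]

A = np.array([[1, 3, 4], [2, 1, 3], [4, 1, 2]])
print(gauss_jordan_inverse(A) @ A)  # should be (numerically) the identity
```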
Now, this is all fine when we are solving a system one time, for one outcome $b$. Many applications involve solutions to multiple problems, where the left-hand side of our matrix equation does not change, but there are many outcome vectors $b$. In this case, it is more efficient to decompose $A$.
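First, we start just as in ge, but we 'keep track' of the various multiples required to eliminate entries. For example, consider the matrix
$$A = \left(\begin{matrix}1 & 3 & 4\\ 2 & 1 & 3\\ 4 & 1 & 2\end{matrix}\right)$$
We need to multiply row $1$ by $2$ and subtract from row $2$ to eliminate the first entry in row $2$, and then multiply row $1$ by $4$ and subtract from row $3$. Instead of entering zeroes into the first entries of rows $2$ and $3$, we record the multiples required for their elimination, like so:
$$\left(\begin{matrix}1 & 3 & 4\\ (2) & -5 & -5\\ (4) & -11 & -14\end{matrix}\right)$$
And then we eliminate the second entry in the third row, which requires subtracting $\frac{11}{5}$ times row $2$ from row $3$:
$$\left(\begin{matrix}1 & 3 & 4\\ (2) & -5 & -5\\ (4) & (\frac{11}{5}) & -3\end{matrix}\right)$$
And now we have the decomposition:
$$L = \left(\begin{matrix}1 & 0 & 0\\ 2 & 1 & 0\\ 4 & \frac{11}{5} & 1\end{matrix}\right), \qquad U = \left(\begin{matrix}1 & 3 & 4\\ 0 & -5 & -5\\ 0 & 0 & -3\end{matrix}\right)$$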
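We can then solve $Ax = LUx = b$ by solving two triangular systems: first $Ly = b$ for $y$ by forward substitution, and then $Ux = y$ for $x$ by back substitution. Each triangular solve costs $O(n^2)$ operations, whereas the decomposition itself costs $O(n^3)$ but only needs to be done once, so it is more efficient to decompose when there are multiple outcomes to solve for.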
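The bookkeeping above can be sketched in a few lines. This hypothetical `lu_nopivot` helper assumes every pivot it meets is non-zero (it does no row swaps), which holds for the example matrix. It then reuses the factors for two arbitrary right-hand sides, using `scipy.linalg.solve_triangular` for the forward and back substitutions.

```python
import numpy as np
from scipy.linalg import solve_triangular

def lu_nopivot(A):
    """LU without pivoting: record the elimination multipliers in L."""
    U = np.asarray(A, dtype=float).copy()
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            # Multiplier needed to zero out U[i, k]; store it in L instead of the zero
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

A = np.array([[1, 3, 4], [2, 1, 3], [4, 1, 2]])
L, U = lu_nopivot(A)
print(L)  # multipliers 2, 4 and 11/5 below the diagonal
print(U)  # rows [1, 3, 4], [0, -5, -5], [0, 0, -3]

# Factor once, then reuse L and U for several right-hand sides
for b in (np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.0])):
    y = solve_triangular(L, b, lower=True)  # forward substitution: Ly = b
    x = solve_triangular(U, y)              # back substitution:    Ux = y
    print(x, np.allclose(A @ x, b))
```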
Let's do this with numpy and scipy:
import numpy as np
import scipy.linalg as la
np.set_printoptions(suppress=True)
A = np.array([[1,3,4],[2,1,3],[4,1,2]])
print(A)
# scipy returns P, L, U with A = P L U (P a permutation matrix), so P^T A = L U
P, L, U = la.lu(A)
print(np.dot(P.T, A))
print(np.dot(L, U))
print(P)
print(L)
print(U)
[[1 3 4]
[2 1 3]
[4 1 2]]
[[ 4. 1. 2.]
[ 1. 3. 4.]
[ 2. 1. 3.]]
[[ 4. 1. 2.]
[ 1. 3. 4.]
[ 2. 1. 3.]]
[[ 0. 1. 0.]
[ 0. 0. 1.]
[ 1. 0. 0.]]
[[ 1. 0. 0. ]
[ 0.25 1. 0. ]
[ 0.5 0.1818 1. ]]
[[ 4. 1. 2. ]
[ 0. 2.75 3.5 ]
[ 0. 0. 1.3636]]
Note that the scipy decomposition uses partial pivoting (rows are permuted so that the largest available pivot is used at each step). This is because small pivots can lead to numerical instability: dividing by a tiny pivot produces large multipliers that amplify round-off error. Another reason why one should use library functions whenever possible!
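When the same $A$ must be solved against many outcome vectors, scipy's `lu_factor` and `lu_solve` separate the $O(n^3)$ factorization from the $O(n^2)$ per-vector solves. A short sketch; the right-hand sides stored as the rows of `B` are arbitrary values chosen for illustration.

```python
import numpy as np
import scipy.linalg as la

A = np.array([[1, 3, 4], [2, 1, 3], [4, 1, 2]], dtype=float)

# Factor once: lu_factor uses partial pivoting and packs L and U into a
# single array, with the row interchanges recorded in piv
lu, piv = la.lu_factor(A)

# Each additional right-hand side then needs only two triangular solves
B = np.array([[1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0],
              [3.0, 0.0, 1.0]])
for b in B:
    x = la.lu_solve((lu, piv), b)
    print(x, np.allclose(A @ x, b))
```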
Cholesky Decomposition¶
Recall that a square matrix $A$ is positive definite if
$$u^TAu > 0$$
for any non-zero $n$-dimensional vector $u$, and a symmetric, positive-definite matrix $A$ is a positive-definite matrix such that
$$A = A^T$$
Let $A$ be a symmetric, positive-definite matrix. There is a unique decomposition such that
$$A = LL^T$$
where $L$ is lower-triangular with positive diagonal elements and $L^T$ is its transpose. This decomposition is known as the Cholesky decomposition, and $L$ may be interpreted as the 'square root' of the matrix $A$.
Algorithm:¶
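Let $A$ be an $n\times n$ symmetric, positive-definite matrix. We find the matrix $L$ using the following iterative procedure. Partition
$$A = \left(\begin{matrix}a_{11} & A_{21}^T\\ A_{21} & A_{22}\end{matrix}\right) = \left(\begin{matrix}\ell_{11} & 0\\ L_{21} & L_{22}\end{matrix}\right)\left(\begin{matrix}\ell_{11} & L_{21}^T\\ 0 & L_{22}^T\end{matrix}\right)$$
where $a_{11}$ and $\ell_{11}$ are scalars, $A_{21}$ and $L_{21}$ are $(n-1)$-vectors, and $A_{22}$ and $L_{22}$ are $(n-1)\times(n-1)$ matrices.
1.) Let $\ell_{11} = \sqrt{a_{11}}$
2.) $L_{21} = \frac{1}{\ell_{11}}A_{21}$
3.) Solve $A_{22} - L_{21}L_{21}^T = L_{22}L_{22}^T$ for $L_{22}$ (i.e. apply the same procedure to the $(n-1)\times(n-1)$ matrix $A_{22} - L_{21}L_{21}^T$)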
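The three steps translate into a loop that peels off one row and column at a time, updating the remaining lower-right block at each step. This is a minimal sketch (the name `cholesky_lower` is ours) with only a basic check for a non-positive pivot; a library routine such as `np.linalg.cholesky` should be preferred in real code.

```python
import numpy as np

def cholesky_lower(A):
    """Return lower-triangular L with A = L @ L.T, one column per step."""
    S = np.asarray(A, dtype=float).copy()  # holds the block still to be factored
    n = S.shape[0]
    L = np.zeros((n, n))
    for k in range(n):
        if S[k, k] <= 0:
            raise ValueError("matrix is not positive definite")
        # Step 1: diagonal entry is the square root of the leading entry
        L[k, k] = np.sqrt(S[k, k])
        # Step 2: rest of the column is the leading column divided by that root
        L[k + 1:, k] = S[k + 1:, k] / L[k, k]
        # Step 3: subtract the outer product and continue on the smaller block
        S[k + 1:, k + 1:] -= np.outer(L[k + 1:, k], L[k + 1:, k])
    return L

A = np.array([[1, 3, 5], [3, 13, 23], [5, 23, 42]])
print(cholesky_lower(A))
print(np.linalg.cholesky(A))
```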
Example:¶
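Take the symmetric, positive-definite matrix
$$A = \left(\begin{matrix}1 & 3 & 5\\ 3 & 13 & 23\\ 5 & 23 & 42\end{matrix}\right)$$
and write $\ell_{ij}$ for the entries of $L$. The first two steps give $\ell_{11} = \sqrt{1} = 1$ and $(\ell_{21}, \ell_{31}) = (3, 5)$. For the third step, we subtract the outer product of $(3, 5)^T$ with itself from the lower-right $2\times 2$ block:
$$\left(\begin{matrix}13 & 23\\ 23 & 42\end{matrix}\right) - \left(\begin{matrix}9 & 15\\ 15 & 25\end{matrix}\right) = \left(\begin{matrix}4 & 8\\ 8 & 17\end{matrix}\right)$$
Repeating the procedure on this block gives $\ell_{22} = \sqrt{4} = 2$ and $\ell_{32} = 8/2 = 4$, leaving the $1\times 1$ block $17 - 4^2 = 1$. And so we conclude that $\ell_{33} = \sqrt{1} = 1$.
This yields the decomposition:
$$\left(\begin{matrix}1 & 3 & 5\\ 3 & 13 & 23\\ 5 & 23 & 42\end{matrix}\right) = \left(\begin{matrix}1 & 0 & 0\\ 3 & 2 & 0\\ 5 & 4 & 1\end{matrix}\right)\left(\begin{matrix}1 & 3 & 5\\ 0 & 2 & 4\\ 0 & 0 & 1\end{matrix}\right)$$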
Now, with numpy:
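A = np.array([[1,3,5],[3,13,23],[5,23,42]])
# scipy.linalg.cholesky returns the UPPER-triangular factor by default,
# i.e. U = L^T with A = U^T U; pass lower=True to get L itself
U = la.cholesky(A)
print(np.dot(U.T, U))
print(U)
print(A)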
[[ 1. 3. 5.]
[ 3. 13. 23.]
[ 5. 23. 42.]]
[[ 1. 3. 5.]
[ 0. 2. 4.]
[ 0. 0. 1.]]
[[ 1 3 5]
[ 3 13 23]
[ 5 23 42]]
Cholesky decomposition is about twice as fast as LU decomposition (though both scale as $O(n^3)$).
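In practice, a Cholesky factorization is usually computed once and then reused to solve systems with a symmetric, positive-definite matrix. A sketch using `scipy.linalg.cho_factor` and `cho_solve` on the example matrix above; the right-hand side `b` is arbitrary.

```python
import numpy as np
import scipy.linalg as la

A = np.array([[1, 3, 5], [3, 13, 23], [5, 23, 42]], dtype=float)
b = np.array([1.0, 2.0, 3.0])

# Factor the symmetric positive-definite matrix once ...
c, low = la.cho_factor(A)
# ... then each solve is just two triangular solves with the stored factor
x = la.cho_solve((c, low), b)
print(x, np.allclose(A @ x, b))
```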
Matrix Decompositions for PCA and Least Squares¶
Eigendecomposition¶
Eigenvectors and Eigenvalues¶
First recall that an eigenvector of a matrix $A$ is a non-zero vector $v$ such that
$$Av = \lambda v$$
for some scalar $\lambda$. The value $\lambda$ is called an eigenvalue of $A$.
If an $n\times n$ matrix $A$ has $n$ linearly independent eigenvectors, then $A$ may be decomposed in the following manner:
$$A = B\Lambda B^{-1}$$
where $\Lambda$ is a diagonal matrix whose diagonal entries are the eigenvalues of $A$ and the columns of $B$ are the corresponding eigenvectors of $A$.
Facts:
- An $n\times n$ matrix is diagonalizable $\iff$ it has $n$ linearly independent eigenvectors.
- A symmetric, positive definite matrix has only positive eigenvalues, and its eigendecomposition $A = B\Lambda B^{-1}$ is via an orthogonal transformation $B$, so that $B^{-1} = B^T$. (I.e. its eigenvectors can be chosen to form an orthonormal set.)
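For symmetric matrices, `scipy.linalg.eigh` is specialised to this case and returns real eigenvalues (in ascending order) together with orthonormal eigenvectors. The small positive-definite matrix below is an arbitrary choice, used only to check the second fact numerically.

```python
import numpy as np
import scipy.linalg as la

# Symmetric and strictly diagonally dominant with a positive diagonal, hence positive definite
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, B = la.eigh(A)
print(lam)                                      # all positive
print(np.allclose(B.T @ B, np.eye(3)))          # eigenvectors are orthonormal
print(np.allclose(B @ np.diag(lam) @ B.T, A))   # A = B Lambda B^T
```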
Calculating Eigenvalues¶
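It is easy to see from the definition that if $v$ is an eigenvector of an $n\times n$ matrix $A$ with eigenvalue $\lambda$, then
$$Av - \lambda Iv = (A - \lambda I)v = \mathbf{0}$$
where $I$ is the identity matrix of dimension $n$ and $\mathbf{0}$ is an $n$-dimensional zero vector. Since $v \neq \mathbf{0}$, the matrix $A - \lambda I$ must be singular. Therefore, the eigenvalues of $A$ satisfy:
$$\det(A - \lambda I) = 0$$
The left-hand side above is a polynomial of degree $n$ in $\lambda$, and is called the characteristic polynomial of $A$. Thus, to find the eigenvalues of $A$, we find the roots of the characteristic polynomial.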
Computationally, however, computing the characteristic polynomial and then solving for its roots is expensive and, more importantly, numerically unstable: the roots of a polynomial can be extremely sensitive to small errors in its coefficients. Therefore, in practice, numerical methods are used to find both eigenvalues and their corresponding eigenvectors. We won't go into the specifics of the algorithms used to calculate eigenvalues, but here is an example:
A = np.array([[0,1,1],[2,1,0],[3,4,5]])
u, V = la.eig(A)
print(np.dot(V,np.dot(np.diag(u), la.inv(V))))
print(u)
[[-0.+0.j 1.+0.j 1.+0.j]
[ 2.+0.j 1.+0.j 0.+0.j]
[ 3.+0.j 4.+0.j 5.+0.j]]
[ 5.8541+0.j -0.8541+0.j 1.0000+0.j]
NB: Many matrices are not diagonalizable, and many have complex eigenvalues (even if all entries are real).
A = np.array([[0,1],[-1,0]])
print(A)
u, V = la.eig(A)
print(np.dot(V,np.dot(np.diag(u), la.inv(V))))
print(u)
[[ 0 1]
[-1 0]]
[[ 0.+0.j 1.+0.j]
[-1.+0.j 0.+0.j]]
[ 0.+1.j 0.-1.j]
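# If you know the eigenvalues must be real
# (e.g. because A is symmetric, such as a covariance matrix),
# use real_if_close to drop the zero imaginary parts.
# (This particular A is not symmetric; its eigenvalues just happen to be real.)
A = np.array([[0,1,1],[2,1,0],[3,4,5]])
u, V = la.eig(A)
print(u)
print(np.real_if_close(u))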
[ 5.8541+0.j -0.8541+0.j 1.0000+0.j]
[ 5.8541 -0.8541 1. ]
Singular Values¶
For any $m\times n$ matrix $A$, we define its singular values to be the square roots of the eigenvalues of $A^TA$. These are well-defined because $A^TA$ is always symmetric and positive semi-definite, so its eigenvalues are real and non-negative. Singular values are important properties of a matrix. Geometrically, a matrix $A$ maps the unit sphere in $\mathbb{R}^n$ to an ellipsoid in $\mathbb{R}^m$. The singular values are the lengths of the semi-axes of this ellipsoid.
Singular values also provide a measure of the stability of a matrix. We'll revisit this at the end of the lecture.
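A quick numerical check of the definition, assuming an arbitrary $4\times 3$ matrix: the singular values reported by `np.linalg.svd` should match the square roots of the eigenvalues of $A^TA$ computed with `np.linalg.eigvalsh`.

```python
import numpy as np

# An arbitrary 4x3 matrix chosen for illustration
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 3.0]])

# Singular values straight from the SVD (descending order)
s = np.linalg.svd(A, compute_uv=False)

# Square roots of the eigenvalues of the symmetric PSD matrix A^T A;
# clip tiny negative round-off before the square root, and reverse
# because eigvalsh returns eigenvalues in ascending order
ev = np.linalg.eigvalsh(A.T @ A)
s_from_eig = np.sqrt(np.clip(ev, 0, None))[::-1]

print(s)
print(s_from_eig)
```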
QR decomposition¶
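As with the previous decompositions, $QR$ decomposition is a method to write a matrix $A$ as the product of two matrices of simpler form. In this case, for an $m\times n$ matrix $A$ with $m \geq n$, we want:
$$A = QR$$
where $Q$ is an $m\times n$ matrix with $Q^TQ = I$ (i.e. the columns of $Q$ are orthonormal; when $m = n$, $Q$ is orthogonal) and $R$ is an $n\times n$ upper-triangular matrix.
This is really just the matrix form of the Gram-Schmidt orthogonalization of the columns of $A$. The classical G-S algorithm is numerically unstable, so various other methods (such as Householder reflections or Givens rotations) have been developed to compute the QR decomposition. We won't cover those in detail as they are a bit beyond our scope.
Assuming $A$ has full column rank, the first $k$ columns of $Q$ are an orthonormal basis for the column space of the first $k$ columns of $A$.
Iterative QR decomposition is often used in the computation of eigenvalues: the QR algorithm repeatedly factors $A_k = Q_kR_k$ and forms $A_{k+1} = R_kQ_k = Q_k^TA_kQ_k$, which has the same eigenvalues as $A_k$ and, under suitable conditions, converges to a triangular matrix with the eigenvalues on its diagonal.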
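A minimal sketch of this idea using `np.linalg.qr` and the unshifted iteration $A_{k+1} = R_kQ_k$. The symmetric test matrix (eigenvalues $3-\sqrt{3}$, $3$, $3+\sqrt{3}$) and the fixed iteration count are assumptions chosen so that this simple version converges; production eigensolvers add shifts and other refinements.

```python
import numpy as np

# Symmetric matrix chosen for illustration; eigenvalues 3 - sqrt(3), 3, 3 + sqrt(3)
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

Q, R = np.linalg.qr(A)
print(np.allclose(Q.T @ Q, np.eye(3)), np.allclose(Q @ R, A))

# Unshifted QR iteration: each iterate is similar to the previous one,
# and here the off-diagonal entries shrink towards zero
Ak = A.copy()
for _ in range(200):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q
print(np.sort(np.diag(Ak)))
print(np.linalg.eigvalsh(A))
```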
Singular Value Decomposition¶
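Another important matrix decomposition is singular value decomposition or SVD. For any $m\times n$ matrix $A$, we may write:
$$A = U\Sigma V^T$$
where $U$ is a unitary (orthogonal in the real case) $m\times m$ matrix, $\Sigma$ is a rectangular, diagonal $m\times n$ matrix with diagonal entries $\sigma_1, \ldots, \sigma_{\min(m,n)}$ all non-negative (these are the singular values of $A$), and $V$ is a unitary (orthogonal) $n\times n$ matrix. (In the complex case, $V^T$ is replaced by the conjugate transpose $V^*$.) SVD is used in principal component analysis and in the computation of the Moore-Penrose pseudo-inverse.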
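A sketch of the SVD in numpy and of the pseudo-inverse $A^+ = V\Sigma^{+}U^T$, where $\Sigma^+$ inverts the non-zero singular values. Note that `np.linalg.svd` returns $V^T$ rather than $V$. The $4\times 3$ matrix is an arbitrary full-column-rank example, so every singular value can be inverted directly; the result is compared with `np.linalg.pinv`.

```python
import numpy as np

# An arbitrary 4x3 matrix with linearly independent columns
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 3.0]])

# Thin SVD: U is 4x3, s holds 3 singular values, Vt is V^T (3x3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U @ np.diag(s) @ Vt, A))

# Moore-Penrose pseudo-inverse; valid here because all singular values are non-zero
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
print(np.allclose(A_pinv, np.linalg.pinv(A)))
```

Applied to a vector $y$, `A_pinv @ y` gives the least-squares solution of $Ax \approx y$.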
Stability and Condition Number¶
It is important that numerical algorithms be stable and efficient. Efficiency is a property of an algorithm, but stability can be a property of the system itself.
Example¶
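A = np.array([[8,6,4,1],[1,4,5,1],[8,4,1,1],[1,4,3,6]])
b = np.array([19,11,14,14])
# each b_i is the i-th row sum of A, so the exact solution is x = [1, 1, 1, 1]
la.solve(A,b)
# perturb each entry of b by at most 0.07
b = np.array([19.01,11.05,14.07,14.05])
# (as in a notebook, only the value of the last expression is displayed below)
la.solve(A,b)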
array([-2.34 , 9.745, -4.85 , -1.34 ])
Note that tiny perturbations in the outcome vector $b$ cause large differences in the solution! When this happens, we say that the matrix $A$ is ill-conditioned. This happens when a matrix is 'close' to being singular (i.e. non-invertible).
Condition Number¶
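A measure of this type of behavior is called the condition number. For a chosen matrix norm, it is defined as:
$$\mathrm{cond}(A) = \|A\|\,\|A^{-1}\|$$
In general, it is expensive to compute exactly, since it involves $A^{-1}$.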
Fact: for the matrix 2-norm,
$$\mathrm{cond}(A) = \frac{\sigma_{\max}}{\sigma_{\min}}$$
where $\sigma_{\max}$ is the maximum singular value of $A$ and $\sigma_{\min}$ is the smallest. The higher the condition number, the more unstable the system; in other words, a large discrepancy between the minimal and maximal singular values means a large condition number.
Example¶
U, s, V = np.linalg.svd(A)
print(s)
print(max(s)/min(s))
[ 15.5457 6.9002 3.8363 0.0049]
3198.6725812
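`np.linalg.cond` computes the 2-norm condition number directly, and the condition number bounds how much a relative perturbation in $b$ can be amplified in $x$: $\frac{\|\delta x\|}{\|x\|} \le \mathrm{cond}(A)\frac{\|\delta b\|}{\|b\|}$ (a standard bound). A sketch checking this on the example system above; the variable names are our own.

```python
import numpy as np

A = np.array([[8, 6, 4, 1], [1, 4, 5, 1], [8, 4, 1, 1], [1, 4, 3, 6]], dtype=float)
b = np.array([19, 11, 14, 14], dtype=float)
b_pert = np.array([19.01, 11.05, 14.07, 14.05])

# 2-norm condition number (the default), i.e. sigma_max / sigma_min
kappa = np.linalg.cond(A)
print(kappa)

x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A, b_pert)

# Relative change in the solution vs relative change in b, in 2-norms
rel_dx = np.linalg.norm(x_pert - x) / np.linalg.norm(x)
rel_db = np.linalg.norm(b_pert - b) / np.linalg.norm(b)
print(rel_dx / rel_db)  # amplification factor; never exceeds kappa
```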
Preconditioning¶
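We can sometimes improve on this behavior by 'pre-conditioning'. Instead of solving
$$Ax = b$$
we solve
$$M^{-1}Ax = M^{-1}b$$
where $M$ is an invertible matrix chosen so that $M^{-1}A$ has a lower condition number than $A$ itself.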
Preconditioning is a very involved topic, quite out of the range of this course. It is mentioned here only to make you aware that such a thing exists, should you ever run into an ill-conditioned problem!
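As one concrete and deliberately simple instance, assumed here for illustration only: Jacobi (diagonal) preconditioning solves $M^{-1}Ax = M^{-1}b$ with $M = \mathrm{diag}(A)$, which can help when rows are badly scaled. The $2\times 2$ matrix below was constructed to be badly scaled; real applications typically pair preconditioners with iterative solvers.

```python
import numpy as np

# Badly scaled: the first row of [[2, 1], [1, 3]] multiplied by 1000
A = np.array([[2000.0, 1000.0],
              [1.0, 3.0]])
b = np.array([3000.0, 4.0])  # chosen so that the solution is x = [1, 1]

# Jacobi preconditioner M = diag(A); applying M^{-1} just rescales the rows
M_inv = np.diag(1.0 / np.diag(A))

print(np.linalg.cond(A))          # large
print(np.linalg.cond(M_inv @ A))  # much smaller

# The preconditioned system has the same solution
x = np.linalg.solve(A, b)
x_pc = np.linalg.solve(M_inv @ A, M_inv @ b)
print(x, x_pc)
```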
Exercises¶
1. Compute the LU decomposition of the following matrix by hand and using numpy
A = np.array([[1,2,3],[2,-4,6],[3,-9,-3]])
print(A)
P, L , U = la.lu(A)
print(P)
print(L)
print(U)
[[ 1 2 3]
[ 2 -4 6]
[ 3 -9 -3]]
[[ 0. 1. 0.]
[ 0. 0. 1.]
[ 1. 0. 0.]]
[[ 1. 0. 0. ]
[ 0.3333 1. 0. ]
[ 0.6667 0.4 1. ]]
[[ 3. -9. -3. ]
[ 0. 5. 4. ]
[ 0. 0. 6.4]]
2. Compute the Cholesky decomposition of the following matrix by hand and using numpy
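# Your code here
# np.linalg.cholesky returns the LOWER-triangular factor L (A = L L^T),
# unlike scipy.linalg.cholesky, which returns the upper factor by default
A=np.array([[4,2,3],[2,4,5],[3,5,8]])
np.linalg.cholesky(A)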
array([[ 2. , 0. , 0. ],
[ 1. , 1.7321, 0. ],
[ 1.5 , 2.0207, 1.291 ]])
3. Write a function in Python to solve a system
$$Ax = b$$
using SVD decomposition. Your function should take $A$ and $b$ as input and return $x$.
Your function should include the following:
- First, check that $A$ is invertible; return an error message if it is not
- Invert $A$ using SVD and solve
- Return $x$
Test your function for correctness.
# Your code here
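def svdsolver(A, b):
    """Solve Ax = b using the SVD A = U diag(s) Vt (np.linalg.svd returns Vt, not V)."""
    U, s, Vt = np.linalg.svd(A)
    # Treat A as singular if its smallest singular value is negligible relative
    # to its largest (an exact == 0 test is unreliable in floating point)
    if s.min() <= s.max() * max(A.shape) * np.finfo(float).eps:
        raise ValueError("Matrix is singular")
    # A^{-1} = V diag(1/s) U^T, so x = V diag(1/s) U^T b
    return Vt.T @ np.diag(1.0 / s) @ U.T @ b

A = np.array([[1,1],[1,2]])
b = np.array([3,1])
print(np.linalg.solve(A,b))
print(svdsolver(A,b))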