Large Linear Systems

This is the age of Big Data. Every second of every day, data is being recorded in countless systems all over the world: our shopping habits, book and movie preferences, key words typed into our email messages, medical records, NSA recordings of our telephone calls, genomic data. None of it is of any use without analysis.

Enormous data sets carry with them enormous challenges in data processing. Solving a small system, say $10$ equations in $10$ unknowns, is easy, and one need not be terribly careful about methodology. But as the size of the system grows, algorithmic complexity and efficiency become critical.

Example: Netflix Competition (circa 2006-2009)

For a more complete description:

http://en.wikipedia.org/wiki/Netflix_Prize

The whole technical story:

http://www.stat.osu.edu/~dmsl/GrandPrize2009_BPC_BigChaos.pdf

In 2006, Netflix opened a competition in which it provided ratings from over $480{,}000$ users for nearly $18{,}000$ movies. The goal was to predict a user’s rating of a movie, based on previous ratings and the ratings of ‘similar’ users. The task amounted to the analysis of a roughly $480{,}000 \times 18{,}000$ matrix! The Wikipedia link above describes the contest, and the second link is a very detailed description of the winning method (which took into account important characteristics such as how tastes may change over time). Part of the analysis is related to matrix decomposition. We won’t go into the details of the winning algorithm, but we will spend some time on basic matrix decompositions.

Matrix Decompositions

Matrix decompositions are an important step in solving linear systems in a computationally efficient manner.

LU Decomposition and Gaussian Elimination

LU stands for ‘Lower Upper’, and so an LU decomposition of a matrix $A$ is a decomposition such that

$$A = LU$$

where $L$ is lower triangular and $U$ is upper triangular.

Now, LU decomposition is essentially Gaussian elimination, but we work only with the matrix $A$ (as opposed to the augmented matrix).

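Let’s review how Gaussian elimination (ge) works. We will deal with a $3 \times 3$ system of equations for conciseness, but everything here generalizes to the $n \times n$ case. Consider the following equation:

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$$

For simplicity, let us assume that the leftmost matrix $A$ is non-singular. To solve the system using ge, we start with the ‘augmented matrix’:

$$\left(\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ a_{31} & a_{32} & a_{33} & b_3 \end{array}\right)$$

We begin at the first entry, $a_{11}$. If $a_{11} \neq 0$, then we divide the first row by $a_{11}$ and then subtract the appropriate multiple of the first row from each of the other rows, zeroing out the first entry of every other row. (If $a_{11}$ is zero, we need to permute rows. We will not go into detail of that here.) The result is as follows:

$$\left(\begin{array}{ccc|c} 1 & \frac{a_{12}}{a_{11}} & \frac{a_{13}}{a_{11}} & \frac{b_1}{a_{11}} \\ 0 & a_{22} - a_{21}\frac{a_{12}}{a_{11}} & a_{23} - a_{21}\frac{a_{13}}{a_{11}} & b_2 - a_{21}\frac{b_1}{a_{11}} \\ 0 & a_{32} - a_{31}\frac{a_{12}}{a_{11}} & a_{33} - a_{31}\frac{a_{13}}{a_{11}} & b_3 - a_{31}\frac{b_1}{a_{11}} \end{array}\right)$$

We repeat the procedure for the second row, first dividing by the leading entry, then subtracting the appropriate multiple of the resulting row from each of the third and first rows, so that the second entries in row 1 and in row 3 are zero. We could continue until the matrix on the left is the identity. In that case, we can then just ‘read off’ the solution: i.e., the vector $x$ is the resulting column vector on the right. Usually, it is more efficient to stop at row echelon form (upper triangular, with ones on the diagonal), and then use back substitution to obtain the final answer.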

Note that in some cases, it is necessary to permute rows to obtain row echelon form. This is called partial pivoting. If we also permute columns, that is called full pivoting.
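The elimination and back-substitution steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `gauss_solve` and the right-hand side `b` are made up for the example, and the code always swaps in the largest available pivot rather than only swapping when a zero is encountered.

```python
import numpy as np

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting (illustrative)."""
    A = np.array(A, dtype=float)   # work on copies
    b = np.array(b, dtype=float)
    n = len(b)
    M = np.hstack([A, b.reshape(n, 1)])   # augmented matrix [A | b]

    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k to row k.
        p = k + np.argmax(np.abs(M[k:, k]))
        if M[p, k] == 0:
            raise ValueError("matrix is singular")
        M[[k, p]] = M[[p, k]]
        # Scale the pivot row so the pivot is 1, then clear the entries below it.
        M[k] = M[k] / M[k, k]
        for i in range(k + 1, n):
            M[i] -= M[i, k] * M[k]

    # Back substitution on the upper triangular system (unit diagonal).
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = M[i, n] - M[i, i + 1:n] @ x[i + 1:]
    return x

A = np.array([[1, 3, 4], [2, 1, 3], [4, 1, 2]])
b = np.array([8, 6, 7])          # chosen so that the solution is (1, 1, 1)
x = gauss_solve(A, b)
print(x)
```

In practice one would call a library routine such as `np.linalg.solve` instead; the sketch is only meant to make the row operations concrete.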

It should be mentioned that we may obtain the inverse of a matrix using ge, by reducing the matrix $A$ to the identity, with the identity matrix as the augmented portion. When the reduction is complete, the augmented portion holds $A^{-1}$.

Now, this is all fine when we are solving a system one time, for one outcome $b$. Many applications involve solutions to multiple problems, where the left-hand side of our matrix equation does not change, but there are many outcome vectors $b$. In this case, it is more efficient to decompose $A$.

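First, we start just as in ge, but we ‘keep track’ of the various multiples required to eliminate entries. For example, consider the matrix

$$A = \begin{pmatrix} 1 & 3 & 4 \\ 2 & 1 & 3 \\ 4 & 1 & 2 \end{pmatrix}$$

We need to multiply row $1$ by $2$ and subtract from row $2$ to eliminate the first entry in row $2$, and then multiply row $1$ by $4$ and subtract from row $3$. Instead of entering zeroes into the first entries of rows $2$ and $3$, we record the multiples required for their elimination (shown in parentheses), like so:

$$\begin{pmatrix} 1 & 3 & 4 \\ (2) & -5 & -5 \\ (4) & -11 & -14 \end{pmatrix}$$

And then we eliminate the second entry in the third row by subtracting $\frac{11}{5}$ times row $2$ from row $3$:

$$\begin{pmatrix} 1 & 3 & 4 \\ (2) & -5 & -5 \\ (4) & (\frac{11}{5}) & -3 \end{pmatrix}$$

And now we have the decomposition:

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 4 & \frac{11}{5} & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} 1 & 3 & 4 \\ 0 & -5 & -5 \\ 0 & 0 & -3 \end{pmatrix}$$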

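We can solve the system $Ax = b$ by solving two triangular problems:

$$Ly = b$$

(by forward substitution) and

$$Ux = y$$

(by back substitution). These are both $O(n^2)$, whereas the decomposition itself costs $O(n^3)$, so it is more efficient to decompose once when there are multiple outcomes to solve for.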

Let’s do this with SciPy:

import numpy as np
import scipy.linalg as la
np.set_printoptions(suppress=True)
 
A = np.array([[1,3,4],[2,1,3],[4,1,2]])
 
print(A)
 
# la.lu returns P, L, U with A = P L U, so P^T A = L U
P, L, U = la.lu(A)
print(np.dot(P.T, A))
print()
print(np.dot(L, U))
print(P)
print(L)
print(U)
[[1 3 4]
 [2 1 3]
 [4 1 2]]
[[ 4.  1.  2.]
 [ 1.  3.  4.]
 [ 2.  1.  3.]]
 
[[ 4.  1.  2.]
 [ 1.  3.  4.]
 [ 2.  1.  3.]]
[[ 0.  1.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]]
[[ 1.      0.      0.    ]
 [ 0.25    1.      0.    ]
 [ 0.5     0.1818  1.    ]]
[[ 4.      1.      2.    ]
 [ 0.      2.75    3.5   ]
 [ 0.      0.      1.3636]]

Note that the SciPy decomposition uses partial pivoting (matrix rows are permuted so that the largest available entry is used as the pivot). This is because small pivots can lead to numerical instability. Another reason why one should use library functions whenever possible!
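To illustrate why decomposing pays off when there are several outcome vectors, here is a small sketch using `scipy.linalg.lu_factor` and `scipy.linalg.lu_solve`, which factor the matrix once and then reuse the factors. The right-hand sides in `B` are arbitrary values chosen for the example.

```python
import numpy as np
import scipy.linalg as la

A = np.array([[1, 3, 4], [2, 1, 3], [4, 1, 2]], dtype=float)

# Factor once (the expensive O(n^3) step) ...
lu, piv = la.lu_factor(A)

# ... then reuse the factors for each right-hand side (O(n^2) per solve).
B = np.array([[8, 6, 7], [1, 0, 0], [0, 2, -1]], dtype=float)   # one b per row
for b in B:
    x = la.lu_solve((lu, piv), b)
    print(x, np.allclose(A @ x, b))
```

`lu_factor` stores $L$ and $U$ compactly in a single array together with the pivot indices, so the pair `(lu, piv)` is all that needs to be kept around between solves.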

Cholesky Decomposition

Recall that a square matrix $A$ is positive definite if

$$u^TAu > 0$$

for any non-zero n-dimensional vector $u$, and a symmetric, positive-definite matrix $A$ is a positive-definite matrix such that

$$A = A^T$$

Let $A$ be a symmetric, positive-definite matrix. There is a unique decomposition such that

$$A = LL^T$$

where $L$ is lower-triangular with positive diagonal elements and $L^T$ is its transpose. This decomposition is known as the Cholesky decomposition, and $L$ may be interpreted as the ‘square root’ of the matrix $A$.

Algorithm:

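Let $A$ be an $n \times n$ matrix. We find the matrix $L$ using the following iterative procedure. Partition $A$ and $L$ into blocks, separating the first row and column from the rest:

$$A = \begin{pmatrix} a_{11} & A_{21}^T \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} \ell_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix} \begin{pmatrix} \ell_{11} & L_{21}^T \\ 0 & L_{22}^T \end{pmatrix}$$

1.) Let $\ell_{11} = \sqrt{a_{11}}$

2.) $L_{21} = \frac{1}{\ell_{11}} A_{21}$

3.) Solve $A_{22} - L_{21}L_{21}^T = L_{22}L_{22}^T$ for $L_{22}$ (this is a Cholesky decomposition of a matrix one size smaller, so the same steps are applied again)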
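The three steps above can be sketched directly in NumPy. This is a minimal illustration, not a replacement for the library routine; the function name `cholesky_sketch` is made up for the example, and the working copy `S` plays the role of the shrinking trailing block.

```python
import numpy as np

def cholesky_sketch(A):
    """Return lower-triangular L with A = L @ L.T (A symmetric positive definite)."""
    S = np.array(A, dtype=float)      # working copy; holds the trailing block
    n = S.shape[0]
    L = np.zeros((n, n))
    for k in range(n):
        if S[k, k] <= 0:
            raise ValueError("matrix is not positive definite")
        L[k, k] = np.sqrt(S[k, k])                 # step 1: square root of the corner
        L[k + 1:, k] = S[k + 1:, k] / L[k, k]      # step 2: scale the rest of the column
        # step 3: subtract the outer product from the trailing block, then repeat on it
        S[k + 1:, k + 1:] -= np.outer(L[k + 1:, k], L[k + 1:, k])
    return L

A = np.array([[1, 3, 5], [3, 13, 23], [5, 23, 42]])
L = cholesky_sketch(A)
print(L)
print(np.allclose(L @ L.T, A))
```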

Example:

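Take

$$A = \begin{pmatrix} 1 & 3 & 5 \\ 3 & 13 & 23 \\ 5 & 23 & 42 \end{pmatrix}$$

Then $\ell_{11} = \sqrt{1} = 1$, and dividing the rest of the first column by $\ell_{11}$ gives $\ell_{21} = 3$ and $\ell_{31} = 5$. Removing their contribution from the trailing $2 \times 2$ block leaves

$$\begin{pmatrix} 13 & 23 \\ 23 & 42 \end{pmatrix} - \begin{pmatrix} 3 \\ 5 \end{pmatrix} \begin{pmatrix} 3 & 5 \end{pmatrix} = \begin{pmatrix} 4 & 8 \\ 8 & 17 \end{pmatrix}$$

Repeating the procedure on this block gives $\ell_{22} = \sqrt{4} = 2$, $\ell_{32} = 8/2 = 4$ and $\ell_{33}^2 = 17 - 4^2 = 1$.

And so we conclude that $\ell_{33} = 1$.

This yields the decomposition:

$$\begin{pmatrix} 1 & 3 & 5 \\ 3 & 13 & 23 \\ 5 & 23 & 42 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 2 & 0 \\ 5 & 4 & 1 \end{pmatrix} \begin{pmatrix} 1 & 3 & 5 \\ 0 & 2 & 4 \\ 0 & 0 & 1 \end{pmatrix}$$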

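Now, with SciPy:

A = np.array([[1,3,5],[3,13,23],[5,23,42]])
# la.cholesky returns the upper-triangular factor by default (lower=False),
# so R here is the transpose of the usual lower factor and A = R^T R
R = la.cholesky(A)
print(np.dot(R.T, R))
 
print(R)
print(A)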
[[  1.   3.   5.]
 [  3.  13.  23.]
 [  5.  23.  42.]]
[[ 1.  3.  5.]
 [ 0.  2.  4.]
 [ 0.  0.  1.]]
[[ 1  3  5]
 [ 3 13 23]
 [ 5 23 42]]

Cholesky decomposition is about twice as fast as LU decomposition (though both scale as $O(n^3)$).
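As with LU, the Cholesky factor can be reused to solve systems. The following sketch uses `scipy.linalg.cho_factor` and `scipy.linalg.cho_solve`; the right-hand side `b` is an arbitrary choice, picked so that the solution is a vector of ones.

```python
import numpy as np
import scipy.linalg as la

A = np.array([[1, 3, 5], [3, 13, 23], [5, 23, 42]], dtype=float)
b = np.array([9, 39, 70], dtype=float)    # equals A @ (1, 1, 1)

c, low = la.cho_factor(A)                 # factor once
x = la.cho_solve((c, low), b)             # two triangular solves
print(x)
```

Note that `cho_factor` only works for symmetric (Hermitian), positive-definite matrices; for a general square matrix the LU routines are the ones to use.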

Matrix Decompositions for PCA and Least Squares

Eigendecomposition

Eigenvectors and Eigenvalues

First recall that an eigenvector of a matrix $A$ is a non-zero vector $v$ such that

$$Av = \lambda v$$

for some scalar $\lambda$.

The value $\lambda$ is called an eigenvalue of $A$.

If an $n \times n$ matrix $A$ has $n$ linearly independent eigenvectors, then $A$ may be decomposed in the following manner:

$$A = B \Lambda B^{-1}$$

where $\Lambda$ is a diagonal matrix whose diagonal entries are the eigenvalues of $A$ and the columns of $B$ are the corresponding eigenvectors of $A$.

Facts:

  • An $n \times n$ matrix is diagonalizable $\iff$ it has $n$ linearly independent eigenvectors.

  • A symmetric, positive definite matrix has only positive eigenvalues and its eigendecomposition $A = B \Lambda B^{-1}$ is via an orthogonal transformation $B$. (I.e. its eigenvectors can be chosen to form an orthonormal set, so that $B^{-1} = B^T$.)

Calculating Eigenvalues

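It is easy to see from the definition that if $v$ is an eigenvector of an $n \times n$ matrix $A$ with eigenvalue $\lambda$, then

$$(A - \lambda I)v = \mathbf{0}$$

where $I$ is the identity matrix of dimension $n$ and $\mathbf{0}$ is an n-dimensional zero vector. Since $v$ is non-zero, the matrix $A - \lambda I$ must be singular. Therefore, the eigenvalues of $A$ satisfy:

$$\det(A - \lambda I) = 0$$

The left-hand side above is a polynomial in $\lambda$, and is called the characteristic polynomial of $A$. Thus, to find the eigenvalues of $A$, we find the roots of the characteristic polynomial.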
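For a small matrix the characteristic polynomial can be worked out by hand and checked numerically. In the sketch below, the coefficients of $\det(\lambda I - A) = \lambda^3 - 6\lambda^2 + 5$ were expanded by hand for the illustrative matrix `A`; `np.roots` then finds the roots, which are compared with the eigenvalues returned by `np.linalg.eigvals`.

```python
import numpy as np

A = np.array([[0, 1, 1], [2, 1, 0], [3, 4, 5]])

# det(lambda*I - A) = lambda^3 - 6*lambda^2 + 0*lambda + 5 (expanded by hand)
coeffs = [1, -6, 0, 5]
roots = np.roots(coeffs)          # roots of the characteristic polynomial
eigs = np.linalg.eigvals(A)       # eigenvalues from the library routine

print(np.sort(roots.real))
print(np.sort(eigs.real))
```

Both lines should agree up to round-off. For this matrix all three roots are real, which is why taking `.real` loses nothing here.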

Computationally, however, computing the characteristic polynomial and then solving for the roots is prohibitively expensive. Therefore, in practice, numerical methods are used - both to find eigenvalues and their corresponding eigenvectors. We won’t go into the specifics of the algorithms used to calculate eigenvalues, but here is an example using SciPy:

A = np.array([[0,1,1],[2,1,0],[3,4,5]])
 
u, V = la.eig(A)
print(np.dot(V,np.dot(np.diag(u), la.inv(V))))
print(u)
[[-0.+0.j  1.+0.j  1.+0.j]
 [ 2.+0.j  1.+0.j  0.+0.j]
 [ 3.+0.j  4.+0.j  5.+0.j]]
[ 5.8541+0.j -0.8541+0.j  1.0000+0.j]

NB: Many matrices are not diagonalizable, and many have complex eigenvalues (even if all entries are real).

A = np.array([[0,1],[-1,0]])
print(A)
 
u, V = la.eig(A)
print(np.dot(V,np.dot(np.diag(u), la.inv(V))))
print(u)
[[ 0  1]
 [-1  0]]
[[ 0.+0.j  1.+0.j]
 [-1.+0.j  0.+0.j]]
[ 0.+1.j  0.-1.j]
# If you know the eigenvalues must be real
# (e.g. because A is symmetric, such as a covariance matrix),
# use real_if_close to drop the zero imaginary parts.
# The A below is not symmetric, but its eigenvalues happen to be real.
 
A = np.array([[0,1,1],[2,1,0],[3,4,5]])
u, V = la.eig(A)
print(u)
print(np.real_if_close(u))
[ 5.8541+0.j -0.8541+0.j  1.0000+0.j]
[ 5.8541 -0.8541  1.    ]
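When the matrix is known to be symmetric, an alternative to cleaning up the output of `la.eig` is to call `scipy.linalg.eigh`, which is designed for symmetric (Hermitian) matrices and returns real eigenvalues directly. The matrix below is an arbitrary symmetric, positive-definite example.

```python
import numpy as np
import scipy.linalg as la

A = np.array([[4, 2, 3], [2, 4, 5], [3, 5, 8]], dtype=float)   # symmetric

w, V = la.eigh(A)        # real eigenvalues in ascending order
print(w)
print(np.allclose(V.T @ V, np.eye(3)))            # columns of V are orthonormal
print(np.allclose(V @ np.diag(w) @ V.T, A))       # A = V diag(w) V^T
```

Because the eigenvectors are orthonormal, the inverse in the eigendecomposition is just a transpose.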

Singular Values

For any $m \times n$ matrix $A$, we define its singular values to be the square roots of the eigenvalues of $A^TA$. These are well-defined as $A^TA$ is always symmetric and positive semi-definite, so its eigenvalues are real and non-negative. Singular values are important properties of a matrix. Geometrically, a matrix $A$ maps the unit sphere in $\mathbb{R}^n$ to an ellipsoid. The singular values are the lengths of the semi-axes.

Singular values also provide a measure of the stability of a matrix. We’ll revisit this at the end of the lecture.
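The definition can be checked numerically. The sketch below, using an arbitrary $3 \times 2$ matrix, computes the square roots of the eigenvalues of $A^TA$ and compares them with the singular values reported by `np.linalg.svd`.

```python
import numpy as np

A = np.array([[3, 0], [4, 5], [0, 1]], dtype=float)   # 3 x 2

evals = np.linalg.eigvalsh(A.T @ A)          # real, in ascending order
evals = np.clip(evals, 0, None)              # guard against tiny negative round-off
sv_from_eig = np.sqrt(evals)[::-1]           # descending, to match svd's ordering
sv = np.linalg.svd(A, compute_uv=False)

print(sv_from_eig)
print(sv)
```

Library routines do not actually form $A^TA$, since doing so squares the condition number; this is only a check of the definition.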

QR decomposition

As with the previous decompositions, $QR$ decomposition is a method to write a matrix $A$ as the product of two matrices of simpler form. In this case, we want:

$$A = QR$$

where $Q$ is an $m \times n$ matrix with $Q^TQ = I$ (i.e. the columns of $Q$ are orthonormal; when $Q$ is square this means $Q$ is orthogonal) and $R$ is an $n \times n$ upper-triangular matrix.

This is really just the matrix form of the Gram-Schmidt orthogonalization of the columns of $A$. The classical G-S algorithm itself is numerically unstable, so various other methods have been developed to compute the QR decomposition. We won’t cover those in detail as they are a bit beyond our scope.

The first $k$ columns of $Q$ are an orthonormal basis for the column space of the first $k$ columns of $A$.
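These properties are easy to verify with `np.linalg.qr`. The sketch below uses an arbitrary $4 \times 3$ matrix with linearly independent columns and checks each claim in turn.

```python
import numpy as np

A = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float)  # 4 x 3

Q, R = np.linalg.qr(A)     # default "reduced" mode: Q is 4 x 3, R is 3 x 3
print(np.allclose(Q.T @ Q, np.eye(3)))     # orthonormal columns
print(np.allclose(R, np.triu(R)))          # R is upper triangular
print(np.allclose(Q @ R, A))               # A = QR

# The first k columns of Q span the same space as the first k columns of A:
k = 2
proj = Q[:, :k] @ Q[:, :k].T               # orthogonal projector onto span(Q[:, :k])
print(np.allclose(proj @ A[:, :k], A[:, :k]))
```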

Iterative QR decomposition is often used in the computation of eigenvalues.
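To give a flavour of how this works, here is a sketch of the simplest (unshifted) QR iteration applied to a small symmetric matrix. It is an illustration only: the function name `qr_iteration` and the fixed iteration count are choices made for this example, and production eigenvalue solvers use considerably more refined variants.

```python
import numpy as np

def qr_iteration(A, iters=200):
    """Unshifted QR iteration; returns the diagonal of the final iterate."""
    Ak = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q            # similar to the previous Ak, since R @ Q = Q.T @ Ak @ Q
    return np.diag(Ak)

A = np.array([[4, 1, 0], [1, 3, 1], [0, 1, 2]])   # symmetric, distinct eigenvalues
approx = np.sort(qr_iteration(A))
exact = np.linalg.eigvalsh(A)
print(approx)
print(exact)
```

Each iterate has the same eigenvalues as $A$, and for a matrix like this one the off-diagonal entries shrink towards zero, leaving the eigenvalues on the diagonal.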

Singular Value Decomposition

Another important matrix decomposition is singular value decomposition or SVD. For any $m \times n$ matrix $A$, we may write:

$$A = UDV^T$$

where $U$ is a unitary (orthogonal in the real case) $m \times m$ matrix, $D$ is a rectangular, diagonal $m \times n$ matrix with diagonal entries $d_1, \ldots, d_{\min(m,n)}$ all non-negative, and $V$ is a unitary (orthogonal) $n \times n$ matrix. The $d_i$ are the singular values of $A$. SVD is used in principal component analysis and in the computation of the Moore-Penrose pseudo-inverse.
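As a small illustration of the pseudo-inverse, the sketch below builds it from the SVD of an arbitrary full-rank $3 \times 2$ matrix and compares the result with `np.linalg.pinv`. Note that `np.linalg.svd` returns the third factor already transposed, which is why it is named `Vt` here.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)   # 3 x 2, full column rank

U, s, Vt = np.linalg.svd(A, full_matrices=False)      # A = U @ diag(s) @ Vt
A_pinv = Vt.T @ np.diag(1 / s) @ U.T                  # invert the non-zero singular values
print(np.allclose(A_pinv, np.linalg.pinv(A)))

# The pseudo-inverse gives the least-squares solution of A x = b
b = np.array([1, 2, 2], dtype=float)
x = A_pinv @ b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
print(x, x_lstsq)
```

For a rank-deficient matrix some singular values are zero (or nearly so) and must be skipped rather than inverted; the sketch assumes full column rank to keep things short.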

Stability and Condition Number

It is important that numerical algorithms be stable and efficient. Efficiency is a property of an algorithm, but stability can be a property of the system itself.

Example

A = np.array([[8,6,4,1],[1,4,5,1],[8,4,1,1],[1,4,3,6]])
b = np.array([19,11,14,14])
la.solve(A,b)   # the exact solution of this system is [1, 1, 1, 1]
# Perturb b slightly and solve again
b = np.array([19.01,11.05,14.07,14.05])
la.solve(A,b)
array([-2.34 ,  9.745, -4.85 , -1.34 ])

Note that the tiny perturbations in the outcome vector $b$ cause large differences in the solution! When this happens, we say that the matrix $A$ is ill-conditioned. This happens when a matrix is ‘close’ to being singular (i.e. non-invertible).

Condition Number

A measure of this type of behavior is called the condition number. It is defined as:

$$\kappa(A) = \|A\| \cdot \|A^{-1}\|$$

In general, it is difficult to compute.

Fact (for the matrix $2$-norm):

$$\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}$$

where $\sigma_{\max}$ is the maximum singular value of $A$ and $\sigma_{\min}$ is the smallest. The higher the condition number, the more unstable the system. In general, if there is a large discrepancy between minimal and maximal singular values, the condition number is large.

Example

U, s, V = np.linalg.svd(A)
print(s)
print(max(s)/min(s))
[ 15.5457   6.9002   3.8363   0.0049]
3198.6725812
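The same number is available directly from `np.linalg.cond`, and it bounds how much a relative change in $b$ can be amplified in the solution: the relative change in $x$ is at most the condition number times the relative change in $b$. The sketch below checks this for the matrix and the two outcome vectors used earlier in this section.

```python
import numpy as np

A = np.array([[8, 6, 4, 1], [1, 4, 5, 1], [8, 4, 1, 1], [1, 4, 3, 6]], dtype=float)
b = np.array([19, 11, 14, 14], dtype=float)
b_pert = np.array([19.01, 11.05, 14.07, 14.05])

kappa = np.linalg.cond(A)      # 2-norm condition number by default
s = np.linalg.svd(A, compute_uv=False)
print(kappa, s.max() / s.min())

x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A, b_pert)
rel_b = np.linalg.norm(b_pert - b) / np.linalg.norm(b)   # relative change in b
rel_x = np.linalg.norm(x_pert - x) / np.linalg.norm(x)   # relative change in x
print(rel_b, rel_x, rel_x <= kappa * rel_b)
```

Here a relative change of well under one percent in $b$ produces a relative change in $x$ of several hundred percent, which is consistent with the large condition number.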

Preconditioning

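We can sometimes improve on this behavior by ‘pre-conditioning’. Instead of solving

$$Ax = b$$

we solve

$$D^{-1}Ax = D^{-1}b$$

where the matrix $D$ is chosen so that $D^{-1}A$ has a lower condition number than $A$ itself.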

Preconditioning is a very involved topic, quite out of the range of this course. It is mentioned here only to make you aware that such a thing exists, should you ever run into an ill-conditioned problem!
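Just to make the idea concrete, here is a toy sketch of one of the simplest choices, sometimes called Jacobi or diagonal preconditioning, in which the system is multiplied by the inverse of the diagonal part of the matrix. The matrix, right-hand side and choice of preconditioner are all assumptions made for this illustration; this is not a general recipe, and a diagonal scaling does not help with every ill-conditioned matrix.

```python
import numpy as np

A = np.array([[1e6, 2e6], [1.0, 3.0]])     # rows on very different scales
b = np.array([3e6, 4.0])                   # equals A @ (1, 1)

D_inv = np.diag(1 / np.diag(A))            # inverse of the diagonal part of A
A_pre = D_inv @ A                          # preconditioned matrix
b_pre = D_inv @ b                          # preconditioned right-hand side

print(np.linalg.cond(A), np.linalg.cond(A_pre))
print(np.linalg.solve(A_pre, b_pre))       # same solution as the original system
```

The preconditioned system has the same solution as the original one, but a condition number that is smaller by several orders of magnitude.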

Exercises

1. Compute the LU decomposition of the following matrix by hand and using SciPy

A = np.array([[1,2,3],[2,-4,6],[3,-9,-3]])
print(A)
P, L , U = la.lu(A)
print(P)
print(L)
print(U)
[[ 1  2  3]
 [ 2 -4  6]
 [ 3 -9 -3]]
[[ 0.  1.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]]
[[ 1.      0.      0.    ]
 [ 0.3333  1.      0.    ]
 [ 0.6667  0.4     1.    ]]
[[ 3.  -9.  -3. ]
 [ 0.   5.   4. ]
 [ 0.   0.   6.4]]

2. Compute the Cholesky decomposition of the following matrix by hand and using numpy

# Your code here
 
A=np.array([[4,2,3],[2,4,5],[3,5,8]])
np.linalg.cholesky(A)
array([[ 2.    ,  0.    ,  0.    ],
       [ 1.    ,  1.7321,  0.    ],
       [ 1.5   ,  2.0207,  1.291 ]])

3. Write a function in Python to solve a system $Ax = b$ using SVD. Your function should take $A$ and $b$ as input and return $x$.

Your function should include the following:

  • First, check that $A$ is invertible - return error message if it is not

  • Invert $A$ using SVD and solve

  • return $x$

Test your function for correctness.

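# Your code here
 
def svdsolver(A, b):
    """Solve Ax = b for square A by inverting A through its SVD."""
    U, s, Vt = np.linalg.svd(A)   # A = U diag(s) Vt
    # Treat A as singular if its smallest singular value is negligible
    # relative to the largest (an exact comparison with 0 is fragile).
    if s.min() <= np.finfo(float).eps * max(A.shape) * s.max():
        print("Matrix is singular")
        return None
    # A^{-1} = V diag(1/s) U^T
    return Vt.T @ np.diag(1 / s) @ U.T @ b

A = np.array([[1,1],[1,2]])
b = np.array([3,1])
print(np.linalg.solve(A,b))
print(svdsolver(A,b))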