3.1 Gaussian elimination#
Recall from a linear algebra course that a system of linear equations can be solved by three types of elementary row operations (EROs):
Exchange two rows
Multiply a row by a nonzero constant
Add a multiple of a row to another row
None of these EROs changes the solution of the linear system.
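As a quick numerical illustration (using an arbitrarily chosen \(2\times 2\) system, not one from the examples below), the following snippet applies one operation of each type to an augmented matrix and checks that the solution is unchanged:

import numpy as np
# augmented matrix [A | b] for the system 2x + 3y = 4, x - y = -3 (illustrative values)
M = np.array([[2., 3., 4.],
              [1., -1., -3.]])
M[[0, 1]] = M[[1, 0]]                       # exchange two rows
M[1] = 2.0*M[1]                             # multiply a row by a nonzero constant
M[1] = M[1] - 4.0*M[0]                      # add a multiple of a row to another row
print(np.linalg.solve(M[:, :2], M[:, 2]))   # still the original solution [-1.  2.]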
Now we use a couple of examples to show how linear systems can be solved with EROs.
Solve the following linear system by (naive) Gaussian Elimination:
Solution.
We use matrix form and start with the augmented matrix:
Here the notation for the first ERO means multiplying the first row by \(\frac{3}{2}\) and subtracting it from the second row (or, equivalently, multiplying the first row by \(-\frac{3}{2}\) and adding it to the second row). The second ERO means dividing the second row by \(-\frac{17}{2}\) (or multiplying the second row by \(-\frac{2}{17}\)), the third means multiplying the second row by \(3\) and subtracting it from the first row (or multiplying the second row by \(-3\) and adding it to the first row), and the last means dividing the first row by \(2\) (or multiplying the first row by \(\frac{1}{2}\)).
The resulting linear system is in reduced echelon form from which one can directly read the solution:
Now we look at a \(3\times 3\) system.
Solve the following linear system by (naive) Gaussian Elimination:
Solution.
In matrix form, the system is:
Perform Gaussian elimination:
So the equivalent system of equations is:
The solution is
At each step, Gaussian elimination identifies a pivot position (the \((i,i)\) position on the diagonal), multiplies the row containing the pivot by appropriate factors, and subtracts the products from the rows below. Note that no zero pivots were encountered in the two examples above; Gaussian elimination that simply assumes zero pivots never occur is called “naive”. In addition, if an echelon form, instead of the reduced echelon form, is obtained from Gaussian elimination, a back substitution algorithm is needed to solve the resulting upper triangular system for the \(x\)’s from bottom to top. Back substitution solves for \(x_n\) from the last equation, then \(x_{n-1}\) from the second to last equation, and so on, until \(x_1\) is solved for from the first equation. The following code performs back substitution for an upper triangular system; the solution is stored in place in \(\boldsymbol{b}\).
import numpy as np
def back_sub(A, b):
"""
Backward substitution
The right hand side b is changed in place to store the solution
A: the coefficient matrix of size n x n
b: the right hand side of size n
"""
if A.shape[0] != A.shape[1]:
print('Error: the given coefficient matrix is not square')
return
if A.shape[0] != b.size:
print('Error: the shape of the coefficient matrix does not match the size of the RHS')
return
n = A.shape[0]
for i in range(n-1, -1, -1):
for j in range(i+1, n):
b[i] = b[i] - A[i,j]*b[j]
b[i] = b[i]/A[i,i]
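As a quick sanity check of back_sub (on a small upper triangular system with values chosen purely for illustration):

U = np.array([[2., 1.],
              [0., 3.]])
y = np.array([5., 6.])
back_sub(U, y)
print(y)   # expected solution: [1.5 2. ]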
The following Python function performs Gaussian elimination, where the final upper triangular matrix is stored in \(A\), and the final right hand side is stored in \(b\), so that no additional storage is needed. However, if the original coefficient matrix and right hand side are needed later, one should make a copy of them before calling the function.
def GaussianElim(A, b):
"""
Gaussian Elimination without pivoting.
The coefficient matrix A and the right hand side b are modified in place.
Input:
A: the coefficient matrix of size n x n
b: the right hand side of size n
Return:
None. The solution is stored in b
"""
if A.shape[0] != A.shape[1]:
print('Error: the given coefficient matrix is not square')
return
if A.shape[0] != b.size:
print('Error: the shape of the coefficient matrix does not match the size of the RHS')
return
# We hard code the epsilon here. It can be an input parameter.
eps = 1e-5
n = A.shape[0]
for j in range(n-1):
if np.abs(A[j,j]) < eps:
print('Error: zero pivot encountered!')
return
for i in range(j+1, n):
# The multiplier
mp = A[i,j]/A[j,j]
for k in range(j+1,n):
A[i,k] = A[i,k] - mp*A[j,k]
b[i] = b[i] - mp*b[j]
# No need to return. Both A and b are changed in place
Combined with back_sub, GaussianElim can now solve linear systems. We apply the code to the linear system in Example 19.
A = np.array([[2., 3, -1], [4, -2, 3], [2, -1, 2]])
b = np.array([-3, 12, 7])
print('Solve the linear system: ')
print(A, 'x = ', b)
GaussianElim(A, b)
back_sub(A, b)
print('The solution is ', b)
Solve the linear system:
[[ 2. 3. -1.]
[ 4. -2. 3.]
[ 2. -1. 2.]] x = [-3 12 7]
The solution is [ 1 -1 2]
The code produces the same results as we obtained analytically.
Computational cost of Gaussian elimination#
We now determine the costs of Gaussian elimination and the subsequent back substitution.
The cost of Gaussian elimination for an \(n\times n\) linear system is \(O\left(\frac{2}{3}n^3\right)\).
Proof. For a general linear system:
note in the first step, only the first two rows are involved:
The two rows are changed to
This involves \(1\) division (\(a_{21}/a_{11}\)), \(n\) multiplications, and \(n\) additions. The same number of operations is needed for each of the other rows to get
So to eliminate the first column, the total number of operations is
Similarly, to eliminate the second column (make \(a_{i2}, i>2\) zero), for each row, the total number of operations is \(2(n-1)+1\), comprising \(1\) division, \(n-1\) multiplications, and \(n-1\) additions. Therefore, the total number of operations adds up to
Repeating the same operation count for the remaining columns, the total number of operations is
where we used the following two identities:
The notation \(O\left(\frac{2}{3}n^3\right)\) is read “of the order of \(\frac{2}{3}n^3\)”; i.e., the cost of Gaussian elimination is dominated by the term \(\frac{2}{3}n^3\).
The cost of back substitution is \(n^2\).
Proof. Note that the last equation of an upper triangular system involves 1 division, the second to last equation involves one multiplication, one subtraction and one division, the third to last involves 5 total operations, etc., so the total number of operations of back substitution is
Therefore, the total number of operations of solving a linear system with Gaussian elimination and back substitution is
If it takes 1 second to run back substitution for a \(1000\times 1000\) upper triangular system, how long does it take to solve a linear system of the same size with Gaussian elimination?
Solution.
Since back substitution costs \(n^2=1000^2=10^6\) operations, one operation takes \(1/10^6 = 10^{-6}\) second. The total cost of Gaussian elimination is dominated by \(\frac{2}{3}n^3=\frac{2}{3}10^9\) operations, which costs \(\frac{2}{3}10^3\approx 667\) seconds.
Note that there is no need to consider the exact cost of \(\frac{2}{3}n^3 + \frac{3}{2}n^2 - \frac{7}{6}n\) operations (\(\approx 668\) seconds), since the extra terms contribute only about \(0.15\%\) of the total time.
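The arithmetic can be double checked with a couple of lines of Python, using the operation counts stated above and the rate of one operation per \(10^{-6}\) second:

n = 1000
sec_per_op = 1.0/n**2                               # 1 second for the n^2 back substitution operations
print(2/3*n**3*sec_per_op)                          # leading term only: about 666.7 seconds
print((2/3*n**3 + 3/2*n**2 - 7/6*n)*sec_per_op)     # full count: about 668 seconds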
LU factorization#
Gaussian elimination produces a byproduct, a factorization of the coefficient matrix \(A\), essentially for free. The factorization takes the form \(A=LU\), where \(L\) is a lower triangular matrix whose diagonal elements are all ones, and \(U\) is an upper triangular matrix. LU factorization can be used to solve a collection of linear systems of the following form
All of the linear systems share the same coefficient matrix while the right hand sides are different. If Gaussian elimination were used to solve all linear systems, the total cost would be around \(\frac{2}{3}n^3K\), which could be computationally intractable if both \(n\) and \(K\) are large. However, by using \(A=LU\), a linear system can be rewritten as
which can further be written as a system of two linear systems:
The first linear system is solved by forward substitution, which solves a lower triangular system from top to bottom, and the second is solved by back substitution. Like back substitution, forward substitution costs \(n^2\) operations. So the total cost of solving the whole collection of linear systems is \(\frac{2}{3}n^3+2n^2K\), which is in general much more computationally efficient than solving each of the \(K\) systems by Gaussian elimination.
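As an aside, the same factor-once, solve-many idea is available in SciPy (assuming SciPy is installed; these are library routines, not the functions developed in this section): scipy.linalg.lu_factor performs the \(O(n^3)\) factorization once, and scipy.linalg.lu_solve reuses it for each right hand side at \(O(n^2)\) cost. A minimal sketch:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2., 3, -1], [4, -2, 3], [2, -1, 2]])
lu, piv = lu_factor(A)                           # factor once: O(n^3)
for rhs in ([-3., 12, 7], [1., 0, 0]):           # the second right hand side is arbitrary
    print(lu_solve((lu, piv), np.array(rhs)))    # each solve: O(n^2)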
Next, we use an example to show how LU factorization can be obtained from the Gaussian elimination process without any additional computation and storage.
Obtain the LU factorization of the coefficient matrix \(A\) in Example 19.
Solution.
We repeat the Gaussian elimination process, but now store the multipliers (factors) of the elementary row operations in the corresponding lower triangular positions (the numbers in boxes):
So the multipliers in boxes define the lower triangular part of \(L\), and the upper triangular part including the diagonal of the row echelon form (without the right hand side) defines the matrix \(U\). In this example, we have
It is easy to verify that \(A=LU\).
Note that the lower triangular part of the coefficient matrix stores the multipliers, instead of zeros, so no additional storage is needed for LU factorization.
Use LU factorization to solve the linear system in Example 19.
Solution.
The two linear systems, \(L\boldsymbol{c} = \boldsymbol{b}\) and \(U\boldsymbol{x} = \boldsymbol{c}\), now become
and
Solving the first linear system using forward substitution, we obtain \(\boldsymbol{c}\):
Then solving for \(\boldsymbol{x}\) from the second system using back substitution, we have
which is identical to the solution obtained from Gaussian elimination.
The following Python function LU performs LU factorization. Note that the coefficient matrix itself is overwritten by the entries of the matrices \(L\) and \(U\). At each step of Gaussian elimination, the value at the pivot position is checked: if its absolute value is at least a hard-coded threshold of \(10^{-5}\), the pivot is considered nonzero; otherwise, the algorithm stops.
def LU(A):
"""
LU factorization without pivoting.
The coefficient matrix A is modified in place.
The lower triangular part of A represents the L matrix, the upper triangular part
(including the diagonal) represents U
A: the coefficient matrix of size n x n
"""
if A.shape[0] != A.shape[1]:
print('Error: the given coefficient matrix is not square')
return
# We hard code the epsilon here. It can be an input parameter.
eps = 1e-5
n = A.shape[0]
for j in range(n-1):
if np.abs(A[j,j]) < eps:
print('Error: zero pivot encountered!')
return
for i in range(j+1, n):
# The multiplier
mp = A[i,j]/A[j,j]
A[i,j] = mp
for k in range(j+1,n):
A[i,k] = A[i,k] - mp*A[j,k]
# No need to return. A is changed in place.
We now apply LU to the coefficient matrix in Example 19, and the same result is obtained.
A = np.array([[2.,3,-1], [4,-2,3], [2, -1, 2]])
print('Perform LU decomposition of A = : ')
print(A)
LU(A)
print('The result is ')
print(A)
Perform LU decomposition of A = :
[[ 2. 3. -1.]
[ 4. -2. 3.]
[ 2. -1. 2.]]
The result is
[[ 2. 3. -1. ]
[ 2. -8. 5. ]
[ 1. 0.5 0.5]]
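The packed result can be unpacked into explicit \(L\) and \(U\) factors with np.tril and np.triu, and their product should reproduce the original matrix. This is only an illustrative check, using the matrix A just produced by LU:

L_factor = np.tril(A, -1) + np.eye(3)   # multipliers below the diagonal, unit diagonal
U_factor = np.triu(A)                   # upper triangular part, including the diagonal
print(np.dot(L_factor, U_factor))       # should reproduce the original coefficient matrix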
To use LU factorization to solve linear systems with Python, a function that implements forward substitution is needed. The following Python function forward_sub serves the purpose. The solution replaces the right hand side vector.
def forward_sub(A, b, A_from_LU):
"""
Forward substitution
The right hand side b is changed in place to store the solution
A: the coefficient matrix of size n x n
b: the right hand side of size n
A_from_LU: True, if the matrix A is from LU factorization (diagonals are 1).
False, if A is just a regular coefficient matrix
"""
if A.shape[0] != A.shape[1]:
print('Error: the given coefficient matrix is not square')
return
if A.shape[0] != b.size:
print('Error: the shape of the coefficient matrix does not match the size of the RHS')
return
n = A.shape[0]
if A_from_LU:
for j in range(0,n):
b[j+1:] = b[j+1:] - A[j+1:,j]*b[j]
else:
for j in range(0,n):
b[j] = b[j]/A[j,j]
b[j+1:] = b[j+1:] - A[j+1:,j]*b[j]
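As a quick check of forward_sub on a small lower triangular system (values chosen purely for illustration; this exercises the A_from_LU=False branch, which divides by the diagonal entries):

L_mat = np.array([[2., 0.],
                  [1., 3.]])
y = np.array([4., 5.])
forward_sub(L_mat, y, False)
print(y)   # expected solution: [2. 1.]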
Now we use Python to solve the linear system in Example 19 with LU factorization.
A = np.array([[2.,3,-1], [4,-2,3], [2, -1, 2]])
b = np.array([-3,12,7])
# Perform LU factorization
LU(A)
print('The matrix A after LU decomposition is ')
print(A)
# Perform forward substitution
forward_sub(A,b,True)
print('After forward substitution, the vector b is')
print(b)
# Perform back substitution
back_sub(A,b)
print('After backward substitution, the vector b is')
print(b)
The matrix A after LU decomposition is
[[ 2. 3. -1. ]
[ 2. -8. 5. ]
[ 1. 0.5 0.5]]
After forward substitution, the vector b is
[-3 18 1]
After backward substitution, the vector b is
[ 1 -1 2]
Exercise 3.1-1
Let \(A\) be an \(n\times n\) matrix, and let \(I^{i,j}(m)\) denote the \(n\times n\) matrix that is identical to the identity matrix except that the \((i,j)\) position, where \(i>j\), has the value \(m\). For example, for \(n=3\),
What is \(I^{i,j}(m)A\) for a general \(n\)?
Let \((i_1,j_1) \ne (i_2,j_2)\), what is \(I^{i_1,j_1}(m_1)I^{i_2,j_2}(m_2)\)?
What is the inverse of \(I^{i,j}(m)\)?
Exercise 3.1-2
Show that the algorithm described in Example 21 leads to the LU factorization for a general matrix \(A\).
Gaussian elimination with partial pivoting#
To improve the stability of Gaussian elimination, pivots should be as large as possible, so that multipliers are as small as possible. The reason is that multiplying the pivot row by a large multiplier and adding the product to another row “swamps” the latter: its original entries lose significant information due to finite precision. There are various strategies to place large numbers at the pivot positions. One idea is to compare the absolute values of the pivot and all entries below it in the same column, and to swap the row containing the entry of largest magnitude with the pivot row, if they are not the same row. This variant of Gaussian elimination is called Gaussian elimination with partial pivoting. The algorithm is summarized below.
Partial pivoting#
The partial pivoting process works as follows:
The first step: instead of using \(a_{11}\) as the pivot, select the \(p\)th row, where
\[\begin{equation*} |a_{p1}|= \max_{1\le j\le n}\{|a_{j1}|\}. \end{equation*}\]If \(p\ne 1\), then exchange rows \(1\) and \(p\).
The second step: select the \(p\)th row, where
\[\begin{equation*} |a_{p2}|= \max_{2\le j\le n}\{|a_{j2}|\}. \end{equation*}\]If \(p\ne 2\), then exchange rows \(2\) and \(p\).
A general \(i\)th step: select the \(p\)th row, where
\[\begin{equation*} |a_{pi}|= \max_{i\le j\le n}\{|a_{ji}|\}. \end{equation*}\]If \(p\ne i\), then exchange rows \(i\) and \(p\).
Note that, in this way, the multipliers are no greater than \(1\) in magnitude, so the problem of “swamping” is avoided.
The partial pivoting strategy also handles the cases where zeros appear in the pivot positions, which would cause naive Gaussian elimination to fail. Note that there must be at least one nonzero value below a zero pivot; otherwise the coefficient matrix is not invertible.
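The following small example (with numbers chosen purely for demonstration) illustrates swamping: with a tiny pivot, naive elimination loses the information in the second row, while a pivoted solver such as np.linalg.solve recovers a solution close to the exact one, approximately \([1, 1]\):

eps = 1e-20
A = np.array([[eps, 1.], [1., 1.]])
b = np.array([1., 2.])

# naive elimination: subtract (1/eps) times row 0 from row 1
m = A[1,0]/A[0,0]
r1 = A[1] - m*A[0]                         # second row after elimination
c1 = b[1] - m*b[0]                         # second entry of the right hand side
x2 = c1/r1[1]
x1 = (b[0] - A[0,1]*x2)/A[0,0]
print('naive:  ', x1, x2)                  # x1 is badly wrong (typically 0.0)
print('pivoted:', np.linalg.solve(A, b))   # close to the exact solution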
Apply Gaussian elimination with partial pivoting to the linear system in Example 19.
Solution.
As before, we start from the augmented matrix:
So the equivalent system is:
The solution is
Here, we marked the number that needs to be placed in the pivot position in red. The notation \(R_1\leftrightarrow R_2\) means swapping Rows \(R_1\) and \(R_2\). For this example, we only needed to switch rows once, since the \((3,2)\) position vanished when we used the first pivot to eliminate the \((3,1)\) position.
Apply Gaussian elimination with partial pivoting to the following linear system:
Solution.
So the equivalent system is:
The solution is
The following Python function GEPP performs Gaussian elimination with partial pivoting. Both the coefficient matrix and right-hand side vector \(A\) and \(b\) are changed in place and they together represent the echelon form of the augmented matrix.
def GEPP(A, b):
"""
Gaussian elimination with partial pivoting.
The coefficient matrix A is modified in place.
A: the coefficient matrix of size n x n
b: the right hand side of size n
Return:
None. The solution is stored in b
"""
if A.shape[0] != A.shape[1]:
print('Error: the given coefficient matrix is not square')
return
n = A.shape[0]
for j in range(n-1):
# find p
p = np.argmax(np.abs(A[j:,j]))
if p+j != j:
# change rows p and j of A and b:
A[[p+j, j]] = A[[j, p+j]]
b[[p+j, j]] = b[[j, p+j]]
for i in range(j+1, n):
# The multiplier
mp = A[i,j]/A[j,j]
A[i,j] = 0.
b[i] = b[i] - mp*b[j]
for k in range(j+1,n):
A[i,k] = A[i,k] - mp*A[j,k]
We apply GEPP to the linear system in Example 24, and print out the matrix \(A\) and vector \(b\) at the end. The final solution is the same as obtained earlier.
A = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
b = np.array([-1., 18, 3])
print('Solve the linear system: ')
print(A, 'x = ', b)
GEPP(A, b)
print('After Gaussian elimination with partial pivoting, A becomes:')
print(A)
print('After Gaussian elimination with partial pivoting, b becomes:')
print(b)
back_sub(A, b)
print('The solution is ', b)
Solve the linear system:
[[ 3. -2. 1.]
[ 6. 1. -3.]
[-4. 3. -2.]] x = [-1. 18. 3.]
After Gaussian elimination with partial pivoting, A becomes:
[[ 6. 1. -3. ]
[ 0. 3.66666667 -4. ]
[ 0. 0. -0.22727273]]
After Gaussian elimination with partial pivoting, b becomes:
[18. 15. 0.22727273]
The solution is [ 2. 3. -1.]
PA=LU factorization#
The partial pivoting algorithm leads to a new matrix factorization for \(A\), which takes the form of
where \(L\) and \(U\) have the same forms as defined in LU factorization, and \(P\) is a permutation matrix.
An \(n\times n\) permutation matrix is a matrix in which each row and each column contains exactly one \(1\), with all other entries equal to \(0\).
For example,
is a \(2\times 2\) permutation matrix and
is a \(3\times 3\) permutation matrix.
A permutation matrix can be obtained by a sequence of exchanges of rows or columns of the identity matrix of the same dimension. For example,
is obtained by first exchanging the first two rows of the \(3\times 3\) identity matrix \(I_3\), and then exchanging the first and third columns of the resulting matrix.
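This construction can be reproduced directly in Python (a small illustration of the row and column exchanges described above):

P = np.eye(3)
P[[0, 1]] = P[[1, 0]]          # first exchange the first two rows of the identity
P[:, [0, 2]] = P[:, [2, 0]]    # then exchange the first and third columns
print(P)                       # the resulting 3 x 3 permutation matrix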
The algorithm for PA=LU factorization is similar to LU factorization. The multipliers of the elementary row operations are stored in \(A\). To obtain the permutation matrix \(P\), we start from the identity matrix and apply to it the same row exchanges, in the same order, that are applied to \(A\). The following example illustrates the process.
Find PA=LU factorization for the coefficient matrix in Example 24.
Solution.
Initially, let \(P=I_3\),
The first step of Gaussian elimination with partial pivoting involves exchanging the first two rows:
Accordingly, we update \(P\) by switching its first two rows:
Now eliminate the two numbers below the pivot (\((1,1)\) position), and store the multipliers in those positions:
The next step involves swapping the second and third rows of \(A\):
Correspondingly we update \(P\):
Now eliminate the \((3,2)\) position:
The algorithm is now completed with
It is easy to verify that \(PA=LU\).
Solving linear systems with PA=LU factorization#
The linear system \(A\boldsymbol{x} = \boldsymbol{b}\) can be rewritten as
Using the PA=LU factorization, the linear system reads
which can be written as two linear systems:
The following Python function PALU performs PA=LU factorization. The \(L\) and \(U\) factors are stored in \(A\) as in LU factorization, while the permutation matrix \(P\) is returned.
def PALU(A):
"""
PA=LU factorization with partial pivoting.
The coefficient matrix A is modified in place.
The lower triangular part of A represents the L matrix, the upper triangular part
(including the diagonal) represents U
A: the coefficient matrix of size n x n
output:
Permutation matrix $P$ is returned, along with $A$ that is changed in place
"""
if A.shape[0] != A.shape[1]:
print('Error: the given coefficient matrix is not square')
return
n = A.shape[0]
P = np.eye(n)
for j in range(n-1):
# find p
p = np.argmax(np.abs(A[j:,j]))
if p+j != j:
# change rows p and j of A. Update P:
A[[p+j, j]] = A[[j, p+j]]
P[[p+j, j]] = P[[j, p+j]]
for i in range(j+1, n):
# The multiplier
mp = A[i,j]/A[j,j]
A[i,j] = mp
for k in range(j+1,n):
A[i,k] = A[i,k] - mp*A[j,k]
return P
Applying PALU to the coefficient matrix in Example 24 gives the same result.
A = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
print('Perform PA=LU factorization: ')
print('Before PA=LU factorization, A = ')
print(A)
P = PALU(A)
print('The PA=LU factorization result is ')
print('After PA=LU factorization, A = ')
print(A)
print('The permutation matrix P = ')
print(P)
Perform PA=LU factorization:
Before PA=LU factorization, A =
[[ 3. -2. 1.]
[ 6. 1. -3.]
[-4. 3. -2.]]
The PA=LU factorization result is
After PA=LU factorization, A =
[[ 6. 1. -3. ]
[-0.66666667 3.66666667 -4. ]
[ 0.5 -0.68181818 -0.22727273]]
The permutation matrix P =
[[0. 1. 0.]
[0. 0. 1.]
[1. 0. 0.]]
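To double check the result, one can keep a copy of the original matrix, unpack \(L\) and \(U\) from the in-place output, and confirm numerically that \(PA = LU\) (an illustrative check, not part of the algorithm):

A0 = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
A = A0.copy()
P = PALU(A)
L_factor = np.tril(A, -1) + np.eye(3)
U_factor = np.triu(A)
print(np.allclose(np.dot(P, A0), np.dot(L_factor, U_factor)))   # expect True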
We also use PA=LU factorization to solve the linear system in Example 24. Again, the same result is obtained.
A = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
b = np.array([-1., 18, 3])
P = PALU(A)
# First solve for c
Pb = np.dot(P,b)
forward_sub(A, Pb, True)
# Solve for x
back_sub(A, Pb)
print('The solution is: ')
print(Pb)
The solution is:
[ 2. 3. -1.]
Full pivoting and PAQ=LU factorization#
The partial pivoting strategy for Gaussian elimination only considers candidates that are in the same column as the pivots. However, for the \((k,k)\) pivot position, \(1\le k\le n\), if we consider the submatrix \((a_{ij}), i\ge k, j\ge k\), i.e., all positions that are not above and not to the left of the pivot, as candidates to be placed at the pivot position, the strategy is known as full pivoting. Full pivoting is more stable, since at each step, it considers a wider range of candidates for the pivot position, and the multipliers can be even smaller compared to partial pivoting.
The algorithm works as follows:
The first step: instead of using \(a_{11}\) as the pivot, select the \(p\)th row, and the \(q\)th column, where
\[\begin{equation*} |a_{pq}| = \max_{1\le i\le n, 1\le j\le n}\{|a_{ij}|\}, \end{equation*}\]exchange row \(p\) and row \(1\), and exchange column \(q\) and column \(1\), so now \(a_{pq}\) is in the \((1,1)\) pivot position. Apply Gaussian elimination.
The second step: instead of using \(a_{22}\) as the pivot, select the \(p\)th row, and the \(q\)th column, where
\[\begin{equation*} |a_{pq}| = \max_{2\le i\le n, 2\le j\le n}\{|a_{ij}|\}, \end{equation*}\]exchange row \(p\) and row \(2\), and exchange column \(q\) and column \(2\), so now \(a_{pq}\) is in the \((2,2)\) pivot position. Apply Gaussian elimination.
The rest of the pivots can be found in a similar fashion.
Now we will have two permutation matrices \(P\) and \(Q\), which correspond to the row and column exchanges. The factorization corresponding to Gaussian elimination with full pivoting is in the form of
which can be obtained in a similar fashion to the PA=LU factorization. The permutation matrix \(Q\) is obtained by performing the same sequence of column exchanges to an identity matrix.
Below is the Python function for PAQ=LU factorization. The returned outputs are the two permutation matrices \(P\) and \(Q\).
def PAQLU(A):
"""
PAQ=LU factorization with full pivoting.
The coefficient matrix A is modified in place.
The lower triangular part of A represents the L matrix, the upper triangular part
(including the diagonal) represents U
A: the coefficient matrix of size n x n
output:
Permutation matrices $P$, $Q$ are returned,
along with $A$ that is changed in place
"""
if A.shape[0] != A.shape[1]:
print('Error: the given coefficient matrix is not square')
return
n = A.shape[0]
P = np.eye(n)
Q = np.eye(n)
for j in range(n-1):
# find p and q so that the (p,q) position has the largest
# value of the |A|
p,q = np.unravel_index(np.argmax(np.abs(A[j:,j:])),
np.abs(A[j:,j:]).shape)
if p+j != j:
# exchange rows p and j of A. Update P:
A[[p+j, j]] = A[[j, p+j]]
P[[p+j, j]] = P[[j, p+j]]
if q+j != j:
# exchange columns q and j of A. Update Q:
A[:, [q+j, j]] = A[:, [j, q+j]]
Q[:, [q+j, j]] = Q[:, [j, q+j]]
for i in range(j+1, n):
# The multiplier
mp = A[i,j]/A[j,j]
A[i,j] = mp
for k in range(j+1,n):
A[i,k] = A[i,k] - mp*A[j,k]
return P, Q
Apply PAQ=LU factorization to the coefficient matrix in Example 24:
A = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
print('Perform PAQ=LU factorization: ')
print('Before PAQ=LU factorization, A = ')
print(A)
P,Q = PAQLU(A)
print('The PAQ=LU factorization result is ')
print('After PAQ=LU factorization, A = ')
print(A)
print('The permutation matrix P = ')
print(P)
print('The permutation matrix Q = ')
print(Q)
Perform PAQ=LU factorization:
Before PAQ=LU factorization, A =
[[ 3. -2. 1.]
[ 6. 1. -3.]
[-4. 3. -2.]]
The PAQ=LU factorization result is
After PAQ=LU factorization, A =
[[ 6. -3. 1. ]
[-0.66666667 -4. 3.66666667]
[ 0.5 -0.625 -0.20833333]]
The permutation matrix P =
[[0. 1. 0.]
[0. 0. 1.]
[1. 0. 0.]]
The permutation matrix Q =
[[1. 0. 0.]
[0. 0. 1.]
[0. 1. 0.]]
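A similar numerical check confirms \(PAQ = LU\) for this example (again keeping a copy of the original matrix):

A0 = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
A = A0.copy()
P, Q = PAQLU(A)
L_factor = np.tril(A, -1) + np.eye(3)
U_factor = np.triu(A)
print(np.allclose(np.dot(np.dot(P, A0), Q), np.dot(L_factor, U_factor)))   # expect True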
Analytically, the PAQ=LU factorization for \(A\) is:
Exercise 3.1-3
For the linear system
find the LU factorization, PA=LU factorization, and PAQ=LU factorization of the coefficient matrix. Use LU factorization and PA=LU factorization to solve the linear system.
Condition number#
For a function of one variable, \(y=f(x)\), where \(x\) is the independent variable, and \(y\) is the dependent variable, one is often interested in how \(y\) responds to a small change in \(x\). If a small change in \(x\) leads to a large change in \(y\), then \(y\) is said to be sensitive to \(x\); otherwise \(y\) is insensitive to \(x\).
Similarly, in the problem of solving a linear system, we are interested in how a change in the independent variable, \(\boldsymbol{b}\), affects the dependent variable \(\boldsymbol{x}\). Considering a perturbation of \(\boldsymbol{b}\), \(\hat{\boldsymbol{b}}=\boldsymbol{b}+\delta\boldsymbol{b}\), the ratio of the relative change in the magnitude (norm) of \(\boldsymbol{x}\) to the relative change in the magnitude of \(\boldsymbol{b}\) can be expressed as
Here \(||\delta\boldsymbol{x}||\) is called the forward error, \(\frac{||\delta\boldsymbol{x}||}{||\boldsymbol{x}||}\) the relative forward error, \(||\delta\boldsymbol{b}||\) the backward error, \(\frac{||\delta\boldsymbol{b}||}{||\boldsymbol{b}||}\) the relative backward error, and the quantity \(\frac{\frac{||\delta\boldsymbol{x}||}{||\boldsymbol{x}||}}{\frac{||\delta\boldsymbol{b}||}{||\boldsymbol{b}||}}\) the error magnification factor.
To find the maximum of the error magnification factor, notice that
where the notation \(||\cdot||_m\) denotes the \(||\cdot||\) (vector norm) induced matrix norm.
The quantity \(||A^{-1}||_m\cdot||A||_m\) is called the condition number of matrix \(A\).
A large condition number indicates that a small change in \(\boldsymbol{b}\) incurs a large change in \(\boldsymbol{x}\), the solution of the linear system. Such matrices are called ill-conditioned; otherwise, the matrices are called well-conditioned. Solving a linear system whose coefficient matrix is ill-conditioned is subject to large errors. As a rule of thumb, if the condition number of matrix \(A\) is \(O(10^k)\), then up to \(k\) digits of accuracy are lost in solving a linear system that has \(A\) as the coefficient matrix. With double precision, this means about \(16-k\) digits of accuracy are expected.
Consider the \(8\times 8\) Hilbert matrix
In general, the position \((i,j)\) of the Hilbert matrix has the value \(\frac{1}{i+j-1}\).
Let the true solution be \(\boldsymbol{x}=[1,1,1,1,1,1,1,1]^T\), and create the right hand side vector \(\boldsymbol{b}=H\boldsymbol{x}\). Use Python to solve the system \(H\boldsymbol{x}=\boldsymbol{b}\). How accurate are the solutions?
# Verify the effect of condition number
n = 8
H = np.zeros((n,n))
for i in range(n):
for j in range(n):
H[i,j] = 1./((i+1)+(j+1)-1.)
b = np.dot(H,np.ones(n,))
print('The condition number of H is: {0:12.9e}'.format(np.linalg.cond(H)))
x = np.linalg.solve(H, b)
print('The solution x is: ')
print("[{0:19.16f} {1:19.16f} {2:19.16f} {3:19.16f} "
"{4:19.16f} {4:19.16f} {4:19.16f} {4:19.16f}]".format(x[0], x[1], x[2],
x[3], x[4], x[5], x[6], x[7]))
print('The error vector of the solution is: ')
print("[{0:19.16e} {1:19.16e} {2:19.16e} {3:19.16e} {4:19.16e} {4:19.16e} "
"{4:19.16e} {4:19.16e}]".format(np.abs(x[0]-1), np.abs(x[1]-1), np.abs(x[2]-1),
np.abs(x[3]-1), np.abs(x[4]-1), np.abs(x[5]-1),
np.abs(x[6]-1), np.abs(x[7]-1)))
The condition number of H is: 1.525757557e+10
The solution x is:
[ 1.0000000000124938 0.9999999992609565 1.0000000102814219 0.9999999418227916 1.0000001619251886 1.0000001619251886 1.0000001619251886 1.0000001619251886]
The error vector of the solution is:
[1.2493783785316737e-11 7.3904349306985750e-10 1.0281421936042534e-08 5.8177208384080359e-08 1.6192518859092786e-07 1.6192518859092786e-07 1.6192518859092786e-07 1.6192518859092786e-07]
The condition number of the \(8\times 8\) Hilbert matrix is about \(10^{10}\), so roughly \(16-10=6\) digits of accuracy are expected. This is consistent with the printed errors, the largest of which is about \(1.62\times 10^{-7}\).