3.1 Gaussian elimination#

Recall from a linear algebra course that a system of linear equations can be solved by three types of elementary row operations (EROs):

  1. Exchange two rows

  2. Multiply a row by a nonzero constant

  3. Add a multiple of a row to another row

None of these EROs changes the solution of the linear system.
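
These operations are straightforward to express with NumPy row indexing. The following small sketch applies each of the three EROs to a hypothetical augmented matrix; it is purely illustrative and not part of the solvers developed below.

import numpy as np

M = np.array([[2., 3., 8.],
              [3., -4., -5.]])   # a hypothetical augmented matrix [A | b]

M[[0, 1]] = M[[1, 0]]    # 1. exchange two rows
M[0] = 2.0*M[0]          # 2. multiply a row by a nonzero constant
M[1] = M[1] + 0.5*M[0]   # 3. add a multiple of one row to another row
print(M)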

Now we use a couple of examples to show how linear systems can be solved with EROs.

Example 18

Solve the following linear system by (naive) Gaussian Elimination:

\[\begin{align*} 2x + 3y &= 8 \\ 3x - 4y &= -5. \end{align*}\]

Solution.

We use matrix form and start with the augmented matrix:

\[\begin{align*} \left[\begin{array}{rrrr}{2} & {3} & { |} & {8} \\ {3} & {-4} & { |} & {-5}\end{array}\right] &\rightarrow R_2=R_2-\frac{3}{2}R_1 \rightarrow\left[\begin{array}{rrrr}{2} & {3} & {|} & {8} \\ {0} & {-\frac{17}{2}} & {|} & {-\frac{34}{2}}\end{array}\right]\\ &\rightarrow R_2=R_2/\left(-\frac{17}{2}\right) \rightarrow \left[\begin{array}{rrrr}{2} & {3} & {|} & {8} \\ {0} & {1} & {|} & {2}\end{array}\right] \\ &\rightarrow R_1=R_1-3R_2 \rightarrow \left[\begin{array}{rrrr}{2} & {0} & {|} & {2} \\ {0} & {1} & {|} & {2}\end{array}\right] \\ &\rightarrow R_1=R_1/2 \rightarrow \left[\begin{array}{rrrr}{1} & {0} & {|} & {1} \\ {0} & {1} & {|} & {2}\end{array}\right]. \end{align*}\]

Here the notation for the first ERO means multiplying the first row by \(\frac{3}{2}\) and subtracting the product from the second row (or equivalently, multiplying the first row by \(-\frac{3}{2}\) and adding the product to the second row); the second ERO means dividing the second row by \(-\frac{17}{2}\) (or multiplying the second row by \(-\frac{2}{17}\)); the third means multiplying the second row by \(3\) and subtracting the product from the first row (or multiplying the second row by \(-3\) and adding the product to the first row); and the last means dividing the first row by \(2\) (or multiplying the first row by \(\frac{1}{2}\)).

The resulting linear system is in reduced row echelon form, from which one can directly read off the solution:

\[\begin{equation*} x = 1, \, y = 2. \end{equation*}\]
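
As a quick sanity check, NumPy's built-in solver gives the same answer:

import numpy as np

A = np.array([[2., 3], [3, -4]])
b = np.array([8., -5])
print(np.linalg.solve(A, b))   # expect [1. 2.]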

Now we look at a \(3\times 3\) system.

Example 19

Solve the following linear system by (naive) Gaussian Elimination:

\[\begin{align*} 2x + 3y -z &= -3 \\ 4x - 2y +3z &= 12 \\ 2x -y + 2z &= 7. \end{align*}\]

Solution.

In matrix form, the system is:

\[\begin{equation*} \left[\begin{array}{rrrrr}{2} & {3} & {-1} & {|} & {-3} \\ {4} & {-2} & {3} & {|} & {12} \\ {2} & {-1} & {2} & {|} & {7}\end{array}\right]. \end{equation*}\]

Perform Gaussian elimination:

\[\begin{align*} \left[\begin{array}{rrrrr}{2} & {3} & {-1} & {|} & {-3} \\ {4} & {-2} & {3} & {|} & {12} \\ {2} & {-1} & {2} & {|} & {7}\end{array}\right] &\rightarrow R_2=R_2-2R_1 \rightarrow \left[\begin{array}{rrrrr}{2} & {3} & {-1} & {|} & {-3} \\ {0} & {-8} & {5} & {|} & {18} \\ {2} & {-1} & {2} & {|} & {7}\end{array}\right] \\ &\rightarrow R_3=R_3-R_1 \rightarrow \left[\begin{array}{rrrrr}{2} & {3} & {-1} & {|} & {-3} \\ {0} & {-8} & {5} & {|} & {18} \\ {0} & {-4} & {3} & {|} & {10}\end{array}\right] \\ &\rightarrow R_3=R_3-\frac{1}{2}R_2 \rightarrow\left[\begin{array}{rrrrr}{2} & {3} & {-1} & {|} & {-3} \\ {0} & {-8} & {5} & {|} & {18} \\ {0} & {0} & {\frac{1}{2}} & {|} & {1}\end{array}\right]. \end{align*}\]

So the equivalent system of equations is:

\[\begin{align*} 2x+3 y-z &=-3 \\ -8 y +5z &=18 \\ \frac{1}{2} z &=1. \end{align*}\]

The solution is

\[\begin{equation*} x = 1, \, y = -1, \, z =2. \end{equation*}\]

Gaussian elimination involves identifying a pivot position (the \((i,i)\) position on the diagonal) at each step, multiplying the row containing the pivot by a separate factor for each row below, and subtracting the products from those rows. Note that no zeros were encountered at the pivot positions in the two examples above; Gaussian elimination that simply assumes this will be the case is called “naive”. In addition, if echelon form, instead of reduced echelon form, is obtained from Gaussian elimination, a back substitution algorithm is needed to solve the upper triangular system for the \(x\)’s from bottom to top. The back substitution algorithm solves for \(x_n\) from the last equation, \(x_{n-1}\) from the second-to-last equation, and so on upward, until \(x_1\) is solved for from the first equation. The following code performs back substitution for an upper triangular system; the solution is stored in place in \(\boldsymbol{b}\).

import numpy as np
def back_sub(A, b):
    """
    Backward substitution
    The right hand side b is changed in place to store the solution
    A: the coefficient matrix of size n x n
    b: the right hand side of size n
    """
    if A.shape[0] != A.shape[1]:
        print('Error: the given coefficient matrix is not square')
        return
    
    if A.shape[0] != b.size:
        print('Error: the shape of the coefficient matrix does not match the size of the RHS')
        return
     
    n = A.shape[0]
    
    for i in range(n-1, -1, -1):
        for j in range(i+1, n):
            b[i] = b[i] - A[i,j]*b[j]
        b[i] = b[i]/A[i,i]
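
As a quick standalone check of back_sub, consider the following small, hypothetical upper triangular system:

U = np.array([[2., 1], [0, 3]])
b = np.array([4., 6])   # a float array, since the solution overwrites b
back_sub(U, b)
print(b)   # expect [1. 2.]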

The following Python function performs Gaussian elimination, where the final upper triangular matrix is stored in \(A\), and the final right hand side is stored in \(b\), so that no additional storage is needed. However, if the original coefficient matrix and right hand side are needed later, one should make a copy of them before calling the function.

def GaussianElim(A, b):
    """
    Gaussian Elimination without pivoting. 
    The coefficient matrix A and the right hand side b are modified in place.
    Input:
    A: the coefficient matrix of size n x n
    b: the right hand side of size n    
    Return:
    None. The solution is stored in b
    """
    if A.shape[0] != A.shape[1]:
        print('Error: the given coefficient matrix is not square')
        return
    
    if A.shape[0] != b.size:
        print('Error: the shape of the coefficient matrix does not match the size of the RHS')
        return
     
    # We hard code the epsilon here. It can be an input parameter.
    eps = 1e-5 
    n = A.shape[0]

    for j in range(n-1):
        if np.abs(A[j,j]) < eps:
            print('Error: zero pivot encountered!')
            return
        for i in range(j+1, n):
            # The multiplier
            mp = A[i,j]/A[j,j]
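            # Note that A[i,j] below the pivot is left unchanged rather than zeroed;
            # back_sub reads only the upper triangle, so these stale entries are harmless.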
            for k in range(j+1,n):
                A[i,k] = A[i,k] - mp*A[j,k]
            b[i] = b[i] - mp*b[j]
    
    # No need to return. Both A and b are changed in place

Combined with back_sub, GaussianElim can now solve linear systems. We apply the code to the linear system in Example 19.

A = np.array([[2., 3, -1], [4, -2, 3], [2, -1, 2]])
b = np.array([-3., 12, 7])  # a float array, since the solution overwrites b
print('Solve the linear system: ')
print(A, 'x = ', b)
GaussianElim(A, b)
back_sub(A, b)
print('The solution is ', b)
Solve the linear system: 
[[ 2.  3. -1.]
 [ 4. -2.  3.]
 [ 2. -1.  2.]] x =  [-3. 12.  7.]
The solution is  [ 1. -1.  2.]

The code produces the same results as we obtained analytically.

Computational cost of Gaussian elimination#

We now determine the costs of Gaussian elimination and the subsequent back substitution.

Theorem 13

The cost of Gaussian elimination for an \(n\times n\) linear system is \(O\left(\frac{2}{3}n^3\right)\).

Proof. For a general linear system:

\[\begin{equation*} \left[\begin{array}{cccccc}{a_{11}} & {a_{12}} & {\dots} & {a_{1 n}} & {|} & {b_{1}} \\ {a_{21}} & {a_{22}} & {\dots} & {a_{2 n}} & {|} & {b_{2}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} & {|} & {\vdots} \\ {a_{n 1}} & {a_{n 2}} & {\dots} & {a_{n n}} & {|} & {b_{n}}\end{array}\right] \end{equation*}\]

note that in the first step, updating the second row involves only the first two rows:

\[\begin{equation*} \begin{array}{llllll}{a_{11}} & {a_{12}} & {\dots} & {a_{1 n}} & { |} & {b_{1}} \\ {a_{21}} & {a_{22}} & {\dots} & {a_{2 n}} & { |} & {b_{2}}.\end{array} \end{equation*}\]

The two rows are changed to

\[\begin{equation*} \begin{array}{llllll} {a_{11}} & {a_{12}} & {\dots} & {a_{1 n}} & {|} & {b_{1}} \\ {0} & {a_{22}-\frac{a_{21}}{a_{11}} a_{12}} & {\dots} & {a_{2 n}-\frac{a_{21}}{a_{1 1}} a_{1 n}} & {|} & {b_{2}-\frac{a_{21}}{a_{11}} b_{1}}. \end{array} \end{equation*}\]

This involves \(1\) division (computing the multiplier \(a_{21}/a_{11}\)), \(n\) multiplications, and \(n\) additions (for the \(n-1\) entries \(a_{22},\dots,a_{2n}\) together with \(b_2\)). The same number of operations is needed for each of the other rows to get

\[\begin{equation*} \begin{array}{llllll} {a_{11}} & {a_{12}} & {\dots} & {a_{1 n}} & {|} & {b_{1}} \\ {\vdots} & {\vdots} & {\dots} & {\vdots} & {|} & {\vdots} \\ {0} & {a_{i2}-\frac{a_{i1}}{a_{11}} a_{12}} & {\dots} & {a_{i n}-\frac{a_{i1}}{a_{1 1}} a_{1 n}} & {|} & {b_{i}-\frac{a_{i1}}{a_{11}} b_{1}}. \end{array} \end{equation*}\]

So to eliminate the first column, the total number of operations is

\[\begin{equation*} (2n+1)(n-1). \end{equation*}\]

Similarly, to eliminate the second column (make \(a_{i2}, i>2\) zero), for each row, the total number of operations is \(2(n-1)+1\), comprising \(1\) division, \(n-1\) multiplications, and \(n-1\) additions. Therefore, the total number of operations adds up to

\[\begin{equation*} (2(n-1)+1)(n-2). \end{equation*}\]

Repeat the same operation count for column \(3\) through column \(n-1\), the last column to be eliminated. The total number of operations is

\[\begin{align*} &(2n+1)(n-1) + (2(n-1)+1)(n-2) + (2(n-2)+1)(n-3)+\cdots+(2(n-(n-2))+1)(n-(n-1)) \\ &= (2(n-1)+3)(n-1) + (2(n-2)+3)(n-2) + (2(n-3)+3)(n-3)+\cdots+(2(n-(n-1))+3)(n-(n-1)) \\ &= \sum_{j=1}^{n-1} (2(n-j)+3)(n-j) = \sum_{j=1}^{n-1} (2n^2+3n)-(4n+3)j+2j^2 \\ &= (2n^2+3n)(n-1) - (4n+3)\frac{n(n-1)}{2} + 2\frac{(n-1)(n)(2n-1)}{6} \\ &= \frac{2}{3}n^3 + \frac{1}{2}n^2 - \frac{7}{6}n, \end{align*}\]

where we used the following two identities:

\[\begin{align*} 1+2+3+4+\dots+n &=n (n+1) / 2, \\ 1^{2}+2^{2}+3^{2}+4^{2}+\dots+n^{2} &= n(n+1)(2 n+1) / 6. \end{align*}\]

Here \(O\left(\frac{2}{3}n^3\right)\) is read “of the order of \(\frac{2}{3}n^3\)”; that is, the cost of Gaussian elimination is dominated by the term \(\frac{2}{3}n^3\).
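
As a sanity check on this count, the following sketch tallies the operations performed by the elimination loops of GaussianElim and compares the tally with the formula just derived (the helper count_ops is ours, introduced only for this check):

def count_ops(n):
    """Count divisions, multiplications, and additions in naive
    Gaussian elimination on an n x n system (mirroring GaussianElim)."""
    ops = 0
    for j in range(n-1):
        for i in range(j+1, n):
            ops += 1            # one division for the multiplier
            ops += 2*(n-j-1)    # one multiply and one subtract per column k > j
            ops += 2            # one multiply and one subtract to update b[i]
    return ops

n = 10
print(count_ops(n), (4*n**3 + 3*n**2 - 7*n)//6)   # both print 705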

Theorem 14

The cost of back substitution for an \(n\times n\) upper triangular system is \(n^2\) operations.

Proof. Note that the last equation of an upper triangular system involves 1 division, the second to last equation involves one multiplication, one subtraction and one division, the third to last involves 5 total operations, etc., so the total number of operations of back substitution is

\[\begin{equation*} 1+3+5+\cdots+(2n-1) = \frac{[1+(2n-1)]n}{2} = n^2. \end{equation*}\]

Therefore, the total number of operations of solving a linear system with Gaussian elimination and back substitution is

\[\begin{equation*} \frac{2}{3}n^3 + \frac{1}{2}n^2 - \frac{7}{6}n + n^2 = \frac{2}{3}n^3 + \frac{3}{2}n^2 - \frac{7}{6}n = O\left(\frac{2}{3}n^3\right). \end{equation*}\]

Example 20

If it takes 1 second to run back substitution for a \(1000\times 1000\) upper triangular system, how long does it take to solve a linear system of the same size with Gaussian elimination?

Solution.

Since back substitution costs \(n^2=1000^2=10^6\) operations, one operation takes \(1/10^6 = 10^{-6}\) second. The total cost of Gaussian elimination is dominated by \(\frac{2}{3}n^3=\frac{2}{3}10^9\) operations, which costs \(\frac{2}{3}10^3\approx 667\) seconds.

Note that there is no need to use the exact count \(\frac{2}{3}n^3 + \frac{3}{2}n^2 - \frac{7}{6}n\), which gives about \(668\) seconds, since the lower-order terms contribute only about \(0.2\%\) of the total time.

LU factorization#

Gaussian elimination produces a byproduct, a factorization of the coefficient matrix \(A\), for free. The factorization takes the form \(A=LU\), where \(L\) is a lower triangular matrix whose diagonal elements are all ones, and \(U\) is an upper triangular matrix. LU factorization can be used to solve a sequence of linear systems of the following form

\[\begin{equation*} A\boldsymbol{x}_i = \boldsymbol{b}_i, \, i=1,2,\dots,K. \end{equation*}\]

All of the linear systems share the same coefficient matrix while the right hand sides are different. If Gaussian elimination were used to solve each linear system separately, the total cost would be around \(\frac{2}{3}n^3K\), which could be computationally intractable if both \(n\) and \(K\) are large. However, by using \(A=LU\), a linear system can be rewritten as

\[\begin{equation*} LU\boldsymbol{x} = \boldsymbol{b}, \end{equation*}\]

which can further be written as a system of two linear systems:

\[\begin{align*} L\boldsymbol{c} &= \boldsymbol{b} \\ U\boldsymbol{x} &= \boldsymbol{c}. \end{align*}\]

The first linear system involves a forward substitution, which solves a lower triangular system from top to bottom, and the second can be solved by back substitution. Similar to back substitution, the cost of forward substitution is also \(n^2\). So the total cost of solving the whole sequence of linear systems is \(\frac{2}{3}n^3+2n^2K\), which is in general much more computationally efficient than applying Gaussian elimination to each system separately.

Next, we use an example to show how LU factorization can be obtained from the Gaussian elimination process without any additional computation and storage.

Example 21

Obtain the LU factorization of the coefficient matrix \(A\) in Example 19.

Solution.

We repeat the Gaussian elimination process, but store the multipliers of the elementary row operations in the corresponding lower triangular positions (the boxed numbers):

\[\begin{align*} \left[\begin{array}{rrrr}{2} & {3} & {-1} & {|} & {-3} \\ {4} & {-2} & {3} & {|} & {12} \\ {2} & {-1} & {2} & { |} & {7}\end{array}\right] &\rightarrow R_2=R_2-2R_1 \rightarrow \left[\begin{array}{rrrrr}{2} & {3} & {-1} & { |} & {-3} \\ \fbox{2} & {-8} & {5} & { |} & {18} \\ {2} & {-1} & {2} & { |} & {7}\end{array}\right] \\ &\rightarrow R_3=R_3-R_1 \rightarrow \left[\begin{array}{rrrrr}{2} & {3} & {-1} & { |} & {-3} \\ \fbox{2} & {-8} & {5} & { |} & {18} \\ \fbox{1} & {-4} & {3} & { |} & {10}\end{array}\right] \\ &\rightarrow R_3=R_3-\frac{1}{2}R_2 \rightarrow\left[\begin{array}{rrrrr}{2} & {3} & {-1} & { |} & {-3} \\ \fbox{2} & {-8} & {5} & { |} & {18} \\ \fbox{1} & \fbox{1/2} & {1/2} & { |} & {1}\end{array}\right]. \end{align*}\]

So the multipliers in boxes define the lower triangular part of \(L\), and the upper triangular part including the diagonal of the row echelon form (without the right hand side) defines the matrix \(U\). In this example, we have

\[\begin{equation*} L = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 1/2 & 1 \end{bmatrix},\, U = \begin{bmatrix} 2 & 3 & -1 \\ 0 & -8 & 5 \\ 0 & 0 & 1/2 \end{bmatrix}. \end{equation*}\]

It is easy to verify that \(A=LU\).

Note that the lower triangular part of the coefficient matrix stores the multipliers, instead of zeros, so no additional storage is needed for LU factorization.

Example 22

Use LU factorization to solve the linear system in Example 19.

Solution.

The two linear systems, \(L\boldsymbol{c} = \boldsymbol{b}\) and \(U\boldsymbol{x} = \boldsymbol{c}\), now become

\[\begin{equation*} \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 1/2 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} -3 \\ 12 \\ 7 \end{bmatrix}, \end{equation*}\]

and

\[\begin{equation*} \begin{bmatrix} 2 & 3 & -1 \\ 0 & -8 & 5 \\ 0 & 0 & 1/2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}. \end{equation*}\]

Solving the first linear system using forward substitution, we obtain \(\boldsymbol{c}\):

\[\begin{equation*} \boldsymbol{c} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} -3 \\ 18 \\ 1 \end{bmatrix}. \end{equation*}\]

Then solving for \(\boldsymbol{x}\) from the second system using back substitution, we have

\[\begin{equation*} \boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \end{equation*}\]

which is identical to the solution obtained from Gaussian elimination.

The following Python function LU performs LU factorization. Note that the coefficient matrix itself is overwritten by the entries of the matrices \(L\) and \(U\). At each step of Gaussian elimination, the pivot is checked: if its absolute value is greater than a hard-coded threshold of \(10^{-5}\), the pivot is considered nonzero; otherwise, the algorithm stops.

def LU(A):
    """
    LU factorization without pivoting. 
    The coefficient matrix A is modified in place.
    The lower triangular part of A represents the L matrix, the upper triangular part 
    (including the diagonal) represents U
    A: the coefficient matrix of size n x n   
    """
    if A.shape[0] != A.shape[1]:
        print('Error: the given coefficient matrix is not square')
        return   
     
    # We hard code the epsilon here. It can be an input parameter.
    eps = 1e-5 
    n = A.shape[0]

    for j in range(n-1):
        if np.abs(A[j,j]) < eps:
            print('Error: zero pivot encountered!')
            return
        for i in range(j+1, n):
            # The multiplier
            mp = A[i,j]/A[j,j]
            A[i,j] = mp
            for k in range(j+1,n):
                A[i,k] = A[i,k] - mp*A[j,k]
    
    # No need to return. A is changed in place.

We now apply LU to the coefficient matrix in Example 19, and obtain the same result as in Example 21.

A = np.array([[2.,3,-1], [4,-2,3], [2, -1, 2]])
print('Perform LU decomposition of A = : ')
print(A)
LU(A)
print('The result is ')
print(A)
Perform LU decomposition of A = : 
[[ 2.  3. -1.]
 [ 4. -2.  3.]
 [ 2. -1.  2.]]
The result is 
[[ 2.   3.  -1. ]
 [ 2.  -8.   5. ]
 [ 1.   0.5  0.5]]
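
Since both factors are packed into A, they can be unpacked with np.tril and np.triu and checked against a saved copy of the original matrix; a quick sketch:

A0 = np.array([[2., 3, -1], [4, -2, 3], [2, -1, 2]])   # keep the original
A = A0.copy()
LU(A)
Lm = np.tril(A, -1) + np.eye(3)   # multipliers below the diagonal, unit diagonal
Um = np.triu(A)                   # upper triangle, including the diagonal
print(np.allclose(np.dot(Lm, Um), A0))   # expect True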

To use LU factorization to solve linear systems with Python, a function that implements forward substitution is needed. The following Python function forward_sub serves the purpose. The solution replaces the right hand side vector.

def forward_sub(A, b, A_from_LU):
    """
    Forward substitution
    The right hand side b is changed in place to store the solution
    A: the coefficient matrix of size n x n
    b: the right hand side of size n
    A_from_LU: True, if the matrix A is from LU factorization (diagonals are 1).
               False, if A is just a regular coefficient matrix
    """
    if A.shape[0] != A.shape[1]:
        print('Error: the given coefficient matrix is not square')
        return
    
    if A.shape[0] != b.size:
        print('Error: the shape of the coefficient matrix does not match the size of the RHS')
        return
     
    n = A.shape[0]
    
    if A_from_LU:
        for j in range(0,n):       
            b[j+1:] = b[j+1:] - A[j+1:,j]*b[j]
    else:
        for j in range(0,n):  
            b[j] = b[j]/A[j,j]
            b[j+1:] = b[j+1:] - A[j+1:,j]*b[j]

Now we use Python to solve the linear system in Example 19 with LU factorization.

A = np.array([[2.,3,-1], [4,-2,3], [2, -1, 2]])
b = np.array([-3., 12, 7])
# Perform LU factorization
LU(A)
print('The matrix A after LU decomposition is ')
print(A)
# Perform forward substitution
forward_sub(A,b,True)
print('After forward substitution, the vector b is')
print(b)
# Perform back substitution
back_sub(A,b)
print('After backward substitution, the vector b is')
print(b)
The matrix A after LU decomposition is 
[[ 2.   3.  -1. ]
 [ 2.  -8.   5. ]
 [ 1.   0.5  0.5]]
After forward substitution, the vector b is
[-3. 18.  1.]
After backward substitution, the vector b is
[ 1. -1.  2.]
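
The payoff of the factorization comes when several right-hand sides share the same coefficient matrix: A is factored once, and each additional right-hand side costs only one forward and one backward substitution. A sketch, with a second, hypothetical right-hand side:

A = np.array([[2., 3, -1], [4, -2, 3], [2, -1, 2]])
LU(A)   # factor once: roughly (2/3)n^3 operations
for rhs in (np.array([-3., 12, 7]), np.array([1., 0, 0])):
    forward_sub(A, rhs, True)   # L c = b, about n^2 operations
    back_sub(A, rhs)            # U x = c, about n^2 operations
    print(rhs)                  # the solution overwrites rhs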

Exercise 3.1-1

Let \(A\) be an \(n\times n\) matrix. \(I^{i,j}(m)\) is also an \(n\times n\) matrix; it is identical to the identity matrix except that the \((i,j)\) position, where \(i>j\), has the value \(m\). For example, supposing \(n=3\),

\[\begin{equation*} I^{2,1}(-2)= \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \end{equation*}\]
  1. What is \(I^{i,j}(m)A\) for a general \(n\)?

  2. Let \((i_1,j_1) \ne (i_2,j_2)\), what is \(I^{i_1,j_1}(m_1)I^{i_2,j_2}(m_2)\)?

  3. What is the inverse of \(I^{i,j}(m)\)?

Exercise 3.1-2

Show that the algorithm described in Example 21 leads to the LU factorization of a general matrix \(A\).

Gaussian elimination with partial pivoting#

To improve the stability of Gaussian elimination, pivots should be as large as possible in absolute value, so that the multipliers are as small as possible. The reason is that multiplying the row containing the pivot by a large multiplier and adding the product to another row will “swamp” the latter, losing significant information due to finite precision. There are various strategies for placing large numbers in the pivot positions. One idea is to compare the absolute values of the pivot and all the entries below it, and to swap the row containing the entry of largest magnitude with the row containing the pivot, if the two are not the same row. This variant of Gaussian elimination is called Gaussian elimination with partial pivoting. The algorithm is summarized below.

Partial pivoting#

The partial pivoting process works as follows:

  1. The first step: instead of using \(a_{11}\) as the pivot, select the \(p\)th row, where

    \[\begin{equation*} |a_{p1}|= \max_{1\le j\le n}\{|a_{j1}|\}. \end{equation*}\]

    If \(p\ne 1\), then exchange rows \(1\) and \(p\).

  2. The second step: select the \(p\)th row, where

    \[\begin{equation*} |a_{p2}|= \max_{2\le j\le n}\{|a_{j2}|\}. \end{equation*}\]

    If \(p\ne 2\), then exchange rows \(2\) and \(p\).

  3. A general \(i\)th step: select the \(p\)th row, where

    \[\begin{equation*} |a_{pi}|= \max_{i\le j\le n}\{|a_{ji}|\}. \end{equation*}\]

    If \(p\ne i\), then exchange rows \(i\) and \(p\).

Note that, in this way, the multipliers are no greater than \(1\) in absolute value, so the problem of “swamping” is avoided.

Using the partial pivoting strategy, we can also treat the cases where zeros appear in the pivot positions, which would cause naive Gaussian elimination to fail; the danger posed by small (rather than exactly zero) pivots is illustrated below. Note that there must be at least one nonzero value below a zero pivot; otherwise the coefficient matrix is not invertible.
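
To see why small pivots are dangerous, consider the classic \(2\times 2\) example sketched below in plain NumPy arithmetic (the value \(10^{-17}\) is chosen to expose double-precision roundoff; the exact solution is very close to \(x=y=1\)). Naive elimination uses the tiny pivot, and the resulting multiplier \(10^{17}\) swamps the second row; swapping the rows first gives the correct answer.

eps = 1e-17
A = np.array([[eps, 1.], [1., 1.]])
b = np.array([1., 2.])

# Without pivoting: the multiplier 1/eps = 1e17 swamps the second row
m = A[1,0]/A[0,0]
y = (b[1] - m*b[0])/(A[1,1] - m*A[0,1])
x = (b[0] - A[0,1]*y)/A[0,0]
print('without pivoting:', x, y)   # prints 0.0 1.0; x is completely wrong

# With pivoting: swap the rows first, so the multiplier is eps
m = A[0,0]/A[1,0]
y = (b[0] - m*b[1])/(A[0,1] - m*A[1,1])
x = (b[1] - A[1,1]*y)/A[1,0]
print('with pivoting:', x, y)      # prints 1.0 1.0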

Example 23

Apply Gaussian elimination with partial pivoting to the linear system in Example 19.

Solution.

As before, we start from the augmented matrix:

\[\begin{align*} \left[\begin{array}{rrrr}{2} & {3} & {-1} & {|} & {-3} \\ \color{red}{4} & {-2} & {3} & {|} & {12} \\ {2} & {-1} & {2} & { |} & {7}\end{array}\right] &\rightarrow R_2\leftrightarrow R_1 \rightarrow \left[\begin{array}{rrrr} {4} & {-2} & {3} & {|} & {12} \\ {2} & {3} & {-1} & {|} & {-3} \\ {2} & {-1} & {2} & { |} & {7}\end{array}\right] \\ &\rightarrow R_2=R_2-\frac{1}{2}R_1 \rightarrow \left[\begin{array}{rrrrr}{4} & {-2} & {3} & {|} & {12} \\ {0} & {4} & {-\frac{5}{2}} & { |} & {-9} \\ {2} & {-1} & {2} & { |} & {7}\end{array}\right] \\ &\rightarrow R_3=R_3-\frac{1}{2}R_1 \rightarrow \left[\begin{array}{rrrrr}{4} & {-2} & {3} & {|} & {12} \\ {0} & {4} & {-\frac{5}{2}} & { |} & {-9} \\ {0} & {0} & {\frac{1}{2}} & { |} & {1}\end{array}\right]. \end{align*}\]

So the equivalent system is:

\[\begin{align*} 4x-2 y+3z &=12 \\ 4 y -\frac{5}{2}z &=-9 \\ \frac{1}{2} z &=1. \end{align*}\]

The solution is

\[\begin{equation*} x = 1, \, y = -1, \, z =2. \end{equation*}\]

Here, we marked the number that needs to be placed in the pivot position in red. The notation \(R_1\leftrightarrow R_2\) means swapping Rows \(R_1\) and \(R_2\). For this example, we only needed to switch rows once, since the \((3,2)\) position vanished when we used the first pivot to eliminate the \((3,1)\) position.

Example 24

Apply Gaussian elimination with partial pivoting to the following linear system:

\[\begin{align*} 3x - 2y + z &= -1 \\ 6x + y -3 z &= 18 \\ -4x +3 y - 2 z &= 3. \end{align*}\]

Solution.

\[\begin{align*} \left[\begin{array}{rrrr}{3} & {-2} & {1} & {|} & {-1} \\ \color{red}{6} & {1} & {-3} & {|} & {18} \\ {-4} & {3} & {-2} & { |} & {3}\end{array}\right] &\rightarrow R_2\leftrightarrow R_1 \rightarrow \left[\begin{array}{rrrr} {6} & {1} & {-3} & {|} & {18} \\ {3} & {-2} & {1} & {|} & {-1} \\ {-4} & {3} & {-2} & { |} & {3}\end{array}\right] \\ &\rightarrow R_2=R_2-\frac{1}{2}R_1 \rightarrow \left[\begin{array}{rrrrr}{6} & {1} & {-3} & {|} & {18} \\ {0} & {-\frac{5}{2}} & {\frac{5}{2}} & { |} & {-10} \\ {-4} & {3} & {-2} & { |} & {3}\end{array}\right] \\ &\rightarrow R_3=R_3-\left(-\frac{2}{3}\right)R_1 \rightarrow \left[\begin{array}{rrrrr}{6} & {1} & {-3} & {|} & {18} \\ {0} & {-\frac{5}{2}} & {\frac{5}{2}} & { |} & {-10} \\ {0} & \color{red}{\frac{11}{3}} & {-4} & { |} & {15}\end{array}\right]\\ &\rightarrow R_3\leftrightarrow R_2 \rightarrow \left[\begin{array}{rrrrr}{6} & {1} & {-3} & {|} & {18} \\ {0} & {\frac{11}{3}} & {-4} & { |} & {15} \\ {0} & {-\frac{5}{2}} & {\frac{5}{2}} & { |} & {-10}\end{array}\right]\\ &\rightarrow R_3=R_3-\left(-\frac{15}{22}\right)R_2 \rightarrow \left[\begin{array}{rrrrr}{6} & {1} & {-3} & {|} & {18} \\ {0} & {\frac{11}{3}} & {-4} & { |} & {15} \\ {0} & {0} & {-\frac{5}{22}} & { |} & {\frac{5}{22}}\end{array}\right]. \end{align*}\]

So the equivalent system is:

\[\begin{align*} 6x +y-3z &=18 \\ \frac{11}{3} y -4z &=15 \\ -\frac{5}{22} z &=\frac{5}{22}. \end{align*}\]

The solution is

\[\begin{equation*} x = 2, \, y = 3, \, z =-1. \end{equation*}\]

The following Python function GEPP performs Gaussian elimination with partial pivoting. Both the coefficient matrix \(A\) and the right-hand side vector \(b\) are changed in place; together they represent the echelon form of the augmented matrix.

def GEPP(A, b):
    """
    Gaussian elimination with partial pivoting. 
    The coefficient matrix A is modified in place.
    A: the coefficient matrix of size n x n
    b: the right hand side of size n    
    Return:
    None. The solution is stored in b
    """
    if A.shape[0] != A.shape[1]:
        print('Error: the given coefficient matrix is not square')
        return  
     
    n = A.shape[0]

    for j in range(n-1):
        # find p
        p = np.argmax(np.abs(A[j:,j]))        
        if p+j != j:
            # change rows p and j of A and b:
            A[[p+j, j]] = A[[j, p+j]]
            b[[p+j, j]] = b[[j, p+j]]
        for i in range(j+1, n):
            # The multiplier
            mp = A[i,j]/A[j,j]
            A[i,j] = 0.
            b[i] = b[i] - mp*b[j]
            for k in range(j+1,n):
                A[i,k] = A[i,k] - mp*A[j,k]

We apply GEPP to the linear system in Example 24, and print out the matrix \(A\) and vector \(b\) at the end. The final solution is the same as obtained earlier.

A = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
b = np.array([-1., 18, 3])
print('Solve the linear system: ')
print(A, 'x = ', b)
GEPP(A, b)
print('After Gaussian elimination with partial pivoting, A becomes:')
print(A)
print('After Gaussian elimination with partial pivoting, b becomes:')
print(b)
back_sub(A, b)
print('The solution is ', b)
Solve the linear system: 
[[ 3. -2.  1.]
 [ 6.  1. -3.]
 [-4.  3. -2.]] x =  [-1. 18.  3.]
After Gaussian elimination with partial pivoting, A becomes:
[[ 6.          1.         -3.        ]
 [ 0.          3.66666667 -4.        ]
 [ 0.          0.         -0.22727273]]
After Gaussian elimination with partial pivoting, b becomes:
[18.         15.          0.22727273]
The solution is  [ 2.  3. -1.]

PA=LU factorization#

The partial pivoting algorithm leads to a new matrix factorization for \(A\), which takes the form of

\[\begin{equation*} PA=LU \end{equation*}\]

where \(L\) and \(U\) have the same forms as defined in LU factorization, and \(P\) is a permutation matrix.

Definition 7

An \(n\times n\) permutation matrix is a matrix in which each row and each column contains exactly one \(1\), with all other entries equal to \(0\).

For example,

\[\begin{equation*} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \end{equation*}\]

is a \(2\times 2\) permutation matrix and

\[\begin{equation*} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \end{equation*}\]

is a \(3\times 3\) permutation matrix.

A permutation matrix can be obtained by a sequence of exchanges of rows or columns of the identity matrix of the same dimension. For example,

\[\begin{equation*} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \end{equation*}\]

is obtained by first exchanging the first two rows of the \(3\times 3\) identity matrix \(I_3\), and then exchanging the first and third columns of the resulting matrix.
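
These exchanges are easy to reproduce in NumPy. The sketch below builds the permutation matrix above from np.eye and shows that left-multiplying by it permutes rows (the matrix M is a hypothetical example):

P = np.eye(3)
P[[0, 1]] = P[[1, 0]]        # exchange the first two rows
P[:, [0, 2]] = P[:, [2, 0]]  # exchange the first and third columns
print(P)                     # the permutation matrix displayed above

M = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.dot(P, M))          # P M permutes the rows of M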

The algorithm for PA=LU factorization is similar to that for LU factorization. The multipliers of the elementary row operations are stored in \(A\). To obtain the permutation matrix \(P\), we start from the identity matrix and apply to it the same row exchanges that are applied to \(A\) along the way. The following example illustrates the process.

Example 25

Find PA=LU factorization for the coefficient matrix in Example 24.

Solution.

Initially, let \(P=I_3\),

\[\begin{equation*} P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \end{equation*}\]

The first step of Gaussian elimination with partial pivoting involves exchanging the first two rows:

\[\begin{equation*} \left[\begin{array}{rrrr}{3} & {-2} & {1} \\ \color{red}{6} & {1} & {-3} \\ {-4} & {3} & {-2} \end{array}\right] \rightarrow R_2\leftrightarrow R_1 \rightarrow \left[\begin{array}{rrrr} {6} & {1} & {-3} \\ {3} & {-2} & {1} \\ {-4} & {3} & {-2} \end{array}\right]. \end{equation*}\]

Accordingly, we update \(P\) by switching its first two rows:

\[\begin{equation*} P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \rightarrow R_2\leftrightarrow R_1 \rightarrow \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \end{equation*}\]

Now eliminate the two numbers below the pivot (\((1,1)\) position), and store the multipliers in those positions:

\[\begin{align*} \left[\begin{array}{rrrr} {6} & {1} & {-3} \\ {3} & {-2} & {1} \\ {-4} & {3} & {-2} \end{array}\right]&\rightarrow R_2=R_2-\frac{1}{2}R_1 \rightarrow \left[\begin{array}{rrrrr}{6} & {1} & {-3} \\ {\fbox{1/2}} & {-\frac{5}{2}} & {\frac{5}{2}} \\ {-4} & {3} & {-2} \end{array}\right] \\ &\rightarrow R_3=R_3-\left(-\frac{2}{3}\right)R_1 \rightarrow \left[\begin{array}{rrrrr}{6} & {1} & {-3} \\ \fbox{1/2} & {-\frac{5}{2}} & {\frac{5}{2}} \\ {\fbox{-2/3}} & \color{red}{\frac{11}{3}} & {-4} \end{array}\right]. \end{align*}\]

The next step involves swapping the second and third rows of \(A\)

\[\begin{equation*} \left[\begin{array}{rrrrr}{6} & {1} & {-3} \\ \fbox{1/2} & {-\frac{5}{2}} & {\frac{5}{2}} \\ {\fbox{-2/3}} & \color{red}{\frac{11}{3}} & {-4} \end{array}\right] \rightarrow R_2\leftrightarrow R_3 \rightarrow \left[\begin{array}{rrrrr}{6} & {1} & {-3} \\ {\fbox{-2/3}} & {\frac{11}{3}} & {-4} \\ \fbox{1/2} & {-\frac{5}{2}} & {\frac{5}{2}} \end{array}\right]. \end{equation*}\]

Correspondingly we update \(P\):

\[\begin{equation*} P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \rightarrow R_2\leftrightarrow R_3 \rightarrow \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}. \end{equation*}\]

Now eliminate the \((3,2)\) position:

\[\begin{equation*} \left[\begin{array}{rrrrr}{6} & {1} & {-3} \\ {\fbox{-2/3}} & {\frac{11}{3}} & {-4} \\ \fbox{1/2} & {-\frac{5}{2}} & {\frac{5}{2}} \end{array}\right] \rightarrow R_3=R_3-\left(-\frac{15}{22}\right)R_2 \rightarrow \left[\begin{array}{rrrrr}{6} & {1} & {-3} \\ {\fbox{-2/3}} & {\frac{11}{3}} & {-4} \\ \fbox{1/2} & \fbox{-15/22} & {-\frac{5}{22}} \end{array}\right]. \end{equation*}\]

The algorithm is now completed with

\[\begin{equation*} P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \, L = \begin{bmatrix} 1 & 0 & 0 \\ {-2/3} & 1 & 0 \\ 1/2 & -15/22 & 1 \end{bmatrix}, \, U = \left[\begin{array}{rrrrr}{6} & {1} & {-3} \\ {0} & {\frac{11}{3}} & {-4} \\ 0 & 0 & {-\frac{5}{22}} \end{array}\right]. \end{equation*}\]

It is easy to verify that \(PA=LU\).
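
The factorization from this example can also be confirmed numerically:

A = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
P = np.array([[0., 1, 0], [0, 0, 1], [1, 0, 0]])
L = np.array([[1., 0, 0], [-2/3, 1, 0], [1/2, -15/22, 1]])
U = np.array([[6., 1, -3], [0, 11/3, -4], [0, 0, -5/22]])
print(np.allclose(np.dot(P, A), np.dot(L, U)))   # expect True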

Solving linear systems with PA=LU factorization#

The linear system \(A\boldsymbol{x} = \boldsymbol{b}\) can be rewritten as

\[\begin{equation*} PA\boldsymbol{x} = P\boldsymbol{b}. \end{equation*}\]

Using the PA=LU factorization, the linear system reads

\[\begin{equation*} LU\boldsymbol{x} = P\boldsymbol{b}, \end{equation*}\]

which can be written as two linear systems:

\[\begin{align*} L\boldsymbol{c} &= P\boldsymbol{b} \\ U\boldsymbol{x} &= \boldsymbol{c}. \end{align*}\]

The following Python function PALU performs PA=LU factorization. The \(L\) and \(U\) factors are stored in \(A\) as in LU factorization, while the permutation matrix \(P\) is returned.

def PALU(A):
    """
    PA=LU factorization with partial pivoting. 
    The coefficient matrix A is modified in place.
    The lower triangular part of A represents the L matrix, the upper triangular part 
    (including the diagonal) represents U
    A: the coefficient matrix of size n x n
    output:
    Permutation matrix $P$ is returned, along with $A$ that is changed in place
    """
    if A.shape[0] != A.shape[1]:
        print('Error: the given coefficient matrix is not square')
        return   
     
    n = A.shape[0]
    P = np.eye(n)

    for j in range(n-1):
        # find p
        p = np.argmax(np.abs(A[j:,j]))        
        if p+j != j:
            # change rows p and j of A. Update P:
            A[[p+j, j]] = A[[j, p+j]]
            P[[p+j, j]] = P[[j, p+j]]   
        for i in range(j+1, n):
            # The multiplier
            mp = A[i,j]/A[j,j]
            A[i,j] = mp
            for k in range(j+1,n):
                A[i,k] = A[i,k] - mp*A[j,k]
    return P

Applying PALU to the coefficient matrix in Example 24 gives the same result as in Example 25.

A = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
print('Perform PA=LU factorization: ')
print('Before PA=LU factorization, A = ')
print(A)

P = PALU(A)
print('The PA=LU factorization result is ')
print('After PA=LU factorization, A = ')
print(A)
print('The permutation matrix P = ')
print(P)
Perform PA=LU factorization: 
Before PA=LU factorization, A = 
[[ 3. -2.  1.]
 [ 6.  1. -3.]
 [-4.  3. -2.]]
The PA=LU factorization result is 
After PA=LU factorization, A = 
[[ 6.          1.         -3.        ]
 [-0.66666667  3.66666667 -4.        ]
 [ 0.5        -0.68181818 -0.22727273]]
The permutation matrix P = 
[[0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]]

We also use PA=LU factorization to solve the linear system in Example 24. Again, the same result is obtained.

A = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
b = np.array([-1., 18, 3])
P = PALU(A)
# First solve for c
Pb = np.dot(P,b)
forward_sub(A, Pb, True)
# Solve for x
back_sub(A, Pb)
print('The solution is: ')
print(Pb)
The solution is: 
[ 2.  3. -1.]

Full pivoting and PAQ=LU factorization#

The partial pivoting strategy for Gaussian elimination only considers candidates that are in the same column as the pivots. However, for the \((k,k)\) pivot position, \(1\le k\le n\), if we consider all entries of the submatrix \((a_{ij}),\, i\ge k,\, j\ge k\), i.e., all positions that are neither above nor to the left of the pivot, as candidates for the pivot position, the strategy is known as full pivoting. Full pivoting is more stable, since at each step it considers a wider range of candidates for the pivot position, and the multipliers can be even smaller than with partial pivoting.

The algorithm works as follows:

  1. The first step: instead of using \(a_{11}\) as the pivot, select the \(p\)th row, and the \(q\)th column, where

    \[\begin{equation*} |a_{pq}| = \max_{1\le i\le n, 1\le j\le n}\{|a_{ij}|\}, \end{equation*}\]

    exchange row \(p\) and row \(1\), and exchange column \(q\) and column \(1\), so now \(a_{pq}\) is in the \((1,1)\) pivot position. Apply Gaussian elimination.

  2. The second step: instead of using \(a_{22}\) as the pivot, select the \(p\)th row, and the \(q\)th column, where

    \[\begin{equation*} |a_{pq}| = \max_{2\le i\le n, 2\le j\le n}\{|a_{ij}|\}, \end{equation*}\]

    exchange row \(p\) and row \(2\), and exchange column \(q\) and column \(2\), so now \(a_{pq}\) is in the \((2,2)\) pivot position. Apply Gaussian elimination.

  3. The rest of the pivots can be found in a similar fashion.

Now we will have two permutation matrices \(P\) and \(Q\), which correspond to the row and column exchanges. The factorization corresponding to Gaussian elimination with full pivoting is in the form of

\[\begin{equation*} PAQ = LU \end{equation*}\]

which can be obtained in a similar fashion to the PA=LU factorization. The permutation matrix \(Q\) is obtained by performing the same sequence of column exchanges to an identity matrix.

Below is the Python function for PAQ=LU factorization. The returned outputs are the two permutation matrices \(P\) and \(Q\).

def PAQLU(A):
    """
    PAQ=LU factorization with full pivoting. 
    The coefficient matrix A is modified in place.
    The lower triangular part of A represents the L matrix, the upper triangular part 
    (including the diagonal) represents U
    A: the coefficient matrix of size n x n
    output:
    Permutation matrices $P$, $Q$ are returned, 
    along with $A$ that is changed in place
    """
    if A.shape[0] != A.shape[1]:
        print('Error: the given coefficient matrix is not square')
        return   
     
    n = A.shape[0]
    P = np.eye(n)
    Q = np.eye(n)

    for j in range(n-1):
        # find p and q so that the (p,q) position has the largest
        # value of the |A|
        p,q = np.unravel_index(np.argmax(np.abs(A[j:,j:])), 
                               np.abs(A[j:,j:]).shape)         
        if p+j != j:
            # exchange rows p and j of A. Update P:
            A[[p+j, j]] = A[[j, p+j]]
            P[[p+j, j]] = P[[j, p+j]]   
        if q+j != j:
            # exchange columns q and j of A. Update Q:
            A[:, [q+j, j]] = A[:, [j, q+j]]
            Q[:, [q+j, j]] = Q[:, [j, q+j]] 
        for i in range(j+1, n):
            # The multiplier
            mp = A[i,j]/A[j,j]
            A[i,j] = mp
            for k in range(j+1,n):
                A[i,k] = A[i,k] - mp*A[j,k]
    return P, Q

Apply PAQ=LU factorization to the coefficient matrix in Example 24:

A = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
print('Perform PAQ=LU factorization: ')
print('Before PAQ=LU factorization, A = ')
print(A)

P,Q = PAQLU(A)
print('The PAQ=LU factorization result is ')
print('After PAQ=LU factorization, A = ')
print(A)
print('The permutation matrix P = ')
print(P)
print('The permutation matrix Q = ')
print(Q)
Perform PAQ=LU factorization: 
Before PAQ=LU factorization, A = 
[[ 3. -2.  1.]
 [ 6.  1. -3.]
 [-4.  3. -2.]]
The PAQ=LU factorization result is 
After PAQ=LU factorization, A = 
[[ 6.         -3.          1.        ]
 [-0.66666667 -4.          3.66666667]
 [ 0.5        -0.625      -0.20833333]]
The permutation matrix P = 
[[0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]]
The permutation matrix Q = 
[[1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]]

Analytically, the PAQ=LU factorization for \(A\) is:

\[\begin{equation*} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 3 & -2 & 1 \\ 6 & 1 & -3 \\ -4 & 3 & -2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -\frac{2}{3} & 1 & 0 \\ \frac{1}{2} & -\frac{5}{8} & 1 \end{bmatrix} \begin{bmatrix} 6 & -3 & 1 \\ 0 & -4 & \frac{11}{3} \\ 0 & 0 & -\frac{5}{24} \end{bmatrix}. \end{equation*}\]
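
To solve \(A\boldsymbol{x}=\boldsymbol{b}\) with this factorization, note that \(PA\boldsymbol{x}=P\boldsymbol{b}\), and since \(QQ^T=I\) for a permutation matrix, \(PAQ(Q^T\boldsymbol{x})=LU(Q^T\boldsymbol{x})=P\boldsymbol{b}\). So one solves \(L\boldsymbol{c}=P\boldsymbol{b}\), then \(U\boldsymbol{y}=\boldsymbol{c}\), and finally recovers \(\boldsymbol{x}=Q\boldsymbol{y}\). A sketch using the functions defined above:

A = np.array([[3., -2, 1], [6, 1, -3], [-4, 3, -2]])
b = np.array([-1., 18, 3])
P, Q = PAQLU(A)
Pb = np.dot(P, b)
forward_sub(A, Pb, True)   # solve L c = P b
back_sub(A, Pb)            # solve U y = c; y overwrites Pb
x = np.dot(Q, Pb)          # undo the column exchanges: x = Q y
print(x)                   # expect [ 2.  3. -1.]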

Exercise 3.1-3

For the linear system

\[\begin{align*} 2x +2y -3z &= 1 \\ 4x -2y + 2z &= 4 \\ 2x +4y -z &=5 \end{align*}\]

find the LU factorization, PA=LU factorization, and PAQ=LU factorization of the coefficient matrix. Use LU factorization and PA=LU factorization to solve the linear system.

Condition number#

For a function of one variable, \(y=f(x)\), where \(x\) is the independent variable, and \(y\) is the dependent variable, one is often interested in how \(y\) responds to a small change in \(x\). If a small change in \(x\) leads to a large change in \(y\), then \(y\) is said to be sensitive to \(x\); otherwise \(y\) is insensitive to \(x\).

In the problem of solving a linear system, we are similarly interested in how a change in the independent variable, \(\boldsymbol{b}\), affects the dependent variable \(\boldsymbol{x}\). Considering a perturbation of \(\boldsymbol{b}\), \(\hat{\boldsymbol{b}}=\boldsymbol{b}+\delta\boldsymbol{b}\), the ratio of the relative change in the magnitude (norm) of \(\boldsymbol{x}\) to the relative change in the magnitude of \(\boldsymbol{b}\) can be expressed as

\[\begin{equation*} \frac{\frac{||\delta\boldsymbol{x}||}{||\boldsymbol{x}||}}{\frac{||\delta\boldsymbol{b}||}{||\boldsymbol{b}||}} = \frac{\frac{||A^{-1}\hat{\boldsymbol{b}}-A^{-1}\boldsymbol{b}||}{||\boldsymbol{x}||}}{\frac{||\delta\boldsymbol{b}||}{||\boldsymbol{b}||}} = \frac{\frac{||A^{-1}\delta\boldsymbol{b}||}{||\boldsymbol{x}||}}{\frac{||\delta\boldsymbol{b}||}{||\boldsymbol{b}||}} = \frac{||A^{-1}\delta\boldsymbol{b}||}{||\delta\boldsymbol{b}||}\frac{||\boldsymbol{b}||}{||\boldsymbol{x}||} = \frac{||A^{-1}\delta\boldsymbol{b}||}{||\delta\boldsymbol{b}||}\frac{||\boldsymbol{b}||}{||A^{-1}\boldsymbol{b}||}. \end{equation*}\]

Here \(||\delta\boldsymbol{x}||\) is called forward error, \(\frac{||\delta\boldsymbol{x}||}{||\boldsymbol{x}||}\) is called relative forward error, \(||\delta\boldsymbol{b}||\) is called backward error, \(\frac{||\delta\boldsymbol{b}||}{||\boldsymbol{b}||}\) is called relative backward error, and the quantity \(\frac{\frac{||\delta\boldsymbol{x}||}{||\boldsymbol{x}||}}{\frac{||\delta\boldsymbol{b}||}{||\boldsymbol{b}||}}\) is called error magnification factor.

To find the maximum of the error magnification factor, notice that

\[\begin{align*} \max_{\boldsymbol{b}, \delta\boldsymbol{b}}\frac{\frac{||\delta\boldsymbol{x}||}{||\boldsymbol{x}||}}{\frac{||\delta\boldsymbol{b}||}{||\boldsymbol{b}||}} &= \max_{\boldsymbol{b}, \delta\boldsymbol{b}}\frac{||A^{-1}\delta\boldsymbol{b}||}{||\delta\boldsymbol{b}||}\frac{||\boldsymbol{b}||}{||A^{-1}\boldsymbol{b}||} = \max_{\delta\boldsymbol{b}}\frac{||A^{-1}\delta\boldsymbol{b}||}{||\delta\boldsymbol{b}||} \max_{\boldsymbol{b}}\frac{||\boldsymbol{b}||}{||A^{-1}\boldsymbol{b}||} \\ &= \max_{\delta\boldsymbol{b}}\frac{||A^{-1}\delta\boldsymbol{b}||}{||\delta\boldsymbol{b}||} \max_{\boldsymbol{x}}\frac{||A\boldsymbol{x}||}{||\boldsymbol{x}||} = ||A^{-1}||_m\cdot||A||_m \end{align*}\]

where the notation \(||\cdot||_m\) denotes the matrix norm induced by the vector norm \(||\cdot||\), and the substitution \(\boldsymbol{b}=A\boldsymbol{x}\) is used in the second line.

Definition 8

The quantity \(||A^{-1}||_m\cdot||A||_m\) is called the condition number of matrix \(A\).

A large condition number indicates that a small change in \(\boldsymbol{b}\) incurs a large change in \(\boldsymbol{x}\), the solution to the linear system. Such matrices are called ill-conditioned; otherwise, the matrices are called well-conditioned. Solving a linear system whose coefficient matrix is ill-conditioned is subject to large errors. As a rule of thumb, if the condition number of matrix \(A\) is \(O(10^k)\), then up to \(k\) digits of accuracy are lost in solving a linear system that has \(A\) as the coefficient matrix. With double precision, this means about \(16-k\) digits of accuracy are expected.

Example 26

Consider the \(8\times 8\) Hilbert matrix

\[\begin{equation*} H = \begin{bmatrix} 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8}\\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9}\\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} \\ \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} \\ \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} \\ \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} \\ \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} \\ \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} \end{bmatrix}. \end{equation*}\]

In general, the position \((i,j)\) of the Hilbert matrix has the value \(\frac{1}{i+j-1}\).

Let the true solution be \(\boldsymbol{x}=[1,1,1,1,1,1,1,1]^T\), and create the right hand side vector \(\boldsymbol{b}=H\boldsymbol{x}\). Use Python to solve the system \(H\boldsymbol{x}=\boldsymbol{b}\). How accurate is the computed solution?

# Verify the effect of condition number
n = 8
H = np.zeros((n,n))
for i in range(n):
    for j in range(n):
        H[i,j] = 1./((i+1)+(j+1)-1.)
b = np.dot(H,np.ones(n,))

print('The condition number of H is: {0:12.9e}'.format(np.linalg.cond(H)))

x = np.linalg.solve(H, b)
print('The solution x is: ')
print("[{0:19.16f} {1:19.16f} {2:19.16f} {3:19.16f} "
      "{4:19.16f} {4:19.16f} {4:19.16f} {4:19.16f}]".format(x[0], x[1], x[2], 
      x[3], x[4], x[5], x[6], x[7]))
print('The error vector of the solution is: ')
print("[{0:19.16e} {1:19.16e} {2:19.16e} {3:19.16e} {4:19.16e} {4:19.16e} "
      "{4:19.16e} {4:19.16e}]".format(np.abs(x[0]-1), np.abs(x[1]-1), np.abs(x[2]-1), 
                                      np.abs(x[3]-1), np.abs(x[4]-1), np.abs(x[5]-1),
                                      np.abs(x[6]-1), np.abs(x[7]-1)))
The condition number of H is: 1.525757557e+10
The solution x is: 
[ 1.0000000000124938  0.9999999992609565  1.0000000102814219  0.9999999418227916  1.0000001619251886  1.0000001619251886  1.0000001619251886  1.0000001619251886]
The error vector of the solution is: 
[1.2493783785316737e-11 7.3904349306985750e-10 1.0281421936042534e-08 5.8177208384080359e-08 1.6192518859092786e-07 1.6192518859092786e-07 1.6192518859092786e-07 1.6192518859092786e-07]

The condition number of the \(8\times 8\) Hilbert matrix is about \(10^{10}\), so only about \(16-10=6\) digits of accuracy are expected, which is consistent with the printed errors, the largest of which is on the order of \(10^{-7}\).