3.3 Cholesky decomposition#
We now consider a special form of LU decomposition, when the matrix satisfies certain properties.
A real matrix \(A\) is positive definite if \(\boldsymbol{v}^TA\boldsymbol{v}>0\) for any nonzero vector \(\boldsymbol{v}\).
For a symmetric positive definite (SPD) matrix \(A\), the LU decomposition takes the particular form
\[\begin{equation*} A = R^TR, \end{equation*}\]
where \(R\) is an upper triangular matrix. Such a decomposition is called the Cholesky decomposition.
An SPD matrix has the following properties:
Eigenvalues of an SPD matrix are positive
The determinant of an SPD matrix is positive
Any principal submatrix of an SPD matrix is SPD. (A principal submatrix is a submatrix whose diagonal elements are also those of the original matrix.)
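As a quick numerical check of the first two properties on a concrete matrix (the matrix below is an arbitrary SPD choice used only for illustration):

import numpy as np

# A small symmetric positive-definite example matrix (illustrative choice)
A = np.array([[4., -2.], [-2., 10.]])

eigenvalues = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix
determinant = np.linalg.det(A)

print('eigenvalues =', eigenvalues)   # all positive for an SPD matrix
print('determinant =', determinant)   # positive for an SPD matrix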
We use a \(2 \times 2\) example to motivate the algorithm for a general \(n \times n\) SPD matrix.
For a general \(2 \times 2\) SPD matrix
\[\begin{equation*} A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}, \end{equation*}\]
the factor \(R\) in the Cholesky decomposition \(A=R^TR\) has the following general form:
\[\begin{equation*} R = \begin{bmatrix} \alpha & \beta \\ 0 & \gamma \end{bmatrix}. \end{equation*}\]
Then we have
\[\begin{equation*} R^TR = \begin{bmatrix} \alpha^2 & \alpha\beta \\ \alpha\beta & \beta^2+\gamma^2 \end{bmatrix}. \end{equation*}\]
An elementwise comparison of the elements in \(R^TR\) and \(A\) gives the following system of equations:
\[\begin{equation*} \alpha^2 = a, \quad \alpha\beta = b, \quad \beta^2+\gamma^2 = c, \end{equation*}\]
which leads to
\[\begin{equation*} \alpha = \sqrt{a}, \quad \beta = \frac{b}{\alpha} = \frac{b}{\sqrt{a}}, \quad \gamma = \sqrt{c-\beta^2} = \sqrt{c-\frac{b^2}{a}}. \end{equation*}\]
Note that \(R\) is not unique, since a negative sign can be chosen for each of the two square roots in Eqns. (21) and (22).
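As a sanity check, these formulas can be compared with NumPy's built-in factorization on an arbitrary SPD matrix; note that np.linalg.cholesky returns a lower-triangular factor \(L\) with \(A=LL^T\), so \(R=L^T\):

import numpy as np

# Arbitrary 2x2 SPD matrix [[a, b], [b, c]] for illustration
a, b, c = 4.0, -2.0, 10.0
A = np.array([[a, b], [b, c]])

# Entries of R from the elementwise comparison above (positive square roots chosen)
alpha = np.sqrt(a)
beta = b / alpha
gamma = np.sqrt(c - beta**2)
R = np.array([[alpha, beta], [0.0, gamma]])

# NumPy returns the lower-triangular factor L with A = L L^T, so R = L^T
L = np.linalg.cholesky(A)
print(R)
print(L.T)        # should match R
print(R.T @ R)    # should reproduce A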
Find the Cholesky decomposition for the \(2 \times 2\) SPD matrix
Solution.
So the Cholesky decomposition is
The Cholesky decomposition algorithm for a \(2\times 2\) SPD matrix can be extended to a general \(n \times n\) SPD matrix. The extension is done by partitioning the matrix into blocks iteratively, and the algorithm is summarized below:
Partition the matrix
\[\begin{equation*} A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} = \begin{bmatrix} a_{11} & \boldsymbol{b}^T \\ \boldsymbol{b} & C \end{bmatrix} \end{equation*}\]where
\[\begin{equation*} \boldsymbol{b} = \begin{bmatrix} a_{21} \\ \vdots \\ a_{n1} \end{bmatrix} = [a_{12}, \dots, a_{1n}]^T \end{equation*}\]due to the symmetry of \(A\), and
\[\begin{equation*} C = \begin{bmatrix} a_{22} & a_{23} & \dots & a_{2n} \\ a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \vdots \\ a_{n2} & a_{n3} & \dots & a_{nn} \end{bmatrix} \end{equation*}\]is the symmetric lower \((n-1) \times (n-1)\) submatrix of \(A\).
Update the \((1,1)\) position of \(R\) by setting \(R_{11} = \alpha = \sqrt{a_{11}}\) and update \([R_{12},\cdots,R_{1n}]=\boldsymbol{\beta}\) with \(\boldsymbol{b}^T/ \sqrt{a_{11}}\).
Repeat Steps 1 and 2 on the \((n-1) \times (n-1)\) matrix \(A^{(1)}=C-\boldsymbol{\beta}\boldsymbol{\beta}^T\) to update \(R_{22}\) and \([R_{23},\dots, R_{2n}]\). Note that \(\boldsymbol{\beta}\boldsymbol{\beta}^T\) is an outer product that results in an \((n-1) \times (n-1)\) matrix, and \(C-\boldsymbol{\beta}\boldsymbol{\beta}^T\) resembles \(c-\beta^2\) in Eqn. (22) in the \(2\times 2\) case.
The algorithm above for Cholesky decomposition has a cost of \(\frac{1}{3}n^3+\frac{1}{2}n^2+\frac{1}{6}n\) floating point operations (flops).
Proof. The first step involves computing the square root of \(a_{11}\) (1 flop), dividing the vector \([a_{12}, \dots, a_{1n}]\) by the square root (\(n-1\) flops), and computing \(A^{(1)}\), where calculating \(\boldsymbol{\beta}\boldsymbol{\beta}^T\) costs \(\frac{(n-1)n}{2}\) flops, since only the upper or lower triangular part including the diagonal needs to be computed due to symmetry, and so does calculating \(C-\boldsymbol{\beta}\boldsymbol{\beta}^T\). So the total cost of performing the first step is
\[\begin{equation*} 1 + (n-1) + 2\cdot\frac{(n-1)n}{2} = n^2 \end{equation*}\]
flops. In general, Step \(k\) costs
\[\begin{equation*} (n-k+1)^2 \end{equation*}\]
flops. Summing the costs of all steps leads to:
\[\begin{equation*} \sum_{k=1}^{n}(n-k+1)^2 = \sum_{k=1}^{n}k^2 = \frac{n(n+1)(2n+1)}{6} = \frac{1}{3}n^3+\frac{1}{2}n^2+\frac{1}{6}n, \end{equation*}\]
where we used the formula
\[\begin{equation*} \sum_{k=1}^{n}k^2 = \frac{n(n+1)(2n+1)}{6}. \end{equation*}\]
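The count can be verified numerically. The sketch below (the function name cholesky_flops is ours) tallies the operations of each step exactly as in the proof and compares the total with the closed-form formula:

# Count the flops of the Cholesky algorithm step by step and compare
# with the closed-form count n^3/3 + n^2/2 + n/6.
def cholesky_flops(n):
    total = 0
    for k in range(1, n + 1):        # step k works on an (n-k+1) x (n-k+1) matrix
        m = n - k                    # size of the trailing submatrix A^(k)
        total += 1                   # square root of the pivot
        total += m                   # dividing the remaining row entries
        total += m * (m + 1) // 2    # triangular part of the outer product
        total += m * (m + 1) // 2    # triangular part of the subtraction
    return total

n = 10
print(cholesky_flops(n))             # counted flops
print(n*(n+1)*(2*n+1)//6)            # closed form, equal to n^3/3 + n^2/2 + n/6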
The Cholesky decomposition is not unique unless the restriction that the diagonal elements of \(R\) are positive is imposed.
Find the Cholesky decomposition for the \(3 \times 3\) SPD matrix
\[\begin{equation*} A = \begin{bmatrix} 1 & -3 & -2 \\ -3 & 10 & 9 \\ -2 & 9 & 29 \end{bmatrix}. \end{equation*}\]
Solution.
Step 1. Update \(R_{11}\) and \(\boldsymbol{\beta}=[R_{12},\dots,R_{1n}]\) using \(A_{11}=1\) and \(\boldsymbol{b}^T=[-3,-2]\):
\[\begin{equation*} R = \begin{bmatrix} R_{11} & \boldsymbol{\beta} \\ \boldsymbol{0} & R^{(1)} \end{bmatrix} = \begin{bmatrix} 1 & [-3,\ -2] \\ \boldsymbol{0} & R^{(1)} \end{bmatrix}, \end{equation*}\]
where \(R^{(1)}\) is the lower \((n-1) \times (n-1)\) submatrix of \(R\).
Step 2. Calculate
\[\begin{equation*} A^{(1)} = C - \boldsymbol{\beta}\boldsymbol{\beta}^T = \begin{bmatrix} 10 & 9 \\ 9 & 29 \end{bmatrix} - \begin{bmatrix} -3 \\ -2 \end{bmatrix}[-3,\ -2] = \begin{bmatrix} 10 & 9 \\ 9 & 29 \end{bmatrix} - \begin{bmatrix} 9 & 6 \\ 6 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 3 & 25 \end{bmatrix}. \end{equation*}\]
Step 3. Repeat Step 1 on \(A^{(1)}\) to update \(R^{(1)}\), i.e., update \(R^{(1)}_{11}\) and \(R^{(1)}_{12}\) by \(1\) and \(3/1=3\):
\[\begin{equation*} R^{(1)} = \begin{bmatrix} 1 & 3 \\ 0 & R^{(2)} \end{bmatrix}, \end{equation*}\]
where \(R^{(2)}\) is a scalar that needs to be updated next. Now \(R\) becomes
\[\begin{equation*} R = \begin{bmatrix} 1 & -3 & -2 \\ 0 & 1 & 3 \\ 0 & 0 & R^{(2)} \end{bmatrix}. \end{equation*}\]
Step 4. Calculate \(A^{(2)}=25-3(3)=16\), based on which \(R^{(2)}\) can be computed by \(R^{(2)}=\sqrt{16}=4\). So the final Cholesky factor \(R\) for \(A\) is:
\[\begin{equation*} R = \begin{bmatrix} 1 & -3 & -2 \\ 0 & 1 & 3 \\ 0 & 0 & 4 \end{bmatrix}. \end{equation*}\]
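A quick numerical check confirms that this factor reproduces \(A\):

import numpy as np

R = np.array([[1., -3., -2.],
              [0.,  1.,  3.],
              [0.,  0.,  4.]])
print(R.T @ R)   # recovers the original matrix A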
The following Python function chol implements the Cholesky factorization for symmetric positive-definite matrices.
import numpy as np

def chol(A):
    """
    Perform the Cholesky algorithm for a symmetric positive-definite
    matrix A.
    A: symmetric positive-definite matrix. A is changed in place in this code.
    Return: R, an upper-triangular matrix such that A = R^T R.
    """
    if not np.array_equal(A.T, A):
        print('Error: the matrix A is not symmetric!')
        return
    n = A.shape[0]
    R = np.zeros((n, n))
    for k in range(n):
        if A[k, k] <= 0:
            print('Error: the matrix is not positive-definite!')
            return
        R[k, k] = np.sqrt(A[k, k])           # diagonal entry of R
        u = A[k, k+1:n]/R[k, k]              # remaining entries of row k of R
        R[k, k+1:n] = u
        A[k+1:n, k+1:n] -= np.outer(u, u)    # update the trailing submatrix
    return R
Apply chol to the following matrix:
A = np.array([[4., -2], [-2, 10]])
R = chol(A)
print('R = ', R)
print('RT x R = ', R.T.dot(R))
R = [[ 2. -1.]
[ 0. 3.]]
RT x R = [[ 4. -2.]
[-2. 10.]]
For the following matrix
\[\begin{equation*} A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 5 & 0 \\ 2 & 0 & 9 \end{bmatrix}, \end{equation*}\]
the Python function gives
A = np.array([[1., 1, 2], [1, 5, 0], [2, 0, 9]])
R = chol(A)
print('R = ', R)
print('RT x R = ', R.T.dot(R))
R = [[ 1. 1. 2.]
[ 0. 2. -1.]
[ 0. 0. 2.]]
RT x R = [[1. 1. 2.]
[1. 5. 0.]
[2. 0. 9.]]
Exercise 3.3-1
Find the Cholesky decomposition for the following matrix
Exercise 3.3-2
Find the Cholesky decomposition for the following matrix
Exercise 3.3-3
Find the Cholesky decomposition for the following matrix. Does the algorithm work? If not, what is the problem?
Use Cholesky decomposition to solve linear systems#
Cholesky decomposition can be used to solve linear systems in the same fashion as LU decomposition. Let the linear system be \(A\boldsymbol{x}=\boldsymbol{b}\), where \(A\) is symmetric positive-definite with a Cholesky decomposition \(A=R^TR\). Then the system can be rewritten as \(R^TR\boldsymbol{x}=\boldsymbol{b}\), which can be split into two separate linear systems:
\[\begin{equation*} R^T\boldsymbol{y}=\boldsymbol{b} \quad \text{and} \quad R\boldsymbol{x}=\boldsymbol{y}. \end{equation*}\]
Solving the two linear systems in turn, using forward and back substitution respectively, leads to the solution of the original linear system.
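A minimal sketch of this two-stage solve, reusing the chol function defined earlier; the helpers forward_sub, back_sub, and cholesky_solve are our own illustrative names, not library routines:

import numpy as np

def forward_sub(L, b):
    """Solve the lower-triangular system L y = b by forward substitution."""
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - np.dot(L[i, :i], y[:i])) / L[i, i]
    return y

def back_sub(U, y):
    """Solve the upper-triangular system U x = y by back substitution."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - np.dot(U[i, i+1:], x[i+1:])) / U[i, i]
    return x

def cholesky_solve(A, b):
    """Solve A x = b for SPD A via A = R^T R (uses chol defined above)."""
    R = chol(A.copy())          # chol modifies its argument in place
    y = forward_sub(R.T, b)     # R^T y = b
    return back_sub(R, y)       # R x = y

A = np.array([[4., -2.], [-2., 10.]])
b = np.array([4., 34.])
print(cholesky_solve(A, b))     # expect [3. 4.]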
Solve the system of linear equations
\[\begin{equation*} \begin{bmatrix} 1 & -3 & -2 \\ -3 & 10 & 9 \\ -2 & 9 & 29 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ -5 \\ -47 \end{bmatrix}. \end{equation*}\]
Solution.
From Example 28, we obtained
\[\begin{equation*} R = \begin{bmatrix} 1 & -3 & -2 \\ 0 & 1 & 3 \\ 0 & 0 & 4 \end{bmatrix}. \end{equation*}\]
So the two systems of linear equations to be solved are:
\[\begin{equation*} R^T\boldsymbol{y} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -2 & 3 & 4 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 0 \\ -5 \\ -47 \end{bmatrix} \quad \text{and} \quad R\boldsymbol{x} = \begin{bmatrix} 1 & -3 & -2 \\ 0 & 1 & 3 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \boldsymbol{y}. \end{equation*}\]
The first linear system, solved by forward substitution, gives
\[\begin{equation*} \boldsymbol{y} = [0, -5, -8]^T. \end{equation*}\]
The second, solved by back substitution, leads to
\[\begin{equation*} \boldsymbol{x} = [-1, 1, -2]^T. \end{equation*}\]
In addition to solving systems of linear equations, Cholesky decomposition has a wide range of applications, such as generating samples from a general multivariate normal distribution and solving nonlinear optimization problems.
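As an illustration of the first application: if \(\Sigma=R^TR\) is a Cholesky decomposition of a covariance matrix and \(\boldsymbol{z}\sim N(\boldsymbol{0}, I)\), then \(\boldsymbol{\mu}+R^T\boldsymbol{z}\sim N(\boldsymbol{\mu},\Sigma)\). A minimal sketch (the mean and covariance below are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])                   # mean vector (illustrative)
Sigma = np.array([[4., -2.], [-2., 10.]])    # SPD covariance matrix (illustrative)

# Upper-triangular factor R with Sigma = R^T R
R = np.linalg.cholesky(Sigma).T

# Draw standard normal samples and transform them: each row is mu + R^T z
z = rng.standard_normal((100000, 2))
samples = mu + z @ R

print(np.mean(samples, axis=0))              # approximately mu
print(np.cov(samples, rowvar=False))         # approximately Sigma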
Exercise 3.3-4
Solve the following linear system using Cholesky decomposition:
Conjugate gradient method#
Let \(A\) be an \(n \times n\) symmetric positive definite matrix. Two vectors \(\boldsymbol{u}\in \mathbb{R}^{n}\) and \(\boldsymbol{v}\in \mathbb{R}^{n}\) are A-orthogonal or A-conjugate if \(\boldsymbol{u}^TA\boldsymbol{v}=0\).
Note that the symmetric positive definite matrix \(A\) induces an inner product on \(\mathbb{R}^n\), \(\langle\boldsymbol{u}, \boldsymbol{v} \rangle_A=\boldsymbol{u}^TA\boldsymbol{v}\).
Let \(V\) be a vector space over a field \(\mathbb{F}\). An inner product on \(V\) is a function \(\langle \cdot, \cdot \rangle: V\times V \to \mathbb{F}\)
such that the following properties hold:
For any \(\boldsymbol{u}\in V\), \(\langle\boldsymbol{u}, \boldsymbol{u}\rangle \ge 0\) and the equality holds only when \(\boldsymbol{u}=\boldsymbol{0}\)
For any \(\boldsymbol{u}, \boldsymbol{v}, \boldsymbol{w}\in V\), \(a, b\in \mathbb{F}\), \(\langle a\boldsymbol{u}+b\boldsymbol{v}, \boldsymbol{w} \rangle = a\langle \boldsymbol{u}, \boldsymbol{w} \rangle + b\langle \boldsymbol{v}, \boldsymbol{w} \rangle\)
For any \(\boldsymbol{u}, \boldsymbol{v}\in V\), \(\langle\boldsymbol{u}, \boldsymbol{v}\rangle = \overline{\langle\boldsymbol{v}, \boldsymbol{u}\rangle}\), where the bar denotes complex conjugation.
Now consider solving the linear system
\[\begin{equation*} A\boldsymbol{x}=\boldsymbol{b}, \end{equation*}\]
where \(A\in\mathbb{R}^{n \times n}\) and \(\boldsymbol{b}\in\mathbb{R}^{n}\). Suppose \(\boldsymbol{u}_1,\boldsymbol{u}_2, \dots, \boldsymbol{u}_n\) is an A-orthogonal basis of \(\mathbb{R}^{n}\). Let \(\boldsymbol{x}^*\) be the solution to Eqn. (23). Then \(\boldsymbol{x}^*\) can be written as a linear combination of the basis as
\[\begin{equation*} \boldsymbol{x}^* = \sum_{i=1}^{n} a_i \boldsymbol{u}_i. \end{equation*}\]
The coefficients \(a_i\), \(1\le i\le n\), can be obtained by
\[\begin{equation*} a_i = \frac{\boldsymbol{u}_i^T\boldsymbol{b}}{\boldsymbol{u}_i^TA\boldsymbol{u}_i}. \end{equation*}\]
Before we get to the conjugate gradient algorithm, we first talk about the well-known Gram-Schmidt Orthogonalization algorithm.
Given a set of linearly independent vectors \(\boldsymbol{v}_1, \boldsymbol{v}_2, \cdots, \boldsymbol{v}_n\in \mathbb{R}^n\), the following process generates a set of orthogonal vectors:
\(\boldsymbol{u}_1 = \boldsymbol{v}_1\)
\(\boldsymbol{u}_2 = \boldsymbol{v}_2 - \frac{\langle \boldsymbol{v}_2, \boldsymbol{u}_1 \rangle}{\langle \boldsymbol{u}_1, \boldsymbol{u}_1 \rangle} \boldsymbol{u}_1 = \boldsymbol{v}_2 - \frac{\boldsymbol{v}_2 \cdot \boldsymbol{u}_1}{||\boldsymbol{u}_1||_2^2}\boldsymbol{u}_1\)
\(\boldsymbol{u}_3 = \boldsymbol{v}_3 - \frac{\langle \boldsymbol{v}_3, \boldsymbol{u}_1 \rangle}{\langle \boldsymbol{u}_1, \boldsymbol{u}_1 \rangle} \boldsymbol{u}_1 - \frac{\langle \boldsymbol{v}_3, \boldsymbol{u}_2 \rangle}{\langle \boldsymbol{u}_2, \boldsymbol{u}_2 \rangle} \boldsymbol{u}_2 = \boldsymbol{v}_3 - \frac{\boldsymbol{v}_3 \cdot \boldsymbol{u}_1}{||\boldsymbol{u}_1||_2^2}\boldsymbol{u}_1 - \frac{\boldsymbol{v}_3 \cdot \boldsymbol{u}_2}{||\boldsymbol{u}_2||_2^2}\boldsymbol{u}_2\)
Similarly, \(\boldsymbol{u}_4, \dots, \boldsymbol{u}_n\) can be obtained, and \(\boldsymbol{u}_1,\dots, \boldsymbol{u}_n\) is a set of orthogonal vectors.
Orthogonalize the following set of vectors:
\[\begin{equation*} \boldsymbol{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \quad \boldsymbol{v}_2 = \begin{bmatrix} -1 \\ 2 \\ 3 \end{bmatrix}, \quad \boldsymbol{v}_3 = \begin{bmatrix} 3 \\ 0 \\ 3 \end{bmatrix}. \end{equation*}\]
Solution.
- \[\begin{equation*} \boldsymbol{u}_1 = \boldsymbol{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}. \end{equation*}\]
- \[\begin{equation*} \boldsymbol{u}_2 = \boldsymbol{v}_2 - \frac{\langle \boldsymbol{v}_2, \boldsymbol{u}_1 \rangle}{\langle \boldsymbol{u}_1, \boldsymbol{u}_1 \rangle} \boldsymbol{u}_1 = \begin{bmatrix} -1 \\ 2 \\ 3 \end{bmatrix} - \frac{[-1,2,3] \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} }{1^2+2^2+2^2} \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} = \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}. \end{equation*}\]
- \[\begin{equation*} \boldsymbol{u}_3 = \boldsymbol{v}_3 - \frac{\langle \boldsymbol{v}_3, \boldsymbol{u}_1 \rangle}{\langle \boldsymbol{u}_1, \boldsymbol{u}_1 \rangle} \boldsymbol{u}_1 - \frac{\langle \boldsymbol{v}_3, \boldsymbol{u}_2 \rangle}{\langle \boldsymbol{u}_2, \boldsymbol{u}_2 \rangle} \boldsymbol{u}_2 = \begin{bmatrix} 3 \\ 0 \\ 3 \end{bmatrix} - \frac{[3,0,3] \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} }{1^2+2^2+2^2} \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} - \frac{[3,0,3] \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix} }{(-2)^2+0^2+1^2} \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{4}{5} \\ -2 \\ \frac{8}{5} \end{bmatrix}. \end{equation*}\]
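The process is easy to code. The sketch below (the function name gram_schmidt is ours) reproduces the vectors of this example:

import numpy as np

def gram_schmidt(V):
    """Orthogonalize the columns of V with the classical Gram-Schmidt process."""
    n = V.shape[1]
    U = np.zeros_like(V, dtype=float)
    for j in range(n):
        u = V[:, j].astype(float)
        for i in range(j):
            # subtract the projection of v_j onto the previous u_i
            u -= np.dot(V[:, j], U[:, i]) / np.dot(U[:, i], U[:, i]) * U[:, i]
        U[:, j] = u
    return U

V = np.array([[1., -1., 3.],
              [2.,  2., 0.],
              [2.,  3., 3.]])    # columns are v_1, v_2, v_3
print(gram_schmidt(V))           # columns should be [1,2,2], [-2,0,1], [4/5,-2,8/5]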
Similarly, we can generate a set of A-orthogonal vectors from a set of linearly independent vectors by slightly modifying the algorithm above.
Given a set of linearly independent vectors \(\boldsymbol{v}_1, \boldsymbol{v}_2, \cdots, \boldsymbol{v}_n\in \mathbb{R}^n\), the following process generates a set of A-orthogonal vectors:
\(\boldsymbol{u}_1 = \boldsymbol{v}_1\)
\(\boldsymbol{u}_2 = \boldsymbol{v}_2 - \frac{\langle \boldsymbol{v}_2, \boldsymbol{u}_1 \rangle_A}{\langle \boldsymbol{u}_1, \boldsymbol{u}_1 \rangle_A} \boldsymbol{u}_1 = \boldsymbol{v}_2 - \frac{\boldsymbol{v}_2^T A \boldsymbol{u}_1}{\boldsymbol{u}_1^TA\boldsymbol{u}_1}\boldsymbol{u}_1\)
\(\boldsymbol{u}_3 = \boldsymbol{v}_3 - \frac{\langle \boldsymbol{v}_3, \boldsymbol{u}_1 \rangle_A}{\langle \boldsymbol{u}_1, \boldsymbol{u}_1 \rangle_A} \boldsymbol{u}_1 - \frac{\langle \boldsymbol{v}_3, \boldsymbol{u}_2 \rangle_A}{\langle \boldsymbol{u}_2, \boldsymbol{u}_2 \rangle_A} \boldsymbol{u}_2 = \boldsymbol{v}_3 - \frac{\boldsymbol{v}_3^T A \boldsymbol{u}_1}{\boldsymbol{u}_1^TA\boldsymbol{u}_1}\boldsymbol{u}_1 - \frac{\boldsymbol{v}_3^T A \boldsymbol{u}_2}{\boldsymbol{u}_2^TA\boldsymbol{u}_2}\boldsymbol{u}_2\)
Similarly, \(\boldsymbol{u}_4, \dots, \boldsymbol{u}_n\) can be obtained, and \(\boldsymbol{u}_1,\dots, \boldsymbol{u}_n\) is a set of A-orthogonal vectors.
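The same idea in code: the sketch below (function names are ours) A-orthogonalizes the standard basis vectors and then solves \(A\boldsymbol{x}=\boldsymbol{b}\) directly with the coefficient formula \(a_i=\boldsymbol{u}_i^T\boldsymbol{b}/\boldsymbol{u}_i^TA\boldsymbol{u}_i\):

import numpy as np

def a_orthogonalize(V, A):
    """A-orthogonalize the columns of V (Gram-Schmidt with the A-inner product)."""
    n = V.shape[1]
    U = np.zeros_like(V, dtype=float)
    for j in range(n):
        u = V[:, j].astype(float)
        for i in range(j):
            u -= (V[:, j] @ A @ U[:, i]) / (U[:, i] @ A @ U[:, i]) * U[:, i]
        U[:, j] = u
    return U

def solve_with_a_orthogonal_basis(A, b):
    """Solve A x = b using an A-orthogonal basis and the coefficient formula."""
    n = A.shape[0]
    U = a_orthogonalize(np.eye(n), A)     # start from the standard basis
    x = np.zeros(n)
    for i in range(n):
        x += (U[:, i] @ b) / (U[:, i] @ A @ U[:, i]) * U[:, i]
    return x

A = np.array([[4., -2.], [-2., 10.]])
b = np.array([4., 34.])
print(solve_with_a_orthogonal_basis(A, b))   # expect [3. 4.]
print(np.linalg.solve(A, b))                 # reference solution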
Now we continue the discussion of using a set of A-orthogonal vectors to find the solution to Eqn. (23). Since the set of A-orthogonal vectors is not available in the first place, our goal is to generate them on the fly. Suppose we start with an initial guess of the solution \(\boldsymbol{x}_0=\boldsymbol{0}\). The corresponding residual is \(\boldsymbol{u}_0 = \boldsymbol{r}_0=\boldsymbol{b}-A\boldsymbol{x}_0=\boldsymbol{b}\), which serves as the first vector in the set of \(n\) A-orthogonal vectors. Now, we update the estimate of the solution by Eqns. (24) and (25):
\[\begin{equation*} \boldsymbol{x}_1 = \boldsymbol{x}_0 + a_0\boldsymbol{u}_0, \end{equation*}\]
where
\[\begin{equation*} a_0 = \frac{\boldsymbol{u}_0^T\boldsymbol{b}}{\boldsymbol{u}_0^TA\boldsymbol{u}_0} = \frac{\boldsymbol{u}_0^T\boldsymbol{r}_0}{\boldsymbol{u}_0^TA\boldsymbol{u}_0}. \end{equation*}\]
In the second step, update the residual:
\[\begin{equation*} \boldsymbol{r}_1 = \boldsymbol{b}-A\boldsymbol{x}_1. \end{equation*}\]
Note that \(\boldsymbol{u}_0\) and \(\boldsymbol{r}_1\) are not A-orthogonal in general, so we apply the Gram-Schmidt algorithm to \(\boldsymbol{r}_1\):
\[\begin{equation*} \boldsymbol{u}_1 = \boldsymbol{r}_1 - \frac{\boldsymbol{r}_1^TA\boldsymbol{u}_0}{\boldsymbol{u}_0^TA\boldsymbol{u}_0}\boldsymbol{u}_0. \end{equation*}\]
Now \(\boldsymbol{u}_0\) and \(\boldsymbol{u}_1\) are a set of two A-orthogonal vectors. Again, update the estimated solution by Eqns. (24) and (25):
\[\begin{equation*} \boldsymbol{x}_2 = \boldsymbol{x}_1 + a_1\boldsymbol{u}_1, \end{equation*}\]
where
\[\begin{equation*} a_1 = \frac{\boldsymbol{u}_1^T\boldsymbol{b}}{\boldsymbol{u}_1^TA\boldsymbol{u}_1} = \frac{\boldsymbol{u}_1^T\boldsymbol{r}_1}{\boldsymbol{u}_1^TA\boldsymbol{u}_1}. \end{equation*}\]
The second equality holds since
\[\begin{equation*} \boldsymbol{u}_1^T\boldsymbol{b} = \boldsymbol{u}_1^T(\boldsymbol{r}_1+A\boldsymbol{x}_1) = \boldsymbol{u}_1^T\boldsymbol{r}_1 + \boldsymbol{u}_1^TA\boldsymbol{x}_1. \end{equation*}\]
Note that \(\boldsymbol{x}_1\) only involves vectors that are A-orthogonal to \(\boldsymbol{u}_1\), so the last term in the equation above vanishes.
The algorithm proceeds in this fashion and converges in at most \(n\) steps, at which point a set of \(n\) A-orthogonal vectors (directions) has been found. In addition, it is not necessary to start from a zero initial guess \(\boldsymbol{x}_0=\boldsymbol{0}\): if \(\boldsymbol{x}_0\ne\boldsymbol{0}\), the algorithm actually solves the linear system \(A\hat{\boldsymbol{x}}=\boldsymbol{b}-A\boldsymbol{x}_0\), and \(\hat{\boldsymbol{x}}+\boldsymbol{x}_0\) is the solution to the original linear system.
The algorithm can be summarized as follows.
Initialize \(\boldsymbol{x}_0\) as any vector. Set \(\boldsymbol{r}_0=\boldsymbol{b}-A\boldsymbol{x}_0\) and \(\boldsymbol{u}_0=\boldsymbol{r}_0\).
For \(k=0, 1, \dots, n-1\):
\(a_k=\frac{\boldsymbol{u}_k^T \boldsymbol{r}_k}{\boldsymbol{u}_k^TA\boldsymbol{u}_k}\)
\(\boldsymbol{x}_{k+1} = \boldsymbol{x}_k + a_k \boldsymbol{u}_k\)
\(\boldsymbol{r}_{k+1} = \boldsymbol{b}-A\boldsymbol{x}_{k+1}\)
if (\(||\boldsymbol{r}_{k+1}|| < \epsilon\)):
break
\(\boldsymbol{u}_{k+1} = \boldsymbol{r}_{k+1} - \sum_{i\le k}\frac{\boldsymbol{r}_{k+1}^TA\boldsymbol{u}_{i}}{\boldsymbol{u}_{i}^TA\boldsymbol{u}_{i}}\boldsymbol{u}_{i}\)
return \(\boldsymbol{x}_{k+1}\).
Here \(\epsilon\) is a tolerance: if the norm of the residual falls below it, the algorithm is considered to have converged. The algorithm is implemented in the following Python function ConjGrad_v1.
# Conjugate Gradient Method
def ConjGrad_v1(A, b, x0, eps):
    """
    Perform the conjugate gradient method, the first version.
    A: a symmetric positive definite matrix
    b: the right-hand side
    x0: the initial guess
    eps: the tolerance for stopping the algorithm
    """
    if not np.array_equal(A.T, A):
        print('Error: the matrix A is not symmetric!')
        return
    n = A.shape[0]
    r = np.zeros((n, n+1))    # the ith column stores r_i
    u = np.zeros((n, n+1))    # the ith column stores u_i
    r[:,0] = b - np.dot(A, x0)
    if np.linalg.norm(r[:,0]) < eps:
        return x0
    u[:,0] = r[:,0]
    x = x0.copy()
    for k in range(n):
        a = np.dot(u[:,k], r[:,k])/np.dot(u[:,k], np.dot(A, u[:,k]))
        x += a*u[:,k]
        r[:,k+1] = b - np.dot(A, x)
        if np.linalg.norm(r[:,k+1]) < eps:   # check only the latest residual
            return x
        # Gram-Schmidt step: make r_{k+1} A-orthogonal to all previous directions
        proj = np.zeros(n)
        for i in range(k+1):
            proj += np.dot(r[:,k+1], np.dot(A, u[:,i]))/np.dot(u[:,i], np.dot(A, u[:,i]))*u[:,i]
        u[:,k+1] = r[:,k+1] - proj
    return x
Use the conjugate gradient method to solve the following \(2\times 2\) linear system:
\[\begin{equation*} \begin{bmatrix} 4 & -2 \\ -2 & 10 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 34 \end{bmatrix}. \end{equation*}\]
A = np.array([[4., -2], [-2, 10]])
b = np.array([4., 34])
x0 = np.zeros(2)
eps = 1e-9
sol = ConjGrad_v1(A, b, x0, eps)
print(sol)
[3. 4.]
We take a closer look at the process using the following animation. Both \(\boldsymbol{u}\) and \(\boldsymbol{r}\) directions are shown, along with the solutions.
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
# Conjugate Gradient Method for the animation
def ConjGrad(A, b, x0, nstep):
    """
    Perform nstep steps of the conjugate gradient method and return the
    current iterate together with the residual and search direction.
    A: a symmetric positive definite matrix
    b: the right-hand side
    x0: the initial guess
    nstep: the number of steps of conjugate gradient
    """
    if not np.array_equal(A.T, A):
        print('Error: the matrix A is not symmetric!')
        return
    n = A.shape[0]
    r_old = b - np.dot(A, x0)
    r_new = r_old.copy()
    u = r_old
    x = x0.copy()
    for k in range(nstep):
        a = np.dot(r_old, r_old)/np.dot(u, np.dot(A, u))
        x += a*u
        r_new = r_old - a*np.dot(A, u)
        beta = np.dot(r_new, r_new)/np.dot(r_old, r_old)   # avoid shadowing the input b
        u = r_new + beta*u
        r_old = r_new.copy()
    a = np.dot(r_old, r_old)/np.dot(u, np.dot(A, u))       # step length along the next direction
    return x, r_new, u, a
# Animation for the conjugate gradient method
A = np.array([[4., -2], [-2, 10]])
b = np.array([4., 34])
x0 = np.array([0.0, 0.0])
true_sol = np.linalg.solve(A, b)
fig, ax = plt.subplots(figsize=(12,5))

def animate(i):
    ax.cla()
    ax.set_xlim(-5, 5)
    ax.set_ylim(-5, 5)
    x, r, u, a = ConjGrad(A, b, x0, i)
    ax.plot(x[0], x[1], 'o', ms=8, label='CG')
    ax.quiver([x[0]], [x[1]], [a*u[0]], [a*u[1]], angles='xy', scale_units='xy', scale=1,
              color='r', label='u direction')
    ax.quiver([x[0]], [x[1]], [r[0]], [r[1]], angles='xy',
              color='c', label='r direction')
    ax.plot(true_sol[0], true_sol[1], 's', color='green', ms=8, label='True solution')
    ax.axhline()
    ax.axvline()
    ax.set_title('Conjugate gradient method: iteration step '+str(i), size=16)
    ax.set_xlabel('$x_1$', size=16)
    ax.set_ylabel('$x_2$', size=16)
    ax.legend(loc='upper right')
    plt.close()

anim = FuncAnimation(fig, animate, frames=3, interval=2000)
HTML(anim.to_jshtml())
Solve the linear system shown in Example 29 using the conjugate gradient method.
Solution.
A = np.array([[1., -3, -2], [-3, 10, 9], [-2, 9, 29]])
b = np.array([0., -5, -47])
x0 = np.zeros(3)
eps = 1e-9
sol = ConjGrad_v1(A, b, x0, eps)
print(sol)
[-1. 1. -2.]
Note that the conjugate gradient algorithm above requires storage for all of the vectors \(\boldsymbol{r}_k\) and \(\boldsymbol{u}_k\), which is very inefficient for large \(n\). The following modification of the algorithm avoids the need to store them all. It can be shown that the two algorithms are equivalent.
(Conjugate Gradient)
Initialize \(\boldsymbol{x}_0\) as any vector. Set \(\boldsymbol{r}_0=\boldsymbol{b}-A\boldsymbol{x}_0\) and \(\boldsymbol{u}_0=\boldsymbol{r}_0\).
For \(k=0, 1, \dots, n-1\):
\(a_k=\frac{\boldsymbol{r}_k^T \boldsymbol{r}_k}{\boldsymbol{u}_k^TA\boldsymbol{u}_k}\)
\(\boldsymbol{x}_{k+1} = \boldsymbol{x}_k + a_k \boldsymbol{u}_k\)
\(\boldsymbol{r}_{k+1} = \boldsymbol{r}_{k}-a_kA\boldsymbol{u}_{k}\)
if (\(||\boldsymbol{r}_{k+1}|| < \epsilon\)):
break
\(b_k = \frac{\boldsymbol{r}_{k+1}^T\boldsymbol{r}_{k+1}}{\boldsymbol{r}_{k}^T\boldsymbol{r}_{k}}\)
\(\boldsymbol{u}_{k+1} = \boldsymbol{r}_{k+1} + b_k\boldsymbol{u}_{k}\)
return \(\boldsymbol{x}_{k+1}\).
We implement the algorithm in the following Python function ConjGrad_v2.
# Conjugate Gradient Method
def ConjGrad_v2(A, b, x0, eps):
    """
    Perform the conjugate gradient method, the second version.
    A: a symmetric positive definite matrix
    b: the right-hand side
    x0: the initial guess
    eps: the tolerance for stopping the algorithm
    """
    if not np.array_equal(A.T, A):
        print('Error: the matrix A is not symmetric!')
        return
    n = A.shape[0]
    r_old = b - np.dot(A, x0)
    r_new = r_old.copy()
    if np.linalg.norm(r_old) < eps:
        return x0
    u = r_old
    x = x0.copy()
    for k in range(n):
        a = np.dot(r_old, r_old)/np.dot(u, np.dot(A, u))
        x += a*u
        r_new = r_old - a*np.dot(A, u)
        if np.linalg.norm(r_new) < eps:
            return x
        beta = np.dot(r_new, r_new)/np.dot(r_old, r_old)   # avoid shadowing the input b
        u = r_new + beta*u
        r_old = r_new.copy()
    return x
Applying the new code to Example 32 generates the same result.
A = np.array([[1., -3, -2], [-3, 10, 9], [-2, 9, 29]])
b = np.array([0., -5, -47])
x0 = np.zeros(3)
eps = 1e-9
sol = ConjGrad_v2(A, b, x0, eps)
print(sol)
[-1. 1. -2.]
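For large sparse SPD systems, a production implementation of the same method is available as scipy.sparse.linalg.cg. Assuming SciPy is installed, it can be used to cross-check our result (only default arguments are used here, since the keyword arguments vary slightly between SciPy versions):

import numpy as np
from scipy.sparse.linalg import cg

A = np.array([[1., -3, -2], [-3, 10, 9], [-2, 9, 29]])
b = np.array([0., -5, -47])

# cg returns the approximate solution and an info flag (0 means convergence)
x, info = cg(A, b)
print(x, info)   # expect approximately [-1.  1. -2.] and 0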
Exercise 3.3-5
Show that \(\boldsymbol{u}^TA\boldsymbol{v}\) in Definition 11 defines an inner product.
Exercise 3.3-6
Prove Eqn. (25).