5.6 Numerical differentiation#

The derivative of \(f\) at \(x_{0}\) is

\[\begin{equation*} f'(x_{0})=\lim_{h\rightarrow0}\frac{f(x_{0}+h)-f(x_{0})}{h}. \end{equation*}\]

This formula gives an obvious way to estimate the derivative by

\[\begin{equation*} f'(x_{0})\approx\frac{f(x_{0}+h)-f(x_{0})}{h} \end{equation*}\]

for small \(h\). This formula, however, does not give any information about the error of the approximation.

We will take another approach. As with Newton-Cotes quadrature, we construct the interpolating polynomial for \(f\), and then use the derivative of the polynomial as an approximation for the derivative of \(f\).

Let’s assume \(f\in C^{2}(a,b),x_{0}\in(a,b),\) and \(x_{0}+h\in(a,b).\) Construct the linear Lagrange interpolating polynomial \(p_{1}(x)\) for the data \((x_{0},f(x_{0})),(x_{1},f(x_{1}))=(x_{0}+h,f(x_{0}+h))\). From Theorem 22, we have

\[\begin{align*} f(x) & =\underbrace{\frac{x-x_{1}}{x_{0}-x_{1}}f(x_{0})+\frac{x-x_{0}}{x_{1}-x_{0}}f(x_{1})}_{p_1(x)}+\underbrace{\frac{f''(\xi(x))}{2!}(x-x_{0})(x-x_{1})}_{\text{interpolation error}}\\ & =\frac{x-(x_{0}+h)}{x_{0}-(x_{0}+h)}f(x_{0})+\frac{x-x_{0}}{x_{0}+h-x_{0}}f(x_{0}+h)+\frac{f''(\xi(x))}{2!}(x-x_{0})(x-x_{0}-h)\\ & =\frac{x-x_{0}-h}{-h}f(x_{0})+\frac{x-x_{0}}{h}f(x_{0}+h)+\frac{f''(\xi(x))}{2!}(x-x_{0})(x-x_{0}-h). \end{align*}\]

Now let’s differentiate \(f(x)\):

\[\begin{align*} f'(x)&=-\frac{f(x_{0})}{h}+\frac{f(x_{0}+h)}{h}+f''(\xi(x))\frac{d}{dx}\left[\frac{(x-x_{0})(x-x_{0}-h)}{2}\right]\\ &+\frac{(x-x_{0})(x-x_{0}-h)}{2}\frac{d}{dx}\left[f''(\xi(x))\right]\\ &=\frac{f(x_{0}+h)-f(x_{0})}{h}+\frac{2x-2x_{0}-h}{2}f''(\xi(x))+\frac{(x-x_{0})(x-x_{0}-h)}{2}f'''(\xi(x))\xi'(x). \end{align*}\]

In the above equation, we know \(\xi\) is between \(x_{0}\) and \(x_{0}+h\); however, we have no knowledge about \(\xi'(x)\), which appears in the last term. Fortunately, if we set \(x=x_{0},\) the term with \(\xi'(x)\) vanishes and we get:

\[\begin{equation*} f'(x_{0})=\frac{f(x_{0}+h)-f(x_{0})}{h}-\frac{h}{2}f''(\xi(x_{0})). \end{equation*}\]

This formula is called the forward-difference formula if \(h>0\) and backward-difference formula if \(h<0\). Note that from this formula we can obtain a bound on the error

\[\begin{equation*} \left|f'(x_{0})-\frac{f(x_{0}+h)-f(x_{0})}{h}\right|\leq\frac{h}{2}\sup_{x\in(x_0,x_0+h)}|f''(x)|. \end{equation*}\]
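
To see this bound in action, here is a minimal Python sketch (an illustration, not part of the text) that applies the forward-difference formula to \(f(x)=\sin x\) at \(x_{0}=0.5\); the error shrinks roughly linearly with \(h\), as the \(O(h)\) error term predicts:

```python
import numpy as np

def forward_difference(f, x0, h):
    """Forward-difference estimate of f'(x0) with step size h."""
    return (f(x0 + h) - f(x0)) / h

x0 = 0.5
exact = np.cos(x0)  # f(x) = sin(x), so f'(x) = cos(x)

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    est = forward_difference(np.sin, x0, h)
    # The error drops by about a factor of 10 per step, consistent with O(h)
    print(f"h = {h:.0e}  estimate = {est:.8f}  error = {abs(est - exact):.2e}")
```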

To obtain the forward-difference and backward-difference formulas we started with a linear polynomial interpolant on two points. Using more points and a higher-order interpolant gives more accuracy, but also increases the computing time and roundoff error. In general, let \(f\in C^{n+1}[a,b]\) and let \(x_{0},x_{1},...,x_{n}\) be distinct numbers in \([a,b]\). We have

\[\begin{gather*} f(x)=\sum_{k=0}^{n}f(x_{k})l_{k}(x)+f^{(n+1)}(\xi)\frac{(x-x_{0})(x-x_{1})\cdots(x-x_{n})}{(n+1)!}\\ \Rightarrow f'(x)=\sum_{k=0}^{n}f(x_{k})l'_{k}(x)+f^{(n+1)}(\xi)\frac{d}{dx}\left[\frac{(x-x_{0})(x-x_{1})\cdots(x-x_{n})}{(n+1)!}\right]\\ +\frac{(x-x_{0})(x-x_{1})\cdots(x-x_{n})}{(n+1)!}\frac{d}{dx}(f^{(n+1)}(\xi)). \end{gather*}\]

If \(x=x_{j}\) for some \(j\in\{0,1,...,n\},\) the last term vanishes, and using the following result

\[\begin{equation*} \frac{d}{dx}\left[(x-x_{0})(x-x_{1})\cdots(x-x_{n})\right]_{x=x_{j}}=\prod_{k=0,k\neq j}^{n}(x_{j}-x_{k}) \end{equation*}\]

we obtain

(37)#\[\begin{equation} f'(x_{j})=\sum_{k=0}^{n}f(x_{k})l'_{k}(x_{j})+\frac{f^{(n+1)}(\xi(x_{j}))}{(n+1)!}\prod_{k=0,k\neq j}^{n}(x_{j}-x_{k}) \end{equation}\]

which is called the \((n+1)\)-point formula to approximate \(f'(x_{j})\). The most common formulas use \(n=2\) and \(n=4.\) Here we discuss \(n=2\), that is, the three-point formulas. The nodes are \(x_{0},x_{1},x_{2}\). The Lagrange basis polynomials and their derivatives are:

\[\begin{align*} l_{0}(x) & =\frac{(x-x_{1})(x-x_{2})}{(x_{0}-x_{1})(x_{0}-x_{2})}\Rightarrow l'_{0}(x)=\frac{2x-x_{1}-x_{2}}{(x_{0}-x_{1})(x_{0}-x_{2})}\\ l_{1}(x) & =\frac{(x-x_{0})(x-x_{2})}{(x_{1}-x_{0})(x_{1}-x_{2})}\Rightarrow l'_{1}(x)=\frac{2x-x_{0}-x_{2}}{(x_{1}-x_{0})(x_{1}-x_{2})}\\ l_{2}(x) & =\frac{(x-x_{0})(x-x_{1})}{(x_{2}-x_{0})(x_{2}-x_{1})}\Rightarrow l_{2}'(x)=\frac{2x-x_{0}-x_{1}}{(x_{2}-x_{0})(x_{2}-x_{1})} \end{align*}\]

These derivatives can be substituted into (37) to obtain the three-point formulas. We can simplify these formulas if the nodes are equally spaced, that is, \(x_{1}=x_{0}+h,x_{2}=x_{1}+h=x_{0}+2h.\) Then we obtain

(38)#\[\begin{equation} f'(x_{0}) =\frac{1}{2h}\left[-3f(x_{0})+4f(x_{0}+h)-f(x_{0}+2h)\right]+\frac{h^{2}}{3}f^{(3)}(\xi_{0}) \end{equation}\]
(39)#\[\begin{equation} f'(x_{0}+h) =\frac{1}{2h}\left[-f(x_{0})+f(x_{0}+2h)\right]-\frac{h^{2}}{6}f^{(3)}(\xi_{1}) \end{equation}\]
(40)#\[\begin{equation} f'(x_{0}+2h) =\frac{1}{2h}\left[f(x_{0})-4f(x_{0}+h)+3f(x_{0}+2h)\right]+\frac{h^{2}}{3}f^{(3)}(\xi_{2}) \end{equation}\]
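
These coefficients can also be recovered symbolically. The following SymPy sketch (an illustration, not part of the text) builds the interpolant on the equally spaced nodes, differentiates it, and evaluates the derivative at each node, reproducing (38)-(40) without the error terms:

```python
import sympy as sp

x, x0, h = sp.symbols('x x0 h')
f = sp.Function('f')

# Equally spaced nodes: x0, x0 + h, x0 + 2h
nodes = [x0, x0 + h, x0 + 2*h]

def lagrange_basis(j):
    """Lagrange basis polynomial l_j(x) for the three nodes."""
    factors = [(x - nodes[k]) / (nodes[j] - nodes[k]) for k in range(3) if k != j]
    return sp.prod(factors)

# Interpolating polynomial p_2(x) and its derivative
p2 = sum(f(nodes[k]) * lagrange_basis(k) for k in range(3))
dp2 = sp.diff(p2, x)

# Evaluating p_2'(x) at each node recovers the coefficients of (38)-(40)
for node in nodes:
    print(sp.simplify(dp2.subs(x, node)))
```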

It turns out that the first and third equations, (38) and (40), are equivalent. To see this, first replace \(x_{0}\) with \(x_{0}-2h\) in the third equation to get (ignoring the error term)

\[\begin{equation*} f'(x_{0}) =\frac{1}{2h}\left[f(x_{0}-2h)-4f(x_{0}-h)+3f(x_{0})\right], \end{equation*}\]

and then set \(h\) to \(-h\) in the right-hand side to get \(\frac{1}{2h}\left[-f(x_{0}+2h)+4f(x_{0}+h)-3f(x_{0})\right]\), which gives us the first equation.

Therefore we have only two distinct formulas, (38) and (39). We rewrite them below with one modification: in (39), we replace \(x_{0}\) with \(x_{0}-h\). We then obtain two different formulas for \(f'(x_{0})\):

\[\begin{align*} f'(x_{0}) &=\frac{-3f(x_{0})+4f(x_{0}+h)-f(x_{0}+2h)}{2h}+\frac{h^{2}}{3}f^{(3)}(\xi_{0}) \rightarrow \textcolor{blue}{\text{three-point endpoint formula}}\\ f'(x_{0}) &=\frac{f(x_{0}+h)-f(x_{0}-h)}{2h}-\frac{h^{2}}{6}f^{(3)}(\xi_{1}) \rightarrow \textcolor{blue}{\text{three-point midpoint formula}} \end{align*}\]

The three-point midpoint formula has some advantages: its error term is half that of the endpoint formula, and it requires one fewer function evaluation. The endpoint formula is useful when the value of \(f\) is not available on one side of \(x_{0}\), a situation that may happen if \(x_{0}\) is close to an endpoint.

Example 52

The following table gives the values of \(f(x)=\sin x\). Estimate \(f'(0.1),f'(0.3)\) using an appropriate three-point formula.

| \(x\) | \(f(x)\) |
| --- | --- |
| 0.1 | 0.09983 |
| 0.2 | 0.19867 |
| 0.3 | 0.29552 |
| 0.4 | 0.38942 |

Solution.

To estimate \(f'(0.1),\) we set \(x_{0}=0.1\) and \(h=0.1\). Since the table does not provide a value of \(f\) to the left of 0.1, we can only use the three-point endpoint formula.

\[\begin{equation*} f'(0.1)\approx\frac{1}{0.2}\left(-3(0.09983)+4(0.19867)-0.29552\right)=0.99835. \end{equation*}\]

The correct answer is \(\cos0.1=0.995004,\) so the absolute error is \(3.3\times10^{-3}.\)

To estimate \(f'(0.3)\) we can use the midpoint formula:

\[\begin{equation*} f'(0.3)\approx\frac{1}{0.2}\left(0.38942-0.19867\right)=0.95375. \end{equation*}\]

The correct answer is \(\cos0.3=0.955336\) and thus the absolute error is \(1.59\times10^{-3}.\) If we use the endpoint formula to estimate \(f'(0.3)\) we set \(h=-0.1\) and compute

\[\begin{equation*} f'(0.3)\approx\frac{1}{-0.2}\left(-3(0.29552)+4(0.19867)-0.09983\right)=0.95855 \end{equation*}\]

with an absolute error of \(3.2\times10^{-3}.\)
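
For readers following along in code, here is a minimal Python sketch (not part of the text) that implements both three-point formulas and reproduces the numbers above:

```python
def endpoint_3pt(f0, f1, f2, h):
    """Three-point endpoint estimate of f'(x0) from f(x0), f(x0+h), f(x0+2h)."""
    return (-3*f0 + 4*f1 - f2) / (2*h)

def midpoint_3pt(fm, fp, h):
    """Three-point midpoint estimate of f'(x0) from f(x0-h) and f(x0+h)."""
    return (fp - fm) / (2*h)

# Tabulated values of sin(x) from the example
print(endpoint_3pt(0.09983, 0.19867, 0.29552, 0.1))   # f'(0.1): 0.99835
print(midpoint_3pt(0.19867, 0.38942, 0.1))            # f'(0.3): 0.95375
print(endpoint_3pt(0.29552, 0.19867, 0.09983, -0.1))  # f'(0.3): 0.95855
```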

Exercise 5.6-1

In some applications, we want to estimate the derivative of an unknown function from empirical data. However, empirical data usually come with "noise", that is, error due to data collection, data reporting, or some other reason. In this exercise we will investigate how stable the difference formulas are when there is noise in the data. Consider the following data obtained from \(y=e^x\); the values are correct to six digits.

| \(x\) | 1.00 | 1.01 | 1.02 |
| --- | --- | --- | --- |
| \(f(x)\) | 2.71828 | 2.74560 | 2.77319 |

We estimate \(f'(1.01)\) using the three-point midpoint formula and obtain \(f'(1.01)=\frac{2.77319-2.71828}{0.02}=2.7455\). The true value is \(f'(1.01)=e^{1.01}=2.74560\), and the relative error due to rounding is \(3.6\times 10^{-5}\).

Next, we add some noise to the data: we increase 2.77319 by 10% to 3.050509, and decrease 2.71828 by 10% to 2.446452. Here is the noisy data:

| \(x\) | 1.00 | 1.01 | 1.02 |
| --- | --- | --- | --- |
| \(f(x)\) | 2.446452 | 2.74560 | 3.050509 |

Estimate \(f'(1.01)\) using the noisy data, and compute its relative error. How does the relative error compare with the relative error for the non-noisy data?

We next want to explore how to estimate the second derivative of \(f\). One option is to proceed as we did for \(f'\) and use the second derivative of the interpolating polynomial as the approximation. Here we will discuss another approach, using Taylor expansions. Expand \(f\) about \(x_{0}\), and evaluate it at \(x_{0}+h\) and \(x_{0}-h\):

\[\begin{align*} f(x_{0}+h) & =f(x_{0})+hf'(x_{0})+\frac{h^{2}}{2}f''(x_{0})+\frac{h^{3}}{6}f^{(3)}(x_{0})+\frac{h^{4}}{24}f^{(4)}(\xi_{+})\\ f(x_{0}-h) & =f(x_{0})-hf'(x_{0})+\frac{h^{2}}{2}f''(x_{0})-\frac{h^{3}}{6}f^{(3)}(x_{0})+\frac{h^{4}}{24}f^{(4)}(\xi_{-}) \end{align*}\]

where \(\xi_{+}\) is between \(x_{0}\) and \(x_{0}+h,\) and \(\xi_{-}\) is between \(x_{0}\) and \(x_{0}-h.\) Add the equations to get

\[\begin{equation*} f(x_{0}+h)+f(x_{0}-h)=2f(x_{0})+h^{2}f''(x_{0})+\frac{h^{4}}{24}\left[f^{(4)}(\xi_{+})+f^{(4)}(\xi_{-})\right]. \end{equation*}\]

Solving for \(f''(x_{0})\) gives

\[\begin{equation*} f''(x_{0})=\frac{f(x_{0}+h)-2f(x_{0})+f(x_{0}-h)}{h^{2}}-\frac{h^{2}}{24}\left[f^{(4)}(\xi_{+})+f^{(4)}(\xi_{-})\right]. \end{equation*}\]

Note that \(\frac{f^{(4)}(\xi_{+})+f^{(4)}(\xi_{-})}{2}\) is a number between \(f^{(4)}(\xi_{+})\text{ and }f^{(4)}(\xi_{-})\), so from the Intermediate Value Theorem 4, we can conclude there exists some \(\xi\) between \(\xi_{-}\) and \(\xi_{+}\) so that

\[\begin{equation*} f^{(4)}(\xi)=\frac{f^{(4)}(\xi_{+})+f^{(4)}(\xi_{-})}{2}. \end{equation*}\]

Then the above formula simplifies as

\[\begin{equation*} f''(x_{0})=\frac{f(x_{0}+h)-2f(x_{0})+f(x_{0}-h)}{h^{2}}-\frac{h^{2}}{12}f^{(4)}(\xi) \end{equation*}\]

for some \(\xi\) between \(x_{0}-h\) and \(x_{0}+h\).
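
Here is a minimal Python sketch (an illustration, not part of the text) of this central second-difference formula, applied to \(f(x)=e^{x}\) at \(x_{0}=1\), where \(f''(1)=e\):

```python
import numpy as np

def second_difference(f, x0, h):
    """Central-difference estimate of f''(x0); the error is O(h^2)."""
    return (f(x0 + h) - 2*f(x0) + f(x0 - h)) / h**2

x0, exact = 1.0, np.exp(1.0)
for h in [1e-1, 1e-2, 1e-3]:
    est = second_difference(np.exp, x0, h)
    # The error drops by about a factor of 100 per step, matching the h^2/12 term
    print(f"h = {h:.0e}  estimate = {est:.8f}  error = {abs(est - exact):.2e}")
```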

Numerical differentiation and roundoff error#

Arya and the mysterious black box#


College life is full of mysteries, and Arya faces one in an engineering class: a black box! What is a black box? It is a computer program, or some device, that produces an output when an input is provided. We do not know the inner workings of the system; hence the name black box. Let's think of the black box as a function \(f\), and represent the input and output as \(x, f(x)\). Of course, we do not have a formula for \(f\).

Arya's engineering classmates want to compute the derivative of the black box at \(x=2\), that is, \(f'(2)\). (The input to this black box can be any real number.) The students want to use the three-point midpoint formula to estimate \(f'(2)\):

\[\begin{equation*} f'(2) \approx \frac{1}{2h}\left[f(2+h)-f(2-h)\right]. \end{equation*}\]

They debate how to pick \(h\) in this formula. One of them says they should make \(h\) as small as possible, like \(10^{-8}\). Arya is skeptical. She mutters to herself, "I know I slept through some of my numerical analysis lectures, but not all!"


She tells her classmates about the phenomenon of cancellation of leading digits, and to make her point more convincing, she runs the following experiment: let \(f(x)=e^x\), and suppose we want to compute \(f'(2)\), which is \(e^2\). Arya uses the three-point midpoint formula above to estimate \(f'(2)\) for various values of \(h\), and for each case she computes the absolute value of the difference between the three-point midpoint estimate and the exact solution \(e^2\). She tabulates her results in the table below. Clearly, smaller \(h\) does not result in smaller error. In this experiment, \(h=10^{-6}\) gives the smallest error.

| \(h\) | \(10^{-4}\) | \(10^{-5}\) | \(10^{-6}\) | \(10^{-7}\) | \(10^{-8}\) |
| --- | --- | --- | --- | --- | --- |
| Abs. error | \(1.2\times 10^{-8}\) | \(1.9\times 10^{-10}\) | \(5.2\times 10^{-11}\) | \(7.5\times 10^{-9}\) | \(2.1\times 10^{-8}\) |
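
Arya's experiment is easy to reproduce with a few lines of Python (a sketch, not part of the text; the exact error values may vary slightly across platforms):

```python
import numpy as np

def midpoint_3pt(f, x0, h):
    """Three-point midpoint estimate of f'(x0)."""
    return (f(x0 + h) - f(x0 - h)) / (2*h)

exact = np.exp(2.0)  # f'(2) = e^2 for f(x) = e^x
for n in range(4, 9):
    h = 10.0**(-n)
    err = abs(midpoint_3pt(np.exp, 2.0, h) - exact)
    print(f"h = 1e-{n}  abs. error = {err:.1e}")
```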

Theoretical analysis#

Numerical differentiation is a numerically unstable problem. To reduce the truncation error, we need to decrease \(h\), which in turn increases the roundoff error due to cancellation of significant digits in the function difference calculation. Let \(e(x)\) denote the roundoff error in computing \(f(x)\) so that \(f(x)=\tilde{f}(x)+e(x),\) where \(\tilde{f}\) is the value computed by the computer. Consider the three-point midpoint formula:

\[\begin{align*} &\left|f'(x_{0})-\frac{\tilde{f}(x_{0}+h)-\tilde{f}(x_{0}-h)}{2h}\right|\\ &=\left|f'(x_{0})-\frac{f(x_{0}+h)-e(x_{0}+h)-f(x_{0}-h)+e(x_{0}-h)}{2h}\right|\\ &=\left|f'(x_{0})-\frac{f(x_{0}+h)-f(x_{0}-h)}{2h}+\frac{e(x_{0}-h)-e(x_{0}+h)}{2h}\right|\\ &=\left|-\frac{h^{2}}{6}f^{(3)}(\xi)+\frac{e(x_{0}-h)-e(x_{0}+h)}{2h}\right| \leq \textcolor{blue}{\frac{h^{2}}{6}M+\frac{\epsilon}{h}} \end{align*}\]

where we assumed \(|f^{(3)}(\xi)|\leq M\) and \(|e(x)|\leq\epsilon.\) To reduce the truncation error \(h^{2}M/6\) one would decrease \(h\), which would then result in an increase in the roundoff error \(\epsilon/h\). An optimal value for \(h\) can be found with these assumptions using Calculus: find the value for \(h\) that minimizes the function \(s(h)=\frac{Mh^{2}}{6}+\frac{\epsilon}{h}.\) The answer is \(h=\sqrt[3]{3\epsilon/M}.\)
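
Indeed, setting \(s'(h)=0\) confirms this:

\[\begin{equation*} s'(h)=\frac{Mh}{3}-\frac{\epsilon}{h^{2}}=0\Rightarrow h^{3}=\frac{3\epsilon}{M}\Rightarrow h=\sqrt[3]{3\epsilon/M}, \end{equation*}\]

and since \(s''(h)=M/3+2\epsilon/h^{3}>0\) for \(h>0\), this critical point is a minimum.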

Let’s revisit the table Arya presented where \(10^{-6}\) was found to be the optimal value for \(h\). The calculations were done using Python, which reports 15 digits when asked for \(e^2\). Let’s assume all these digits are correct, and thus let \(\epsilon = 10^{-16}\). Since \(f^{(3)}(x)=e^x\) and \(e^2\approx 7.4\), let’s take \(M=7.4\). Then

\[\begin{equation*} h=\sqrt[3]{3\epsilon/M}=\sqrt[3]{3\times 10^{-16}/7.4}\approx 3.4 \times 10^{-6} \end{equation*}\]

which is in good agreement with the optimal value \(10^{-6}\) of Arya’s numerical results.

Exercise 5.6-2

Find the optimal value for \(h\) that will minimize the error for the formula

\[\begin{equation*} f'(x_{0})=\frac{f(x_{0}+h)-f(x_{0})}{h}-\frac{h}{2}f''(\xi) \end{equation*}\]

in the presence of roundoff error, using the approach of Section 5.6 Numerical differentiation.

a) Consider estimating \(f'(1)\) where \(f(x)=x^{2}\), using the above formula. What is the optimal value of \(h\) for estimating \(f'(1)\), assuming that the roundoff error is bounded by \(\epsilon=10^{-16}\) (roughly the machine epsilon \(2^{-53}\) of the 64-bit floating-point representation)?

b) Use Python to compute

\[\begin{equation*} f'_{n}(1)=\frac{f(1+10^{-n})-f(1)}{10^{-n}}, \end{equation*}\]

for \(n=1,2,...,20,\) and describe what happens.

c) Discuss your findings in parts (a) and (b) and how they relate to each other.

Exercise 5.6-3

The function \(\int_{0}^{x}\frac{1}{\sqrt{2\pi}}e^{-t^{2}/2}dt\) is related to the distribution function of the standard normal random variable, a very important distribution in probability and statistics. Often we want to solve equations like

(41)#\[\begin{equation} \int_{0}^{x}\frac{1}{\sqrt{2\pi}}e^{-t^{2}/2}dt=z \end{equation}\]

for \(x\), where \(z\) is some real number between 0 and 1. This can be done by using Newton’s method to solve the equation \(f(x)=0\) where

\[\begin{equation*} f(x)=\int_{0}^{x}\frac{1}{\sqrt{2\pi}}e^{-t^{2}/2}dt-z. \end{equation*}\]

Note that by the Fundamental Theorem of Calculus, \(f'(x)=\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}.\) Newton's method will require the calculation of

(42)#\[\begin{equation} \int_{0}^{p_{k}}\frac{1}{\sqrt{2\pi}}e^{-t^{2}/2}dt \end{equation}\]

where \(p_{k}\) is a Newton iterate. This integral can be computed using numerical quadrature. Write a Python code that takes \(z\) as its input, and outputs \(x,\) such that Equation (41) holds. In your code, use the Python codes for Newton’s method and the composite Simpson’s rule you were given in class. For Newton’s method set tolerance to \(10^{-5}\) and \(p_{0}=0.5,\) and for composite Simpson’s rule take \(n=10\), when computing the integrals (42) that appear in Newton iteration. Then run your code for \(z=0.4\) and \(z=0.1\), and report your output.