
I have for some time been trawling through the Internet looking for an aesthetic proof of Taylor’s theorem.

By which I mean this: there are plenty of proofs that introduce some arbitrary construct, with no mention of where this beast came from, and then let you logically hack away line by line until the thing is proved. But this kind of proof is ugly. A beautiful proof should rise naturally from the ground.

I’ve seen one proof claiming to do it from the fundamental theorem of calculus. It looked messy.

I’ve seen several attempts to use integration by parts repeatedly. But surely it would be tidier to do this without bringing in all of that extra machinery.

The nicest two approaches seem to involve using the mean value theorem and Rolle's theorem, but I can't find a lucid presentation of either approach.

Maybe my brain is unusually stupid, and the approaches on Wikipedia etc are perfectly good enough for everyone else.

Does anyone have a crystal clear understanding of this phenomenon? Or a web-link to such an understanding?

*EDIT*: Eventually a Cambridge mathematician explained it to me in a way that I could understand, and I have written up the proof here. To my mind it is the most instructional proof I have encountered, yet putting it as an answer received mostly downvotes. It seems strange to me that no one else seems to concur. But it should be up to the keenest mathematical minds to choose which answer should be accepted. It shouldn’t be up to me. Therefore I will bow to the wisdom of the community, and accept the currently most-upvoted answer. I have learned from Machine Learning that a “Committee of Experts” outperforms any one expert, and I am certainly no expert.


Here is an approach that seems rather natural, based on applying the fundamental theorem of calculus successively to $f$, $f'$, $f''$, etc.:

Notice that
$$f(x)=f(a)+\int_a^x f'(t_1)\,dt_1$$
and in general
$$f^{(k)}(t_k)=f^{(k)}(a)+\int_a^{t_k} f^{(k+1)}(t_{k+1})\,dt_{k+1}.$$

By induction, then, one proves
$$f(x)=P_n(x)+R_n(x),$$
where $P_n$ is the Taylor polynomial
$$P_n(x)=\sum_{k=0}^{n}\frac{f^{(k)}(a)}{k!}\,(x-a)^k$$
and the remainder $R_n$ is represented by nested integrals as
$$R_n(x)=\int_a^x\int_a^{t_1}\cdots\int_a^{t_n} f^{(n+1)}(t_{n+1})\,dt_{n+1}\cdots dt_1.$$

We can establish the Lagrange form of the remainder by applying the intermediate and extreme value theorems, using simple comparisons as follows. Consider the case $x>a$ first. Let $m$ be the minimum value of $f^{(n+1)}$ on $[a,x]$, and $M$ the maximum value. Then since
$$m\le f^{(n+1)}(t_{n+1})\le M$$
for all $t_{n+1}$ in $[a,x]$, after repeated integrations one finds
$$m\,\frac{(x-a)^{n+1}}{(n+1)!}\;\le\;R_n(x)\;\le\;M\,\frac{(x-a)^{n+1}}{(n+1)!}.$$

But now, notice that the function
$$t\mapsto f^{(n+1)}(t)\,\frac{(x-a)^{n+1}}{(n+1)!}$$
attains the extreme values
$$m\,\frac{(x-a)^{n+1}}{(n+1)!}\qquad\text{and}\qquad M\,\frac{(x-a)^{n+1}}{(n+1)!}$$
at some points in $[a,x]$. By the intermediate value theorem, there must be some point $c$ between these two points (so $c\in[a,x]$) such that
$$R_n(x)=f^{(n+1)}(c)\,\frac{(x-a)^{n+1}}{(n+1)!}.$$

This is the Lagrange form of the remainder. If $x<a$ and $n$ is odd, the same proof works. If $x<a$ and $n$ is even, then $(x-a)^{n+1}<0$ and the same proof works after reversing some inequalities.

One can motivate this whole approach in a couple of different ways. E.g., one can argue that $\frac{(x-a)^n}{n!}$ becomes small for large $n$, so the remainders will become small if the derivatives of $f$ stay bounded, say.

Or, one can reason loosely as follows: $f(x)\approx f(a)$ for $x$ near $a$. Ask, what is the remainder exactly? Apply the fundamental theorem as above, then approximate the first remainder using the approximation $f'(t_1)\approx f'(a)$. Repeating, one produces the Taylor polynomials by the pattern of the argument above.
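As a quick sanity check on the formulas above, here is a small Python sketch (my addition, not part of the answer; it assumes SciPy is available) that evaluates the nested-integral remainder recursively for $f=\exp$, $a=0$, $n=2$, where every derivative of $f$ is again $\exp$:

```python
import math
from scipy.integrate import quad

a, x, n = 0.0, 1.0, 2   # expand exp about a = 0

def taylor_poly(x):
    # P_n(x) = sum_{k=0}^n f^(k)(a) (x-a)^k / k!, and f^(k) = exp for f = exp
    return sum(math.exp(a) * (x - a)**k / math.factorial(k) for k in range(n + 1))

def nested_remainder(upper, depth):
    # R_n(x) is n+1 nested integrals of f^(n+1) = exp; `depth` integrals remain
    if depth == 0:
        return math.exp(upper)
    return quad(lambda t: nested_remainder(t, depth - 1), a, upper)[0]

print(taylor_poly(x) + nested_remainder(x, n + 1))   # ≈ 2.718281828...
print(math.exp(x))                                   # agrees with f(x)
```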


The clearest proof one can find, in my opinion, is the following. Note it is just a generalized mean value theorem!

THM Let $f,g$ be functions defined on a closed interval $[a,b]$ that admit finite $n$-th derivatives on $(a,b)$ and continuous $(n-1)$-th derivatives on $[a,b]$. Suppose $c\in[a,b]$. Then for each $x\ne c$ in $[a,b]$ there exists $\xi$ in the segment joining $x$ and $c$ such that
$$\left[f(x)-\sum_{k=0}^{n-1}\frac{f^{(k)}(c)}{k!}\,(x-c)^k\right]g^{(n)}(\xi)=\left[g(x)-\sum_{k=0}^{n-1}\frac{g^{(k)}(c)}{k!}\,(x-c)^k\right]f^{(n)}(\xi).$$

PROOF For simplicity assume $c<b$ and $x>c$. Keep $x$ fixed and consider
$$F(t)=\sum_{k=0}^{n-1}\frac{f^{(k)}(t)}{k!}\,(x-t)^k,\qquad G(t)=\sum_{k=0}^{n-1}\frac{g^{(k)}(t)}{k!}\,(x-t)^k$$
for each $t\in[c,x]$. Then $F,G$ are continuous on $[c,x]$ and admit finite derivatives on $(c,x)$. By the (Cauchy) mean value theorem we may write
$$F'(\xi)\,\big[G(x)-G(c)\big]=G'(\xi)\,\big[F(x)-F(c)\big]$$
for some $\xi\in(c,x)$. This gives that
$$F'(\xi)\,\big[g(x)-G(c)\big]=G'(\xi)\,\big[f(x)-F(c)\big]$$
since $F(x)=f(x)$ and $G(x)=g(x)$. But we see, by cancelling terms with opposite signs, that
$$F'(t)=\frac{f^{(n)}(t)}{(n-1)!}\,(x-t)^{n-1},\qquad G'(t)=\frac{g^{(n)}(t)}{(n-1)!}\,(x-t)^{n-1},$$
which gives the desired formula when plugging in $t=\xi$. $\blacksquare$

COR We get Taylor's theorem with $g(x)=(x-c)^n$, namely, for some $\xi$ between $c$ and $x$ we have
$$f(x)-\sum_{k=0}^{n-1}\frac{f^{(k)}(c)}{k!}\,(x-c)^k=\frac{f^{(n)}(\xi)}{n!}\,(x-c)^n,$$
or
$$f(x)=\sum_{k=0}^{n-1}\frac{f^{(k)}(c)}{k!}\,(x-c)^k+\frac{f^{(n)}(\xi)}{n!}\,(x-c)^n.$$

Note that if $n=1$, this is precisely the mean value theorem.
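A small numerical illustration of COR (my own sketch, not from the answer; it assumes SciPy): for $f=\sin$, $c=0$, $n=4$, locate a $\xi$ between $c$ and $x$ realizing the remainder, using $f^{(4)}=\sin$:

```python
import math
from scipy.optimize import brentq

c, x, n = 0.0, 1.2, 4
taylor = x - x**3 / 6                     # degree-3 Taylor polynomial of sin at c = 0
remainder = math.sin(x) - taylor

# Solve f^(4)(xi) (x-c)^4 / 4! = remainder for xi; a root exists by the corollary.
g = lambda t: math.sin(t) * (x - c)**n / math.factorial(n) - remainder
xi = brentq(g, c, x)
print(xi, g(xi))                          # xi lies in (0, 1.2), and g(xi) ≈ 0
```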


The following proof is in Bartle's Elements of Real Analysis. Its goal is to exploit Rolle's Theorem, as the more elementary version of the Mean Value Theorem does. To this end, it incorporates a clever use of the product rule.

So, suppose that $f$ denotes a function on $[a,b]$ such that $f$ is $n$-times continuously differentiable on $[a,b]$ and such that $f^{(n)}$ is differentiable on $(a,b)$, i.e. $f^{(n+1)}$ exists there. For every $\alpha$ and $\beta$ in $[a,b]$, distinct from each other, we show there is a point $c$ strictly between $\alpha$ and $\beta$ such that
$$f(\beta)=\sum_{k=0}^{n}\frac{f^{(k)}(\alpha)}{k!}\,(\beta-\alpha)^k+\frac{f^{(n+1)}(c)}{(n+1)!}\,(\beta-\alpha)^{n+1}.$$

To prove this, let $M$ denote the real number which satisfies
$$f(\beta)=\sum_{k=0}^{n}\frac{f^{(k)}(\alpha)}{k!}\,(\beta-\alpha)^k+\frac{M}{(n+1)!}\,(\beta-\alpha)^{n+1}.$$

And now define the function $\varphi$ on $[a,b]$ by
$$\varphi(x)=f(\beta)-\sum_{k=0}^{n}\frac{f^{(k)}(x)}{k!}\,(\beta-x)^k-\frac{M}{(n+1)!}\,(\beta-x)^{n+1}.$$

We clearly have that $\varphi(\beta)=0$ and, by the definition of $M$, we have $\varphi(\alpha)=0$. Thus, Rolle's Theorem implies there is a $c$ strictly between $\alpha$ and $\beta$ such that
$$\varphi'(c)=0.$$

This is where the clever use of the product rule comes in. For when we use the definition of $\varphi$ and differentiate at $c$, we obtain a telescoping series which, upon simplification, leaves us with
$$\varphi'(c)=-\frac{f^{(n+1)}(c)}{n!}\,(\beta-c)^n+\frac{M}{n!}\,(\beta-c)^n=0.$$

This shows that $M=f^{(n+1)}(c)$, as desired.
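Here is a small numerical illustration of the proof's moving parts (my sketch, assuming SciPy), with $f=\exp$, $\alpha=0$, $\beta=1$, $n=2$: the auxiliary $\varphi$ vanishes at both endpoints, and at the Rolle point $c$ we recover $M=f'''(c)$:

```python
import math
from scipy.optimize import brentq

alpha, beta, n = 0.0, 1.0, 2              # f = exp, so f^(k) = exp for every k

# M is pinned down by matching f(beta) in the defining equation above:
taylor = sum(math.exp(alpha) * (beta - alpha)**k / math.factorial(k) for k in range(n + 1))
M = (math.exp(beta) - taylor) * math.factorial(n + 1) / (beta - alpha)**(n + 1)

def phi(x):
    s = sum(math.exp(x) * (beta - x)**k / math.factorial(k) for k in range(n + 1))
    return math.exp(beta) - s - M * (beta - x)**(n + 1) / math.factorial(n + 1)

dphi = lambda x, h=1e-6: (phi(x + h) - phi(x - h)) / (2 * h)   # numerical phi'

print(phi(alpha), phi(beta))     # both vanish (up to rounding)
c = brentq(dphi, 0.01, 0.99)     # Rolle point, bracketed safely inside (alpha, beta)
print(M, math.exp(c))            # M = f'''(c) = exp(c), as the proof shows
```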

Let us try to approximate a function $f$ by a polynomial in such a way that they coincide closely at the origin. To achieve this, we will require the same value, the same slope, the same curvature and the same higher order derivatives at $x=0$.

WLOG we use a cubic polynomial, and we start from
$$f(x)=a+bx+cx^2+dx^3+E(x),$$
where $E(x)$ is an error term.

Imposing our conditions, we need as many equations as there are unknown coefficients:
$$f(0)=a+E(0),\quad f'(0)=b+E'(0),\quad f''(0)=2c+E''(0),\quad f'''(0)=6d+E'''(0).$$

Lastly,
$$f''''(x)=E''''(x).$$

To achieve a small error, we ensure $E(0)=E'(0)=E''(0)=E'''(0)=0$, and set $a=f(0)$, $b=f'(0)$, $c=\dfrac{f''(0)}{2}$, $d=\dfrac{f'''(0)}{6}$. This gives us the Taylor coefficients. We now have to bound the error term.

Assuming that $|f''''(t)|\le M$ in the range $[0,x]$, by integration
$$|E'''(t)|\le Mt,\qquad|E''(t)|\le\frac{Mt^2}{2},\qquad|E'(t)|\le\frac{Mt^3}{6},\qquad|E(t)|\le\frac{Mt^4}{24}.$$

To summarize, for $t\in[0,x]$,
$$f(t)=f(0)+f'(0)\,t+\frac{f''(0)}{2}\,t^2+\frac{f'''(0)}{6}\,t^3+E(t),$$
where $|E(t)|\le\dfrac{Mt^4}{24}$.
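A quick Python check (my addition) of the cubic fit and the error bound for $f=\sin$ on $[0,1]$, where $M=1$ bounds $|f''''|=|\sin|$:

```python
import math

a, b, c, d = 0.0, 1.0, 0.0, -1.0 / 6.0   # f(0), f'(0), f''(0)/2, f'''(0)/6 for sin
M = 1.0                                   # |f''''| = |sin| <= 1

for x in (0.25, 0.5, 1.0):
    cubic = a + b * x + c * x**2 + d * x**3
    err, bound = abs(math.sin(x) - cubic), M * x**4 / 24
    print(f"x={x}: |E(x)|={err:.2e} <= {bound:.2e}: {err <= bound}")
```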

My personal favorite is the proof which uses L'Hôpital's rule. It is without a doubt one of the lightest proofs for it, and in my own view one of the more elegant. The proof below is quoted straight out of the related Wikipedia page:

Let:
$$h_k(x)=\begin{cases}\dfrac{f(x)-P(x)}{(x-a)^k}&x\ne a\\ 0&x=a\end{cases}$$

where, as in the statement of Taylor's theorem,
$$P(x)=f(a)+f'(a)\,(x-a)+\frac{f''(a)}{2!}\,(x-a)^2+\cdots+\frac{f^{(k)}(a)}{k!}\,(x-a)^k.$$

It is sufficient to show that $\lim_{x\to a}h_k(x)=0$. The proof here is based on repeated application of L'Hôpital's rule.

Note that, for each $j=0,1,\dots,k-1$, $f^{(j)}(a)=P^{(j)}(a)$.

Hence each of the first $k-1$ derivatives of the numerator in $h_k$ vanishes at $x=a$, and the same is true of the denominator. Also, since the condition that the function $f$ be $k$ times differentiable at a point requires differentiability up to order $k-1$ in a neighborhood of said point (this is true, because differentiability requires a function to be defined in a whole neighborhood of a point), the numerator and its $k-2$ derivatives are differentiable in a neighborhood of $a$. Clearly, the denominator also satisfies said condition, and additionally, doesn't vanish unless $x=a$, therefore all conditions necessary for L'Hôpital's rule are fulfilled, and its use is justified. So

\begin{align} \lim_{x\to a} \frac{f(x) - P(x)}{(x-a)^k} &= \lim_{x\to a} \frac{\frac{d}{dx}(f(x) - P(x))}{\frac{d}{dx}(x-a)^k} = \cdots = \lim_{x\to a} \frac{\frac{d^{k-1}}{dx^{k-1}}(f(x) - P(x))}{\frac{d^{k-1}}{dx^{k-1}}(x-a)^k}\\ &=\frac{1}{k!}\lim_{x\to a} \frac{f^{(k-1)}(x) - P^{(k-1)}(x)}{x-a}\\ &=\frac{1}{k!}(f^{(k)}(a) - f^{(k)}(a)) = 0 \end{align}

where the second-to-last equality follows by the definition of the derivative at $x=a$.
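As a numerical companion (mine, not part of the Wikipedia proof), one can watch $h_k(x)$ vanish; with $f=\exp$, $a=0$, $k=3$ the ratio behaves like $x/(k+1)!$:

```python
import math

a, k = 0.0, 3
P = lambda x: sum(x**j / math.factorial(j) for j in range(k + 1))  # degree-k Taylor poly of exp

for x in (0.1, 0.03, 0.01, 0.003):
    print(x, (math.exp(x) - P(x)) / (x - a)**k)   # tends to 0, roughly like x/24
```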


This is the best proof I’ve seen:

https://arxiv.org/abs/0801.1271v2


It’s all about smoothness of the functions.

A continuous function is such that it can be accurately approximated by a constant in the neighborhood of a point:
$$f(x)=f(a)+r_0(x),$$
where $r_0$ is a "remainder" function, which tends to zero at $a$.

A smooth function is such that it is differentiable, and its derivatives are continuous. (The more derivatives, the smoother.) For the sake of the example, consider the third order:
$$f'''(x)=f'''(a)+r_3(x).$$

Then integrating from $a$ to $x$ three times,
$$f(x)=f(a)+f'(a)\,(x-a)+\frac{f''(a)}{2}\,(x-a)^2+\frac{f'''(a)}{6}\,(x-a)^3+R_3(x).$$

In the above, the successive remainders are antiderivatives of each other, and one can show that the final one, $R_3$, belongs to $o\big((x-a)^3\big)$.
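To see the $o\big((x-a)^3\big)$ claim concretely, here is a tiny Python check (my own), with $f=\cos$ and $a=0$: the ratio of the remainder to $(x-a)^3$ shrinks as $x\to a$:

```python
import math

a = 0.0
T3 = lambda x: 1 - x**2 / 2   # expansion of cos about 0 up to third order (odd terms vanish)

for x in (0.5, 0.1, 0.02):
    print(x, (math.cos(x) - T3(x)) / (x - a)**3)   # ratio ~ x/24, tending to 0
```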

No integration-by-parts, no l’Hôpital, some telescoping

By the product rule and the chain rule, if the nominated derivatives exist,
$$\frac{d}{dt}\left[-\sum_{k=1}^{n}\frac{f^{(k)}(t)}{k!}\,(x-t)^k\right]=\sum_{k=1}^{n}\left[\frac{f^{(k)}(t)}{(k-1)!}\,(x-t)^{k-1}-\frac{f^{(k+1)}(t)}{k!}\,(x-t)^k\right].\tag{$*$}$$

In the sum on the right, the left-hand term for $k=2,3,\dots,n$ cancels with the right-hand term for $k=1,2,\dots,n-1$ respectively, unless $n=1$, in which case there is only one pair of terms and no cancellation. In either case, the only remaining terms are the left-hand term for $k=1$ and the right-hand term for $k=n$. Solving for the latter gives
$$\frac{f^{(n+1)}(t)}{n!}\,(x-t)^n=f'(t)+\frac{d}{dt}\sum_{k=1}^{n}\frac{f^{(k)}(t)}{k!}\,(x-t)^k.$$

Then, integrating w.r.t. $t$ from $a$ to $x$ (the summation vanishes at $t=x$), adding $f(a)$ to both sides, and rearranging, we have QED:
$$f(x)=f(a)+\sum_{k=1}^{n}\frac{f^{(k)}(a)}{k!}\,(x-a)^k+\int_a^x\frac{f^{(n+1)}(t)}{n!}\,(x-t)^n\,dt.\tag{1}$$

This is Taylor's theorem with the remainder term, the last term in (1), in integral form. (The indexed summation on the right of (1) is the inspiration for the summation on the left of ($*$), which yields a telescoping series when differentiated w.r.t. the variable with the minus sign in front; compare the answer by user123641.)

In (1), the term $f(a)$ may be taken under the $\sum$ sign simply by extending the range of $k$ down to $0$:
$$f(x)=\sum_{k=0}^{n}\frac{f^{(k)}(a)}{k!}\,(x-a)^k+\int_a^x\frac{f^{(n+1)}(t)}{n!}\,(x-t)^n\,dt.\tag{2}$$

This result has been obtained for $n\ge 1$, and if $n=0$ it reduces to the Fundamental Theorem of Calculus; thus it is established for $n\ge 0$.

If $f^{(n+1)}$ is continuous on the interval of integration, the remainder term in (1) or (2) may be converted from integral form to Lagrange form as follows (cf. Venkata Karthik Bandaru's answer). Because the factor $(x-t)^n$ does not change sign on the interval, the integral is between the values that it takes if we replace the factor $f^{(n+1)}(t)$ by its minimum and its maximum on the interval (where "between" is interpreted inclusively). The minimum or maximum may then be taken outside the integration, and the remaining integral evaluated as
$$\int_a^x\frac{(x-t)^n}{n!}\,dt=\frac{(x-a)^{n+1}}{(n+1)!}.$$

Thus the remainder is between
$$\Big(\min f^{(n+1)}\Big)\,\frac{(x-a)^{n+1}}{(n+1)!}$$

and
$$\Big(\max f^{(n+1)}\Big)\,\frac{(x-a)^{n+1}}{(n+1)!}$$

(inclusive), where the minimum or maximum is taken over the interval of integration, and the other factor is independent of $t$. Hence, by the continuity of $f^{(n+1)}$, there exists a real $c$ on that interval such that the remainder is exactly
$$\frac{f^{(n+1)}(c)}{(n+1)!}\,(x-a)^{n+1}.$$
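A short Python check (my sketch, assuming SciPy) of the integral form (2) for $f=\exp$, where every derivative is again $\exp$:

```python
import math
from scipy.integrate import quad

a, x, n = 0.0, 1.0, 4
poly = sum(math.exp(a) * (x - a)**k / math.factorial(k) for k in range(n + 1))
rem, _ = quad(lambda t: math.exp(t) * (x - t)**n / math.factorial(n), a, x)
print(poly + rem, math.exp(x))   # both ≈ 2.718281828...
```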

The following isn’t a rigorous proof, but I think it’s “aesthetic”, and “rise[s] naturally from the ground”, as the original question asked for.

In searching for intuition for Taylor Series, I’ve developed a perspective involving Pascal’s Triangle, which arises from recursively applied Riemann Sum approximations to the function.

I found @Bob Pego’s answer really helpful and it’s how I started developing this.

The end result involves coefficients based on rows of Pascal’s Triangle, and the sequence of approximations (sequence of rows) looks like this

[Figure: "Pascal" approximations for sin(x)]

And they're much less efficient approximations than the plain finite Taylor polynomials:

[Figure: Taylor approximations for sin(x)]

I'll explain the derivation, but the essence of it is that the recursive Riemann Sum procedure produces binomial coefficients — rows of Pascal's Triangle — which are also simplex numbers. Simplex numbers converge to factorial fractions of hypercubes: the $n$th triangle number approaches $\frac{n^2}{2!}$, the $n$th tetrahedral number approaches $\frac{n^3}{3!}$, and so on.

A regular Riemann Sum approximation of $f(x)$ of "resolution" 4 would be
$$f(x)\approx f(0)+\tfrac{x}{4}\,f'(0)+\tfrac{x}{4}\,f'\!\left(\tfrac{x}{4}\right)+\tfrac{x}{4}\,f'\!\left(\tfrac{2x}{4}\right)+\tfrac{x}{4}\,f'\!\left(\tfrac{3x}{4}\right).$$

After each discrete step, we update the slope by setting it to the true slope of the function — what the 1st derivative is at that point we’ve stepped to along x. This is the idea of a Riemann Sum.

But since we're interested in Taylor Series (about 0) here, let's pretend that we can't update to $f'\!\left(\tfrac{x}{4}\right)$ directly, and can only use the values of all derivatives evaluated at 0, not at $\tfrac{x}{4}$ or anywhere else.

So instead of updating to the actual slope, we'll use a recursive approximation to get an approximate slope update. We can now recurse and approximate each of the terms that have a non-0 $x$ value. For example,
$$f'\!\left(\tfrac{2x}{4}\right)\approx f'\!\left(\tfrac{x}{4}\right)+\tfrac{x}{4}\,f''\!\left(\tfrac{x}{4}\right).$$

There are still some terms with derivatives of $f$ evaluated elsewhere than 0, so we recursively approximate those terms too, until all terms are derivatives of $f$ evaluated at 0.

For resolution 4, you'll end up with
$$f(x)\approx f(0)+4\,\frac{x}{4}\,f'(0)+6\,\frac{x^2}{4^2}\,f''(0)+4\,\frac{x^3}{4^3}\,f'''(0)+\frac{x^4}{4^4}\,f''''(0).$$

Note the appearance of the Pascal row $1,4,6,4,1$.

In general, for resolution $n$, that will be
$$f(x)\approx\sum_{k=0}^{n}\binom{n}{k}\frac{x^k}{n^k}\,f^{(k)}(0).$$

But I prefer to focus on the simplex perspective. Equivalently, that's
$$f(x)\approx\sum_{k=0}^{n}\mathrm{simplex}^{(k)}_{\,n-k+1}\,\frac{x^k}{n^k}\,f^{(k)}(0),$$

where $\mathrm{simplex}^{(k)}_{\,i}=\binom{(k-1)+i}{k}$ is the $i$-th $k$-simplex number, and e.g. $\mathrm{simplex}^{(4)}_{\,2}=\mathrm{penta}_2$ is the 2nd pentatope number, like if we index the simplex numbers from 1 to infinity. A few examples:

$$\color{blue}{\mathrm{tetra}}_{\color{red}{2}}=\binom{(\color{blue}{3}-1)+\color{red}{2}}{\color{blue}{3}}=4,\qquad\color{blue}{\mathrm{tetra}}_{\color{red}{5}}=\binom{(\color{blue}{3}-1)+\color{red}{5}}{\color{blue}{3}}=35$$

etc.

Check the Pascal’s Triangle wikipedia page if you’re not following that.

Simplex numbers approaching factorial fractions of hypercubes:
$$\lim_{i\to\infty}\frac{\mathrm{tri}_i}{i^2}=\frac{1}{2!}$$

and
$$\lim_{i\to\infty}\frac{\mathrm{tetra}_i}{i^3}=\frac{1}{3!}.$$

Taking $n$ to $\infty$ corresponds to increasing the "resolution" of your Riemann Sum, and approaching continuous integration, thus approaching the Taylor Series.

Just like this “triangle”

0
00
000
0000

is a low-resolution version of an actual right isosceles triangle.

This may seem really roundabout given the concise alternative of the binomial coefficient notation, but I think simplexes are a nice way to visualize the "lagged" effect of higher order derivatives. If you begin traveling with a constant acceleration of 1, then after 1 unit of time, your displacement will be the area of the right triangle in a unit square, $\frac{1}{2!}=\frac{1}{2}$. If you begin traveling with a constant jerk of 1, then after 1 unit of time, your displacement will be the volume of a tetrahedron in the corner of a unit cube, $\frac{1}{3!}=\frac{1}{6}$.
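If it helps, here is a short Python sketch (mine, not the answerer's) of the resolution-$n$ "Pascal" approximation $\sum_k\binom{n}{k}(x/n)^k f^{(k)}(0)$ described above, compared with the plain Taylor polynomial for $f=\sin$:

```python
import math

d_sin0 = lambda k: (0, 1, 0, -1)[k % 4]   # k-th derivative of sin evaluated at 0

def pascal_approx(x, n):
    # resolution-n recursive Riemann sum: sum_k C(n,k) (x/n)^k f^(k)(0)
    return sum(math.comb(n, k) * (x / n)**k * d_sin0(k) for k in range(n + 1))

def taylor_poly(x, n):
    return sum(x**k / math.factorial(k) * d_sin0(k) for k in range(n + 1))

x = 2.0
print("sin(x)       =", math.sin(x))
print("Taylor(n=9)  =", taylor_poly(x, 9))
for n in (4, 16, 64, 256):
    print(f"Pascal(n={n}) =", pascal_approx(x, n))
# The Pascal sums converge to sin(x), but far more slowly than the Taylor
# polynomial does, since C(n,k)/n^k -> 1/k! only as n -> infinity.
```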

There is also a natural and well-known proof using integration by parts.

Let $f$ be a function on an open interval $I$, and $a,b\in I$. The goal is to relate $f(b)$ to $f(a)$ and the derivatives $f^{(k)}(a)$'s.

Using integration by parts on $f(b)=f(a)+\int_a^b f'(t)\,dt$ will make higher derivative terms appear.
One thought is to write $\int_a^b f'(t)\,dt=\big[f'(t)\,t\big]_a^b-\int_a^b f^{(2)}(t)\,t\,dt$, but $f'(b)\,b$ appears here.

To avoid this, we can instead do $\int_a^b f'(t)\,dt=\big[f'(t)\,(t-b)\big]_a^b-\int_a^b f^{(2)}(t)\,(t-b)\,dt$.
So continuing this way,
\scriptstyle{\begin{align} f(b) &= f(a) + \int_{a}^{b} f'(t) dt \\ &= f(a) + f'(t) (t-b) \bigr|_{a}^{b} - \int_{a}^{b} f^{(2)}(t) (t-b) dt \\ &= f(a) + f'(a) (b-a) - \left(f^{(2)}(t) \frac{(t-b)^2}{2} \Bigr|_{a}^{b} - \int_{a}^{b} f^{(3)}(t) \frac{(t-b)^2}{2} dt \right) \\ &= f(a) + f'(a) (b-a) + \frac{f^{(2)}(a)}{2} (b-a)^2 + \int_{a}^{b} f^{(3)}(t) \frac{(t-b)^2}{2} dt \\ &\vdots \\ &= f(a) + f'(a) (b-a) + \frac{f^{(2)}(a)}{2!} (b-a)^2 + \ldots + \frac{f^{(n-1)}(a)}{(n-1)!} (b-a)^{n-1} + (-1)^{n-1} \int_{a}^{b} f^{(n)}(t) \frac{(t-b)^{n-1}}{(n-1)!}dt, \end{align}}%

the remainder term being
$$R_n=(-1)^{n-1}\int_a^b f^{(n)}(t)\,\frac{(t-b)^{n-1}}{(n-1)!}\,dt=\int_a^b f^{(n)}(t)\,\frac{(b-t)^{n-1}}{(n-1)!}\,dt.$$

Like in Bob Pego's answer, this can be expressed as $\frac{f^{(n)}(c)}{n!}\,(b-a)^n$ where $c\in[a,b]$:
For convenience say $m=\min_{[a,b]}f^{(n)}$ and $M=\max_{[a,b]}f^{(n)}$. Now the remainder is between $m\int_a^b\frac{(b-t)^{n-1}}{(n-1)!}\,dt$ and $M\int_a^b\frac{(b-t)^{n-1}}{(n-1)!}\,dt$. That is, $R_n$ is between $m\,\frac{(b-a)^n}{n!}$ and $M\,\frac{(b-a)^n}{n!}$. Hence, by the intermediate value theorem, $R_n$ is $\frac{f^{(n)}(c)}{n!}\,(b-a)^n$ for some $c\in[a,b]$, as needed.
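A numerical companion (my sketch, assuming SciPy): compute $R_n$ by quadrature and then locate a $c\in[a,b]$ with $R_n=\frac{f^{(n)}(c)}{n!}(b-a)^n$, for $f=\exp$, $a=0$, $b=1$, $n=3$:

```python
import math
from scipy.integrate import quad
from scipy.optimize import brentq

a, b, n = 0.0, 1.0, 3                      # f = exp, so f^(n) = exp
R, _ = quad(lambda t: math.exp(t) * (-1)**(n - 1) * (t - b)**(n - 1)
            / math.factorial(n - 1), a, b)
c = brentq(lambda t: math.exp(t) * (b - a)**n / math.factorial(n) - R, a, b)
print(R, c, math.exp(c) * (b - a)**n / math.factorial(n))   # R matches the Lagrange form
```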

Here is a nice summary and proof from Stewart's Calculus:

http://www.stewartcalculus.com/data/CALCULUS%20Early%20Transcendentals/upfiles/Formulas4RemainderTaylorSeries5ET.pdf

Regarding the initial answer to the posted question (which is as straightforward an approach to a proof of Taylor's Theorem as possible), I find the following the easiest way to explain how the last term on the RHS of the equation (the nested integrals) approaches 0 as the number of iterations $n$ becomes arbitrarily large:

There are two cases: (1) $f(x)$ is finitely differentiable, or (2) $f(x)$ is infinitely differentiable.

(1) If $f(x)$ is finitely differentiable, then there exists a value of $n$ such that all derivatives of order $n+1$ or greater are $0$. The innermost integral of the nested integral is then $0$, which renders the entire nested integral equal to $0$, and thus gives us the aforementioned Taylor polynomial of finite order $n$ with no remainder.

(2) If $f(x)$ is infinitely differentiable, then, as the number of iterations $n$ approaches infinity, because we require by the definition of the nested integrals that $a<t_n<t_{n-1}<t_{n-2}<\dots<t_2<t_1<x$, we see that $t_n\to a$ as $n\to\infty$. As a result, we have (as is true in case (1)) that the innermost integral of the nested integral approaches $0$, thus giving us a remainder term of $0$ in the limit, and hence resulting in the infinite series expression for the Taylor series of the function $f(x)$.

Authors of most books will not be so kind to illustrate a proof in this manner, though. It’s upsetting, I know.

First, we have:
$$f(x)=f(a)+\int_a^x f'(t)\,dt.$$

Second, from this follows:
$$f(x)=f(a)+f'(a)\,(x-a)+\int_a^x f''(t)\,(x-t)\,dt.$$

In general, arguing by induction, it also follows that, for $n\ge 0$:
$$f(x)=\sum_{k=0}^{n}\frac{f^{(k)}(a)}{k!}\,(x-a)^k+\int_a^x f^{(n+1)}(t)\,\frac{(x-t)^n}{n!}\,dt.$$

Thus,
$$f(x)-\sum_{k=0}^{n}\frac{f^{(k)}(a)}{k!}\,(x-a)^k=\int_a^x f^{(n+1)}(t)\,\frac{(x-t)^n}{n!}\,dt.$$

Third, defining the weighting function
$$w_n(t)=\frac{(x-t)^n}{n!},\qquad\text{with}\qquad\int_a^x w_n(t)\,dt=\frac{(x-a)^{n+1}}{(n+1)!},$$

it follows that, since $w_n$ does not change sign between $a$ and $x$,
$$\int_a^x f^{(n+1)}(t)\,w_n(t)\,dt=f^{(n+1)}(\xi)\,\frac{(x-a)^{n+1}}{(n+1)!}\qquad\text{for some }\xi\text{ between }a\text{ and }x.$$

Thus, upon substitution:
$$f(x)=\sum_{k=0}^{n}\frac{f^{(k)}(a)}{k!}\,(x-a)^k+\frac{f^{(n+1)}(\xi)}{(n+1)!}\,(x-a)^{n+1}.$$

This requires a suitable assumption to be made on $f^{(n+1)}$ over the range of integration, and thus on $f^{(n+1)}(t)$ as $t$ ranges between $a$ and $x$ - not the least of which being that $f^{(n+1)}$ exists and be continuous over that range.

This states that if $f'$ has the required property for $t$ between $a$ and $x$, then $f(x)-f(a)$ has $x-a$ as a factor; if $f''$ also has that property, then $f(x)-f(a)-f'(a)(x-a)$ has $(x-a)^2$ as a factor; and so on; with the multiples of the factors being continuous functions. Then the polynomial case generalizes the theorem that $p(x)-p(a)$ has $x-a$ as a factor, for polynomials $p$ in $x$.

This also applies to multiple points. You can try to write out the corresponding expressions for two points, for instance $a$ and $b$, and try to figure out what the weighting function should be. The relevant extension of Taylor's Theorem to multiple points has no name that I am aware of; but it reflects the correct use of Taylor's Theorem - which is curve-sculpting, a.k.a. smooth-interpolation.

Example:
For $f(x)=\sin x$, since $\big|f^{(n+1)}\big|\le 1$ everywhere and $\int_a^x w_n(t)\,dt=\frac{(x-a)^{n+1}}{(n+1)!}$, then
$$\left|\sin x-\sum_{k=0}^{n}\frac{\sin^{(k)}(a)}{k!}\,(x-a)^k\right|\le\frac{|x-a|^{n+1}}{(n+1)!}.$$

The remainder term is actually quite small over $|x-a|\le 1$ - and even outside that interval; but also:
$$\frac{|x-a|^{n+1}}{(n+1)!}\longrightarrow 0$$

over all of $\mathbb{R}$, for $n\to\infty$.
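A quick Python check (my addition) of the displayed bound, with $a=0$:

```python
import math

def T(x, n):   # Taylor polynomial of sin about 0, degree n
    return sum((0, 1, 0, -1)[k % 4] * x**k / math.factorial(k) for k in range(n + 1))

x = 2.0
for n in (3, 5, 9):
    print(n, abs(math.sin(x) - T(x, n)), x**(n + 1) / math.factorial(n + 1))
# the actual remainder always sits below |x-a|^(n+1)/(n+1)!
```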

Based on Edwards’ approach in “Advanced Calculus of Several Variables”, given here because it’s a different flavor from what has been shown.

Given $a\in I$ and a differentiable (enough times) $f$, we want to express $f$ at, say, $x=b$, by a Taylor polynomial of order $n-1$ in $(b-a)$ plus another monomial of order $n$ in $(b-a)$:
$$f(b)=\underbrace{\sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}\,(b-a)^k}_{P(b)}+C\,(b-a)^n.\tag{1}$$

Our Taylor polynomial $P$ does not depend on the nearby point $b$, but the power's coefficient $C$ does, being implicitly determined by (1) and by $a$ and $b$, and since we'll need to consider functions on the whole interval $[a,b]$, we shall use $b$ instead of $x$, to indicate that the end-point is arbitrary but fixed, and let $x$ denote the free variable.

So, once $a$ and $b$ are given, we start by setting $C$ so that
$$f(b)=P(b)+C\,(b-a)^n\tag{2}$$
i.e. we let $C$ be a constant determined by both $a$ and $b$, and stress it here that this is the final result, aside from the important fact that $C$ is not set in its proper, final form yet.

[ we could indicate that $C=\dfrac{f(b)-P(b)}{(b-a)^n}$, but this involves $f(b)$, so it's not what we want and hence we don't mention it ]

We notice that $f$ and $P$ both have the same value and first $n-1$ derivatives at $a$, and so their difference $f-P$ and the first $n-1$ derivatives of it vanish at $x=a$. Let's call this the central property of $f-P$, which we'll make use of repeatedly below.

Then, we see that having (2) satisfied can also be written as
$$\phi(x)=f(x)-P(x)-C\,(x-a)^n$$
being zero at $x=b$, i.e. as $\phi(b)=0$. As observed in brackets above, the equation $\phi(b)=0$ does not provide a convenient expression for $C$, but it allows us to use Rolle to get a point $x_1\in(a,b)$ where $\phi'(x_1)=0$, since $\phi(a)=0$ is obvious (central property plus clearly zero term).

This may not feel very helpful, until we write
$$\phi'(x)=f'(x)-P'(x)-n\,C\,(x-a)^{n-1}$$
and notice that the undesired term $n\,C\,(x-a)^{n-1}$, when the above is evaluated at $x_1$, while it still does not vanish, it no longer has degree $n$ in $(x-a)$ but $n-1$. So this equation is still unusable to determine $C$, but the unwanted term left from $C\,(x-a)^n$ looks one degree better!

Now add the fact that $f'-P'$ vanishes at $x=a$ (the central property) and that $n\,C\,(x-a)^{n-1}$ also vanishes there because it still has a positive power of $(x-a)$ present. So, the derivative $\phi'$ vanishes at both $a$ and $x_1$ and we can apply Rolle again on $[a,x_1]$, to get a point $x_2$ with $\phi''(x_2)=0$, and reduce the polynomial order of what's left from the undesired term one more time… and again, hopefully repeatedly, until this undesired term is gone and only $C$ and a derivative of $f$ remain.

We perform a few steps explicitly, to illustrate how the above insinuation works, and close with the final expression.

The second Rolle step, as laid out above, gives
$$\phi''(x)=f''(x)-P''(x)-n(n-1)\,C\,(x-a)^{n-2}$$
vanishing at some $x_2\in(a,x_1)$, aside from $x=a$, and the third Rolle step gives
$$\phi'''(x)=f'''(x)-P'''(x)-n(n-1)(n-2)\,C\,(x-a)^{n-3}$$
vanishing at some $x_3\in(a,x_2)$, aside from $x=a$.

When the $n$-th derivative is taken, nothing is left from $P$ (it has degree $n-1$) and only a constant from the 3rd term, allowing $C$ to be determined from the last Rolle application, i.e.
$$\phi^{(n)}(x)=f^{(n)}(x)-n!\,C$$
will vanish at some $x_n=\xi$, where $\xi\in(a,x_{n-1})\subset(a,b)$,
which gives the desired expression for $C$, which no longer contains $f(b)$:
$$C=\frac{f^{(n)}(\xi)}{n!},$$
and putting this back in (2) we get the theorem in std form (1):
$$f(b)=\sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}\,(b-a)^k+\frac{f^{(n)}(\xi)}{n!}\,(b-a)^n.$$
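To make the conclusion concrete, here is a tiny Python sketch (my own, assuming SciPy) checking that $C$ defined by (2) equals $f^{(n)}(\xi)/n!$ for some $\xi\in(a,b)$, with $f=\exp$, $n=4$:

```python
import math
from scipy.optimize import brentq

a, b, n = 0.0, 2.0, 4
f = math.exp                                  # every derivative of exp is exp
P = lambda x: sum(f(a) * (x - a)**k / math.factorial(k) for k in range(n))  # order n-1
C = (f(b) - P(b)) / (b - a)**n                # C as pinned down by (2)
xi = brentq(lambda t: f(t) - math.factorial(n) * C, a, b)   # solve f^(n)(xi) = n! C
print(C, f(xi) / math.factorial(n))           # the two values agree
```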

[This is from Rosenlicht’s Analysis book]

Let $I$ be an open interval and $f:I\to\mathbb{R}$ be an $(n+1)$-times differentiable function. Let the variables $x,t$ range over $I$.

Temporarily fixing $t$, the unique polynomial in $x$ of degree $\le n$ whose derivatives at $t$ (from $0$th derivative to $n$th derivative) agree with the derivatives of $f$ at $t$ is
$$P_t(x)=\sum_{k=0}^{n}\frac{f^{(k)}(t)}{k!}\,(x-t)^k.$$

So we can consider the bivariate remainder defined by
$$R(x,t)=f(x)-P_t(x)=f(x)-\sum_{k=0}^{n}\frac{f^{(k)}(t)}{k!}\,(x-t)^k.$$

Now we can fix $x$ and compare the average rates of change of $t\mapsto R(x,t)$ and $t\mapsto(x-t)^{n+1}$ as $t$ varies from some point $a$ to $x$.

By the Generalised MVT, we have
$$\frac{R(x,a)-R(x,x)}{(x-a)^{n+1}-(x-x)^{n+1}}=\frac{\partial_t R(x,t)\big|_{t=c}}{-(n+1)\,(x-c)^n}$$

for some $c$ strictly between $a$ and $x$.

Note the derivative telescopes:
$$\partial_t R(x,t)=-\frac{f^{(n+1)}(t)}{n!}\,(x-t)^n.$$

So, on simplifying (using $R(x,x)=0$), we see
$$\frac{R(x,a)}{(x-a)^{n+1}}=\frac{f^{(n+1)}(c)}{(n+1)!},$$

that is,
$$f(x)=\sum_{k=0}^{n}\frac{f^{(k)}(a)}{k!}\,(x-a)^k+\frac{f^{(n+1)}(c)}{(n+1)!}\,(x-a)^{n+1}$$
for some $c$ strictly between $a$ and $x$, as needed.
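The telescoping derivative is easy to confirm symbolically; here is a SymPy sketch (mine, not Rosenlicht's) for a generic $f$ and $n=3$:

```python
import sympy as sp

x, t = sp.symbols('x t')
f = sp.Function('f')
n = 3
R = f(x) - sum(f(t).diff(t, k) * (x - t)**k / sp.factorial(k) for k in range(n + 1))
lhs = R.diff(t)                                          # d/dt of the remainder
rhs = -f(t).diff(t, n + 1) * (x - t)**n / sp.factorial(n)
print(sp.simplify(lhs - rhs))                            # prints 0
```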

Let $f$ be infinitely differentiable (we'll weaken this hypothesis later) on an open interval $I$ containing $a$ (so $f$ is $C^\infty(I)$ for now, for simplicity).

Let's try to approximate $f$, over $I$, with a polynomial of degree $n-1$:
$$f(x)=\underbrace{a_0+a_1(x-a)+\dots+a_{n-1}(x-a)^{n-1}}_{p(x)}+E(x).$$

We didn't yet fix our approximating polynomial $p$. We'll first fix it by picking some intuitively plausible coefficients $a_0,\dots,a_{n-1}$, and then study the resulting error function $E=f-p$.



Fixing an approximation: Intuitively, we want our approximation to be such that $E$ is "as flat and close to the $0$-function on $I$ as possible". So we can try to make
$$E(a)=E'(a)=\dots=E^{(n-1)}(a)=0.$$
These are $n$ constraints, to fix $n$ coefficients.
Since $E^{(k)}(a)=f^{(k)}(a)-k!\,a_k$, setting $a_0=f(a)$, $a_1=\dfrac{f^{(1)}(a)}{1!}$, $\dots$, $a_{n-1}=\dfrac{f^{(n-1)}(a)}{(n-1)!}$ would do the job.

On the resulting error function: Fix $x\ne a$ in $I$ and consider
$$g(t)=E(t)-\left(\frac{t-a}{x-a}\right)^{\!n}E(x).$$
Then $g(a)=g(x)=0$ gives (by Rolle's theorem) $g'(c_1)=0$ for some $c_1$ strictly between $a$ and $x$. Now $g'(a)=g'(c_1)=0$ gives $g''(c_2)=0$ for some $c_2$ strictly between $a$ and $c_1$. Now $g''(a)=g''(c_2)=0$ gives $g'''(c_3)=0$ for some $c_3$, and so on. At the end, we get $g^{(n)}(c)=0$ for some $c$ strictly between $a$ and $x$, that is
$$E^{(n)}(c)-\frac{n!}{(x-a)^n}\,E(x)=0,$$
i.e. $E(x)=\dfrac{E^{(n)}(c)}{n!}\,(x-a)^n$.



Finally, substituting the explicit value $E^{(n)}=f^{(n)}-p^{(n)}=f^{(n)}$ (as $p$ has degree $n-1$) gives
$$f(x)=\sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}\,(x-a)^k+\frac{f^{(n)}(c)}{n!}\,(x-a)^n,$$

as needed.

[Looking back at the proof, we could have taken "$f$ is $n$ times differentiable on $I$" instead of "$f$ is infinitely differentiable" to begin with. Also, the same idea works for $x<a$ too.]
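A short SymPy check (my own) of the two ingredients: the error $E$ vanishes to order $n-1$ at $a$, and $E(x)/(x-a)^n\to f^{(n)}(a)/n!$ as $x\to a$, consistent with the Lagrange form; here $f=\exp$, $a=0$, $n=5$:

```python
import sympy as sp

x = sp.Symbol('x')
f, a, n = sp.exp(x), 0, 5
p = sum(f.diff(x, k).subs(x, a) / sp.factorial(k) * (x - a)**k for k in range(n))
E = f - p
print([sp.simplify(E.diff(x, k).subs(x, a)) for k in range(n)])  # n zeros
print(sp.limit(E / (x - a)**n, x, a))                            # 1/120 = f^(5)(0)/5!
```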