
Implementing the Exponential Function

I explore several sophisticated approximation techniques for implementing the exponential function, $f(x) = e^x$, including Taylor series approximation, Lagrange interpolation, Chebyshev interpolation, Carathéodory-Fejér approximation and minimax approximation. This also serves as a more general introduction to using these techniques for approximating other functions. Throughout, I walk through the relevant theory for each method and apply numerical analysis to navigate different types of error. I also include my own Python and C++ implementations of each algorithm in double precision

1: Note that the implementations included here both output and calculate in double precision. We can still apply sophisticated techniques in this setting, but as a general rule you'll need higher intermediate precision to obtain an approximation which is indistinguishable from the true value in double precision. This is one (but not the only) reason it's generally recommended to use an optimized exp(x) function provided by your platform instead of writing one from scratch.

with heavy commentary. Finally, I analyze each implementation's performance and accuracy characteristics using the highly optimized, platform-provided exp(x) function as a benchmark.

2: In most cases, whichever numerical library you're using will map to the platform-provided exp(x) function under the hood. For example, this is what numpy.exp does under the hood.

The Background section opens with the motivating properties of $e^x$ and the basics of floating point systems. The code for each method is included under the corresponding Implementation heading.

Background

Taylor series

To motivate the definition of $e$, we'll need some calculus. Let $f$ be a function which is infinitely differentiable at some real value $a$. This is to say that we can repeatedly differentiate $f$: for every $k^{\text{th}}$ derivative $f^{(k)}$, we can differentiate $f^{(k)}$ again to obtain the $(k + 1)^{\text{th}}$ derivative $f^{(k + 1)}$. Any function with this property can be uniquely represented as a Taylor series expansion "centered" at $a$. The Taylor series of $f$ centered at $a$ and evaluated at $x$ is defined to be

$$
f(x) = \frac{f(a)}{0!}(x - a)^0 + \frac{f'(a)}{1!}(x - a)^1 + \ldots + \frac{f^{(k)}(a)}{k!}(x - a)^k + \ldots
$$

This is a power series, or infinite polynomial in one variable. The center of expansion determines the neighborhood of values returned by the Taylor series, and the coefficient of each Taylor term depends on repeatedly differentiating the function $f$ and evaluating it at $a$. A common center of expansion is $a$ = 0, in which case the Taylor series is usually called a Maclaurin series and the series is centered around the origin. This can be regarded as the "default" setting. If you cut off all terms of the Taylor expansion after some term $k$, you obtain a polynomial of degree $k$. The coefficient of the $k^{\text{th}}$ term of the series (or polynomial, if truncated) is given by

$$
a_{k} = \frac{f^{(k)}(a)}{k!}
$$

where 0! = 1.

For a concrete example, consider the Taylor series expansion of the sine function. The sine function is not only infinitely differentiable, but cyclic:

$$
\begin{aligned}
\sin^{(1)}(x) &= \cos(x), \\
\sin^{(2)}(x) &= \cos^{(1)}(x) = -\sin(x), \\
\sin^{(3)}(x) &= -\sin^{(1)}(x) = -\cos(x), \\
\sin^{(4)}(x) &= -\cos^{(1)}(x) = \sin(x)
\end{aligned}
$$

We determine each $k^{\text{th}}$ coefficient of the Taylor series by evaluating $f^{(k)}$ at $a$ and dividing it by the factorial of $k$. If we want to expand the sine function around the origin ($a$ = 0), we obtain the cyclic coefficients

$$
\begin{aligned}
\sin(0) &= 0, \\
\sin^{(1)}(0) &= \cos(0) = 1, \\
\sin^{(2)}(0) &= \cos^{(1)}(0) = -\sin(0) = 0, \\
\sin^{(3)}(0) &= -\sin^{(1)}(0) = -\cos(0) = -1, \\
\sin^{(4)}(0) &= -\cos^{(1)}(0) = \sin(0) = 0
\end{aligned}
$$

Since $(x - 0)^{k} = x^{k}$, we have the Taylor series expansion

$$
\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \ldots
$$

Truncating the Taylor expansion of a function $f$ at any term $k$ gives a finite approximation of $f$ using the degree $k$ Taylor polynomial. A Taylor polynomial of $f$ centered at $a$ produces very accurate approximations of $f(x)$ when $x$ is reasonably close to $a$. As the absolute value of $x$ increases away from $a$, the accuracy of the Taylor polynomial rapidly decreases, meaning it requires more terms of the Taylor series (i.e. a higher degree polynomial) for accurate approximation. Consider the following plot, which shows the values of $\sin(x)$ over the interval $[-20, 20]$ compared to its Taylor polynomials of degree 1, 3, 5, 7 and 9 centered at $a$ = 0.

$T_{n}$ denotes the degree $n$ Taylor polynomial of $\sin$

Observe that the Taylor approximation of $\sin(x)$ is more accurate when $x$ is near $a$ = 0, but quickly flies away from the true value of $\sin(x)$ further from 0. The degree 1 Taylor polynomial is only an accurate approximation of $\sin(x)$ on a very small interval near the origin, whereas the degree 9 Taylor polynomial is quite accurate within $[-5, 5]$. How far out the approximation holds before it becomes wildly inaccurate depends on the number of terms of the Taylor polynomial. A higher degree polynomial maintains a good approximation of $\sin(x)$ for longer, but any finite polynomial will eventually become wildly inaccurate.
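To make this concrete, here is a minimal Python sketch (my own, assuming NumPy is available; the degrees and sample points are arbitrary choices) which compares truncated Maclaurin polynomials of $\sin(x)$ against NumPy's sine:

import numpy as np

def sin_taylor(x, degree):
    # Maclaurin polynomial of sin(x): x - x^3/3! + x^5/5! - ...
    total, term = 0.0, x
    for k in range(degree // 2 + 1):
        total += term
        # Each successive term is the previous term times -x^2 / ((2k + 2)(2k + 3))
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

for x in (0.5, 2.0, 10.0):
    for degree in (1, 5, 9):
        print(x, degree, abs(sin_taylor(x, degree) - np.sin(x)))

The absolute error is tiny near the origin and grows rapidly at $x = 10$, even for the degree 9 polynomial.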

Defining $e$

The mathematical constant $e$ is (almost) entirely motivated by the nice properties it exhibits under exponentiation. In particular, the definition of $e$ was born out of the need to find a continuous function which is its own derivative and which maps the additive identity 0 to the multiplicative identity 1. This is because solving difficult integration and differentiation problems is vastly more expedient with such a function. By extension, a significant portion of problems in applied mathematics and physics reduce to solving differential equations, for which such a function is essential.

As it happens, $f(x) = e^x$ uniquely satisfies this property. We can show this, and define $e$ directly in the process, by starting from the Taylor series representation of an arbitrary function $f$ infinitely differentiable at $a$ = 0. Suppose $a_0, a_1, \ldots$ are the coefficients of the Taylor series of $f$ centered at $a$. Then we have the Taylor series

$$
f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots
$$

It follows from the linearity of differentiation that the Taylor series expansion of the first derivative $f'$ is

$$
f'(x) = a_1 + 2a_2 x + 3a_3 x^2 + \ldots
$$

To determine a function which is its own derivative, we solve for the coefficients $a_0, a_1, \ldots$ which satisfy $f = f'$:

$$
a_0 + a_1 x + a_2 x^2 + \ldots = a_1 + 2a_2 x + 3a_3 x^2 + \ldots
$$

From here we can see the pattern

$$
\begin{aligned}
a_0 &= a_1, \\
a_1 &= 2a_2, \\
a_2 &= 3a_3
\end{aligned}
$$

and so on, which is equivalent to

$$
\begin{aligned}
a_1 &= a_0, \\
a_2 &= \frac{a_1}{2}, \\
a_3 &= \frac{a_2}{3}
\end{aligned}
$$

By induction we have a recurrence relation which defines the $k^{\text{th}}$ coefficient $a_k$:

$$
a_k = \frac{a_{k - 1}}{k}
$$

Given $a_0$ = 1, we find that the Taylor series of a function which is its own derivative is

$$
f(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots
$$

We denote this function by $e^x$, where $e$ is defined to be the value of this function at $x$ = 1.

$$
e = f(1) = \sum_{k = 0}^{\infty}\frac{1}{k!} = 2.7183\ldots
$$
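As a quick sanity check on this definition, the partial sums of $\sum 1/k!$ converge to $e$ very rapidly. The following sketch (the choice of 20 terms is arbitrary) reproduces $e$ to double precision:

import math

e_approx = 0.0
term = 1.0  # 1 / 0!
for k in range(20):
    e_approx += term
    term /= k + 1  # the next term is 1 / (k + 1)!
print(e_approx, math.e)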

A more intuitive illustration of why $e^x$ is special is given by the following graph, in which exponential functions with different bases are plotted alongside their derivatives. An exponential function with a base less than $e$, like $b$ = 2, grows more quickly than its derivative. But when the base is greater than $e$, like $b$ = 4, it grows less quickly than its derivative.

When $b < e$ we have $f'(x) < f(x)$, and when $b > e$ we have $f'(x) > f(x)$. But $b = e$ is the "goldilocks" base at which $f'(x) = f(x)$.

Floating point

There is an intrinsic tension in that we want to obtain accurate values of $e^x$ without doing too much work. Before we can consider the efficiency of an algorithm, we must consider its accuracy. This leads us to define a variety of types of error, the most important of which comes from the way we approximate real numbers. It's generally impossible to calculate the exact value of $f(x)$ for an arbitrary function $f$, because computers can't work with arbitrary real numbers.

3: Almost all real numbers are not computable. The reals which are computable are often not exactly representable to a reasonable degree of accuracy because they're either irrational (and therefore have infinite decimal expansions) or rational with very long decimal expansions.

The best we can do is approximate the value to some acceptable accuracy.

The IEEE 754 floating point standard discretizes real intervals into a computable form by mapping all nearby real values in given neighborhoods to a single rounded value. Internally, an IEEE 754 binary floating point number $N$ is represented using the normalized form

$$
N = \pm b_1.b_2b_3 \ldots b_p \times 2^{E_{k}}
$$

where the first bit is allocated for the sign (the sign bit), the $p$ bits $b_1.b_2b_3 \ldots b_p$ comprise the mantissa, or significand, and $E_{k}$ is an integer exponent consisting of $k$ bits. Note that since this form is normalized, $b_1 = 1$, while each of $b_2, \ldots, b_p$ may equal 0 or 1. IEEE 754 single precision binary floating point numbers have a total size of $32$ bits: $8$ are allocated for the exponent $E \in [-126, 127]$ and $23$ are allocated for the mantissa (with $p$ = 24 accounting for the normalized bit). Thus you can represent $2^{32}$ different values in single precision floating point, with overflow and underflow limits of approximately $2^{128} \approx 3.4 \times 10^{38}$ and $2^{-126} \approx 1.2 \times 10^{-38}$, respectively. Likewise IEEE 754 double precision floating point values have a total size of $64$ bits: $11$ bits are allocated for the exponent $E \in [-1022, 1023]$ and $52$ bits are allocated for the mantissa (with $p = 53$). You can represent $2^{64}$ distinct values in double precision, with overflow and underflow limits of approximately $2^{1024} \approx 1.8 \times 10^{308}$ and $2^{-1022} \approx 2.2 \times 10^{-308}$, respectively.
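To make the layout concrete, here is a small Python sketch (standard library only; the sample inputs are arbitrary) which unpacks the sign bit, unbiased exponent, and stored mantissa bits of a double precision value:

import struct

def double_bits(x):
    # Reinterpret the 64-bit double as an unsigned integer (big-endian).
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = bits >> 63
    exponent = ((bits >> 52) & 0x7FF) - 1023  # 11 exponent bits, biased by 1023
    mantissa = bits & ((1 << 52) - 1)         # 52 stored mantissa bits
    return sign, exponent, mantissa

print(double_bits(1.0))   # (0, 0, 0)
print(double_bits(-2.5))  # sign 1, unbiased exponent 1, mantissa bits for 1.25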

Floating point values are not evenly distributed. This is illustrated by the following density plot, which graphs all 16-bit floating point values against their base 2 exponents.

Brighter color indicates a higher density of values, whereas darker color indicates a comparative sparsity of values.

We can see the values are quite densely clustered near $0$ and increasingly sparse the further you move away from the origin. As another example, half of all $32$-bit floating point numbers lie in the real interval $[-1, 1]$. Meanwhile the smallest and largest $32$-bit floating point numbers are $-3.4 \times 10^{38}$ and $3.4 \times 10^{38}$, respectively. More generally, in each interval $[2^n, 2^{n + 1}]$, the available floating point values are distributed with a spacing of $2^{n - p}$ between them, where $p$ is the number of stored mantissa bits for the precision under consideration. As you move farther away from the origin, it becomes more likely that your calculations will encounter real values in between the available floating point values, which will be rounded to the nearest available value instead.

For any binary floating point system, we can determine the maximum precision available in that system using the number of bits $p$ available in the mantissa. Given the uneven distribution of floating point values, it follows that the available precision decreases as you move away from the origin (this contributes to approximation error!). For this reason we define machine epsilon, denoted by $\epsilon_{M}$, which is the difference between $1$ and the smallest representable floating point value greater than $1$. In single precision floating point, that difference is $2^{-23}$, so $\epsilon_{M} = 2^{-23}$. Likewise in double precision we have $\epsilon_{M} = 2^{-52}$. The maximum decimal precision of these systems can be obtained by converting $\epsilon_{M}$ to a decimal value. Putting this all together, we see that $32$-bit floating point has a maximum decimal precision of about $7$ digits, and $64$-bit floating point has a maximum decimal precision of about $16$ digits. You'll only be able to achieve this precision for a subset of the available floating point values; at the boundaries furthest from the origin you'll only be able to obtain $6$ decimal digits of precision with $32$-bit values and $15$ decimal digits of precision with $64$-bit values.
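Machine epsilon can also be recovered empirically. This short sketch (plain Python floats, which are double precision) halves a candidate value until adding it to 1 no longer changes the result:

eps = 1.0
while 1.0 + eps / 2 > 1.0:
    eps /= 2
print(eps)        # 2.220446049250313e-16
print(2 ** -52)   # the same value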

In summary, single and double precision floating point systems have the following characteristics:

Characteristics Single Precision Double Precision
Total Size 32 bits 64 bits
Sign Size 1 bit 1 bit
Mantissa Size 23 bits ($p$ = 24) 52 bits ($p$ = 53)
Exponent Size 8 bits 11 bits
Exponent Range $[-126, 127]$ $[-1022, 1023]$
Machine Epsilon $2^{-23}$ $2^{-52}$
Underflow Limit $2^{-126} \approx 1.2 \times 10^{-38}$ $2^{-1022} \approx 2.2 \times 10^{-308}$
Overflow Limit $2^{128} \approx 3.4 \times 10^{38}$ $2^{1024} \approx 1.8 \times 10^{308}$
Max Decimal Precision 7 16
Min Decimal Precision 6 15

Types of error

Numerical analysis traditionally considers four main classes of error. These are:

  1. Fundamental error. This arises when a model intrinsically deviates from reality in a nontrivial way. For example, the Kermack-McKendrick SIR model in epidemiology does not account for the possibility of reinfection by default, so it would exhibit significant fundamental error if a recovered population were still susceptible to infection. For our purposes we don't need to worry about fundamental error.

  2. Floating point error. This type of error can occur in several ways. In the simplest sense, you can't achieve more accuracy than the precision of your floating point system provides. Any real number with more than 16 decimal digits simply can't be expressed to full accuracy in double precision. You can also encounter more subtle errors introduced through operations such as equality comparison, because to every floating point number there correspond infinitely many real values which are rounded to it. Floating point error also occurs if, at some point in your calculations, an intermediate result is outside the underflow/overflow limits, even if the completion of the calculation would bring it back into the range of available values.

  3. Discretization error. This is also called truncation error. It occurs whenever you take a continuous function and approximate it by finitely many values. One example of discretization error is polynomial interpolation, where you use $n + 1$ points to approximate a continuous function $f(x)$ using a polynomial $P_{n}$ of degree $n$. A related example occurs when approximating functions using their Taylor series. A Taylor series can't be evaluated in finite time because it has infinitely many terms, so you need to truncate the series at some term $n$, which results in a polynomial $T_n$ of degree $n$. The infinitely many remaining terms of the Taylor series sliced off by truncation comprise the discretization error in your calculation.

  4. Convergence error. This is also called gradual loss of significance. It occurs when your calculation performs too much work and returns results beyond the available precision. This is distinct from floating point error because it is tied to iterative repetition. The initial result may be representable (but rounded), but repeated iterations compound the initial error until it passes a precision threshold, at which point the result ceases to gain accuracy and instead loses accuracy.

Here is a simple example of floating point error. In double precision we have

$$
1 + \frac{1}{2}\epsilon_{M} - 1 = 0 \neq 1.11022 \times 10^{-16} = 1 - 1 + \frac{1}{2}\epsilon_{M}
$$

Similarly we can show that floating point numbers are not associative under addition:

$$
0.1 + (0.2 + 0.3) = 0.6 \neq 0.600\ldots 001 = (0.1 + 0.2) + 0.3
$$
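Both of these facts are easy to reproduce directly in Python:

eps = 2 ** -52  # double precision machine epsilon
print(1 + eps / 2 - 1)    # 0.0
print(1 - 1 + eps / 2)    # 1.1102230246251565e-16
print(0.1 + (0.2 + 0.3))  # 0.6
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001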

Convergence error is particularly insidious. Consider this fairly simple iterative calculation in Python:

x = 10 / 9
for k in range(20):
    x -= 1
    x *= 10
    print(x)

which will return

1.1111111111111116,
1.1111111111111160,
1.1111111111111605,
1.1111111111116045,
1.1111111111160454,
1.1111111111604544,
1.1111111116045436,
1.1111111160454357,
1.1111111604543567,
1.1111116045435665,
1.1111160454356650,
1.1111604543566500,
1.1116045435665000,
1.1160454356650007,
1.1604543566500070,
1.6045435665000696,
6.0454356650006960,
50.454356650006960,
494.54356650006960,
4935.4356650006960

This is a textbook example of loss of significance. Every result has 16 decimal digits in total, but the precision of the result decreases over time. Note that with a starting value of $x = \frac{10}{9}$, after one iteration we have

$$
f(x) = (x - 1) \cdot 10 = \left(\frac{10}{9} - 1\right) \cdot 10 = \frac{1}{9} \cdot 10 = \frac{10}{9}
$$

Therefore the value of $x$ should remain the same after every iteration. But instead we have a value which is rapidly diverging away from the true value with each iteration.

Finally, it's not uncommon to see several of these types of error occur in the same approximation. Suppose you are approximating a continuous function $f(x)$ by its Taylor series. By picking a truncation point for the series, you immediately encounter discretization error and thereby place a lower bound on the total error of your approximation. Then, depending on the implementation of the Taylor series approximation, your intermediate calculations on the individual Taylor terms may suffer from compounding error over time. Your result may also not be exactly representable in floating point, which can further decrease the accuracy of your calculation through rounding error.

We now have two ways of measuring error. Given a function $f$ and its approximation $F$, the absolute error at some $x$ is given by

$$
\epsilon(x) = |f(x) - F(x)|
$$

and the relative error at $x$ is given by

$$
\eta(x) = \left|\frac{f(x) - F(x)}{f(x)}\right| = \frac{\epsilon(x)}{|f(x)|}
$$

Relative error depends on, and normalizes, absolute error. It's useful when evaluating the accuracy of an approximation over a very wide range of possible values. For example, suppose we have an approximation $F$ of a function $f$ which returns values between 0 and 4,294,967,296. An absolute error of $\epsilon$ = 10 is essentially worthless for small values in that range, but quite good for larger values near the upper bound. To get a better picture of how accurate an approximation function is over such a range, we can instead consider the relative error. This comes with the caveat that the relative error is undefined at $x$ when the true value of $f$ at $x$ is exactly 0, because otherwise we'd have a division by $0$. Fortunately we won't encounter this problem here, because there is no value of $x$ which satisfies $e^x$ = 0.

Absolute and relative error give us a way to rigorously quantify the accuracy of an approximation. If we can prove an upper bound on either of these error metrics, we can state the worst-case accuracy of the approximation. Furthermore, these error metrics are amenable to common summary statistics, so we can use them to find the mean accuracy, or the accuracy by quantile. Relative error also gives us a way to assess the accuracy of our calculation in terms of digits of precision. For any relative error $\eta$, the corresponding precision is the largest integer $p$ which satisfies

$$
\eta \leq 5 \times \beta^{-p}
$$

where $\beta$ is the base under consideration. Decimal precision assigns $\beta$ = 10 and binary precision assigns $\beta$ = 2.
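As an illustration, here is a small helper (my own, not part of the implementations below) which converts a relative error into digits of precision in a given base by solving the inequality above for $p$:

import math

def digits_of_precision(eta, base=10):
    # Largest integer p satisfying eta <= 5 * base^(-p).
    if eta == 0:
        return math.inf
    return math.floor(math.log(5 / eta, base))

print(digits_of_precision(8.4e-15))   # 14
print(digits_of_precision(2.2e-16))   # 16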

Setup

To avoid having to deal with underflow and overflow limits, we'll implement $e^x$ for $x$ in the domain $[-709, 709]$. These values are near the limits of what can be calculated in double precision (although technically we could evaluate $x$ down to about -740, if we didn't care about having a symmetric domain). Before we start on an implementation we need to establish an accurate benchmark function to evaluate it against. We'll use the exp(x) functions provided by NumPy in Python and by the standard library in C++.

Next we'll write a reporting function rel_error which will return the relative error of our implementation versus the benchmark. Since we're assessing an interval of values, this function will take as its inputs the "true" function $f$ given by the benchmark, our approximation $F(x)$, the infimum $a$ and supremum $b$ of the interval $[a, b]$ to operate over, and the number $n$ of values $x$ to evaluate in the interval. It will return a vector of the relative errors at each value of $x$ we evaluated. The output of this function can be used to graphically assess relative errors using matplotlib.

We also need a few convenience functions, starting with linspace. The linspace function will take as input the bounds $a, b$ of an interval and a positive integer $n$, then return a vector consisting of $n$ linearly-spaced floating point values in the interval. NumPy provides this function out of the box, but we'll write our own for the C++ tests. We'll also write a few C++ functions for calculating summary statistics that are provided by NumPy.

import numpy as np
import matplotlib.pyplot as plt

def rel_error(f, F, a, b, n, plot=False, save=False, file=None):
    """
    Benchmark function for assessing relative error of two
    functions. The exact function is passed first, the
    function under testing is passed second.
    :param f: The benchmark function to be compared against.
    :param F: The implementation of the function we want to test.
    :param a: The left boundary of the interval to consider.
    :param b: The right boundary of the interval to consider.
    :param n: The number of values in the interval to evaluate.
    :param plot: Boolean, whether or not to plot the errors in matplotlib.
    :param save: Boolean, whether to save the plot to disk or show it.
    :param file: String name of the file location to save the plot to.
        Must not be None if save is True.
    :returns: An array consisting of the relative error
    |f(x) - F(x)| / |f(x)| at each of the n linearly spaced x
    values in the interval [a, b].
    """
    x = np.linspace(a, b, n)
    trials = np.zeros(x.shape)
    for i, xi in enumerate(x):
        trials[i] = F(xi)
    control = f(x)
    errors = np.abs(trials - control) / np.abs(control)
    if plot:
        fig = plt.figure()
        axes = fig.add_subplot(1, 1, 1)
        axes.plot(x, errors, color='b')
        axes.set_xlabel(r'$x$', fontsize=12)
        axes.set_ylabel(r'$\eta_{x}$', fontsize=12)
        axes.set_title(r'Relative error of $e^x$ approximation in double precision',
                       fontsize=12)
        axes.grid()
        if save:
            assert file is not None
            plt.savefig(file + '.svg', format='svg', transparent=True)
        else:
            plt.show()
    return errors

#include <vector>
#include <cmath>
#include <cassert>
#include <algorithm>
#include <numeric>
#include <iostream>

std::vector<double> linspace(double a, double b, int n) {
    /* Function which returns a std::vector containing n linearly-spaced
     points in the interval [a, b], inclusive.
     Args:
     - a: Lower bound of an interval in double precision
     - b: Upper bound of an interval in double precision
     - n: Positive integer number of linearly-spaced values to return.
     Returns:
     - A vector of n double precision values within [a, b]
     */
    assert(a < b && n > 0);
    std::vector<double> points;
    int iter = 0;
    double d = b - a;
    for (int i = 0; i <= n - 2; i++) {
        points.insert(points.begin() + iter, a + (i * d) / (floor(n) - 1));
        iter++;
    }
    // We also need to insert the upper bound at the end, or the result is incomplete.
    points.insert(points.begin() + iter, b);
    return points;
}

template <typename Function>
std::vector<double> rel_error(Function f, Function F, double a, double b, int n) {
    /* Benchmark function for assessing relative error of two functions over n values in
     an interval [a, b].
     Args:
     - f: Benchmark function to be compared against.
     - F: Approximation of the true function.
     - a: Left boundary of the interval. Must be less than b.
     - b: Right boundary of the interval. Must be greater than a.
     - n: The number of values in the interval to consider.
     Returns:
     - A vector of double precision values consisting of the relative errors of
     f and F at each x evaluated from the interval [a, b]
     */
    std::vector<double> x = linspace(a, b, n);
    std::vector<double> control;
    std::vector<double> test;
    for (auto i : x) {
        control.push_back(f(i));
        test.push_back(F(i));
    }
    std::vector<double> errors;
    errors.reserve(control.size());
    for (size_t i = 0; i < control.size(); i++) {
        errors.push_back(std::abs(control[i] - test[i]) / std::abs(control[i]));
    }
    return errors;
}

double mean(std::vector<double> nums) {
    /* Basic function which returns the mean of double precision
     values contained in the input vector. Returns the mean as
     a double precision number.
     */
    return std::accumulate(nums.begin(), nums.end(), 0.0) / nums.size();
}

double var(std::vector<double> nums) {
    /* Basic function which returns the variance of double
     precision values contained in the input vector. Returns
     the variance as a double precision number.
     */
    std::vector<double> square_vec;
    square_vec.reserve(nums.size());
    for (auto i : nums) {
        square_vec.push_back(i * i);
    }
    return mean(square_vec) - pow(mean(nums), 2);
}

double median(std::vector<double> nums) {
    /* Basic function which returns the median of double
     precision values contained in the input vector. Returns
     the median as a double precision number.
     */
    std::sort(nums.begin(), nums.end());
    if (nums.size() % 2 == 0) {
        // Average the two middle elements when the count is even.
        return (nums[(nums.size() / 2) - 1] + nums[nums.size() / 2]) / 2.0;
    }
    else {
        return nums[nums.size() / 2];
    }
}

For reporting error statistics, we’ll use the following main harnesses.

if __name__ == '__main__':
    a = -709
    b = 709
    n = 10000
    errors = rel_error(np.exp, taylor_exp, a, b, n)
    print("Max relative error = {}".format(errors.max()))
    print("Min relative error = {}".format(errors.min()))
    print("Avg relative error = {}".format(np.average(errors)))
    print("Med relative error = {}".format(np.median(errors)))
    print("Var relative error = {}".format(np.var(errors)))
    s = 0
    for error in errors:
        if error > 5e-15:
            s += 1
    print("{} percent of the values maintain no longer as a lot as 15 digits of precision".layout(
        s / len(errors) 100
    ))

int main() {
    double a = -709.0;
    double b = 709.0;
    int n = 10000;
    std::vector<double> errors = rel_error(exp, taylor_exp, a, b, n);
    std::cout << "Max relative error: " << *std::max_element(errors.begin(), errors.end()) << std::endl;
    std::cout << "Min relative error: " << *std::min_element(errors.begin(), errors.end()) << std::endl;
    std::cout << "Avg relative error: " << mean(errors) << std::endl;
    std::cout << "Med relative error: " << median(errors) << std::endl;
    std::cout << "Var relative error: " << var(errors) << std::endl;
    double s = 0;
    for (auto i : errors) {
        if (i > 5e-15) {
            s += 1;
        }
    }
    std::cout << s / errors.size() * 100 << " percent of the values have less than 15 digits of precision." << std::endl;
    return 0;
}

This will give us the maximum, minimum, mean and median relative error over the interval $[-709, 709]$. It will also tell us the variance of the relative error and the percentage of the sample which has less precision than a given threshold.

Taylor series approximation

In the Background section we showed how $e^x$ is defined using its Taylor series representation. Since this is the “canonical” definition of $e^x$, it’s also the most straightforward method for implementing the function. We can use the Taylor series representation to calculate $e^x$ to arbitrary precision by evaluating finitely many terms of the Taylor series. By definition, we have

$$
e^x = \sum_{k = 0}^{\infty} \frac{x^k}{k!} = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots
$$

There are several problems with naively implementing and evaluating this polynomial as written. First, it requires exponentiation in the numerator. In real-world scenarios where we’d be implementing the exp(x) function, it’s likely that we don’t have a general pow(a, b) function we can use to evaluate arbitrary exponents of arbitrary bases.

4: In fact, we might be implementing exp(x) precisely because we want that function and don't presently have it. While it's not the most optimal method for doing so, we can naively define pow(a, b) with exp(x) using the identity $a^b = e^{b \log{a}}$.

The other problem with exponentiation is that it’s much more expensive than e.g. multiplication. Given the opportunity we would prefer a method which only requires multiplication, or at most exponentiation in base 2.

The second problem is that we have factorials in the denominators. Dividing by a factorial is highly prone to error and expensive. For example, at relatively small values of $x$, completely evaluating factorial(x) will overflow before we have a chance to complete the division. This is numerically unstable, so we need to find an alternative way to represent the Taylor series.

Horner's method

Note that for any polynomial $a_0 + a_1 x + a_2 x^2 + \ldots$, we can iteratively factor out the variable $x$ to obtain the equivalent polynomial $a_0 + x(a_1 + x(a_2 + \ldots$ Furthermore, if we have a factorial expression of the form $1! + 2! + 3! + 4! + \ldots$, we can (by the definition of a factorial) factor out $3!$ from $4!$, and $2!$ from $3!$, and so on. Factoring out terms in order to nest multiplications in this way is known as Horner's method of polynomial evaluation. When we apply Horner's method to the Taylor series representation of $e^x$, we obtain the new form

$$
e^x = 1 + x(1 + \frac{x}{2}(1 + \frac{x}{3}(1 + \ldots)))
$$

This is much more efficient and stable for implementation, and we’ll use this representation for the truncated series instead.
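Here is a minimal sketch (my own; the truncation at 20 terms is arbitrary) contrasting naive term-by-term evaluation with the nested Horner form of the same truncated series:

import math

def exp_naive(x, n=20):
    # Evaluate the truncated series directly, with explicit powers and factorials.
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def exp_horner(x, n=20):
    # Evaluate the same truncated series as 1 + x(1 + x/2(1 + x/3(1 + ...
    result = 1.0
    for k in range(n, 0, -1):
        result = result * (x / k) + 1.0
    return result

print(exp_naive(1.0), exp_horner(1.0), math.exp(1.0))

Both evaluations agree closely here, but the Horner form avoids explicit exponentiation and factorials entirely.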

Error bound

Now we’ll do a bit of error analysis on the Taylor series. We can prove an upper bound on the relative error of the Taylor series truncated at the $n^{text{th}}$ term of the series. Let $T_{n}(x)$ denote the Taylor series of $e^x$ truncated at the $n^{text{th}}$ term, let $epsilon_{n}$ denote the absolute error of $T_{n}(x)$ versus $e^{x}$ and let $eta_n$ denote the relative error of $T_{n}(x)$ versus $e^x$. By the definition of absolute error, we have

$$
e^x = T_{n}(x) + \epsilon_n(x)
$$

Since $e^{x}$ is an infinitely differentiable function, it follows from Taylor’s theorem that the absolute error can be represented using the Lagrange form

$$
\epsilon_n(x) = \frac{f^{(n + 1)}(c)}{(n + 1)!}(x - a)^{n + 1}
$$

where $f^{(n + 1)}$ denotes the $(n + 1)^{\text{th}}$ derivative of $f(x) = e^x$, and $c$ is a real value satisfying $a \leq c \leq x$. Since $e^x$ is its own derivative, $f^{(n + 1)}(c) = e^{c}$, and with the center of expansion $a = 0$ the foregoing identity simplifies to

$$
\epsilon_n(x) = e^{c}\frac{x^{n + 1}}{(n + 1)!}
$$

For all $a, b$ such that $b > a$, it follows that $e^b > e^a$. This is to say that $e^x$ is an increasing function, and its maximum value on the interval $[a, x]$ is attained at $x$. Therefore, since $c$ is at most $x$, we can conclude that

$$
\epsilon_n(x) \leq e^{x}\frac{x^{n + 1}}{(n + 1)!}
$$

It follows from the definition of relative error that

$$
\eta_n = \frac{\epsilon_n}{e^x} \leq \frac{x^{n + 1}}{(n + 1)!}
$$

which gives us an upper bound on the relative error of the Taylor series of $e^x$ truncated at the $n^{\text{th}}$ term. Furthermore, we can see that the absolute difference of $T_{n + 1}(x)$ and $T_{n}(x)$ is simply the $(n + 1)^{\text{th}}$ term of the Taylor series, because subtracting $T_{n}(x)$ from $T_{n + 1}(x)$ cancels all terms except for the $(n + 1)^{\text{th}}$ term.

$$
\begin{aligned}
|T_{n + 1}(x) - T_{n}(x)| &= \left|\left(1 + x + \ldots + \frac{x^n}{n!} + \frac{x^{n + 1}}{(n + 1)!}\right) - \left(1 + x + \ldots + \frac{x^n}{n!}\right)\right| \\
&= \left|\frac{x^{n + 1}}{(n + 1)!}\right|
\end{aligned}
$$

Thus the relative error of $T_{n}(x)$ is bounded above by the $(n + 1)^{\text{th}}$ term of the Taylor series. This is not the optimal upper bound, but it's a useful heuristic to proceed with.

5: You cannot generally find the exact value of the absolute error $\epsilon_n$ after truncating the Taylor series of some function $f$ at the $n^{\text{th}}$ term. If you could, you wouldn't need an approximation, since $f(x) = T_{n}(x) + \epsilon_{n}$. Typically the goal of error analysis is to identify a reasonable bound on the error and work from there to establish a useful precision target. We could certainly make the bound we've proven tighter using more sophisticated analysis.
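We can sanity-check this bound numerically. The sketch below (my own, with an arbitrary choice of $x$ and $n$) compares the actual relative error of a truncated series against the $(n + 1)^{\text{th}}$-term bound, using the platform exp only as the reference:

import math

def taylor_exp_truncated(x, n):
    # Terms 0 through n of the Taylor series of e^x, evaluated with Horner's method.
    result = 1.0
    for k in range(n, 0, -1):
        result = result * (x / k) + 1.0
    return result

x, n = 3.0, 15
actual = abs(math.exp(x) - taylor_exp_truncated(x, n)) / math.exp(x)
bound = x ** (n + 1) / math.factorial(n + 1)
print(actual, bound, actual <= bound)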

Precision bound

We can apply the relative error upper bound we've just proven to derive a lower bound on the number of terms we'll need to achieve a given relative error. Suppose we'd like to achieve an approximation with a relative error of at most $\eta$. Then we're interested in determining the number $n$ of Taylor terms required to achieve a relative error less than or equal to $\eta$. By the foregoing error analysis we know that the relative error of the truncated Taylor series is bounded above by the value of the first omitted Taylor term. We can therefore start by finding an $n$ to satisfy

$$
\left|\frac{x^n}{n!}\right| \leq \eta
$$

Applying logarithms to both sides of the inequality, we obtain the equivalent inequality

$$
\log\left(\left|\frac{x^n}{n!}\right|\right) \leq \log(\eta)
$$

To simplify working with the factorial, we apply Stirling's approximation $\log(n!) \approx n\log(n) - n$. By the algebra of logarithms, we can then reduce this inequality to

$$
n\log(x) - n\log(n) + n \leq \log(\eta)
$$

and then to

$$
\log\left(\frac{x}{n}\right)^{n} + n \leq \log(\eta)
$$

We can simplify this further by introducing the multiplicative identity using $\log(e) = 1$:

$$
\log\left(\frac{x}{n}\right)^n + n\log(e) \leq \log(\eta)
$$

We apply log rules again to rewrite the second term on the left:

$$
\log\left(\frac{x}{n}\right)^n + \log(e^n) \leq \log(\eta)
$$

Then we can collapse the entire left side into a single term:

$$
\log\left(\frac{x e}{n}\right)^n \leq \log(\eta)
$$

which is equivalent to

$$
\log\left(\frac{x e}{n}\right) \leq \frac{1}{n}\log(\eta)
$$

and at last

$$
\frac{xe}{n} \leq \eta^{1/n}
$$

Therefore, in order to approximate $e^x$ with a relative error less than or equal to $\eta$, we must calculate at least $n$ Taylor terms, where $n$ satisfies

$$
\frac{xe}{\eta^{1/n}} \leq n
$$

This puts a rough floor on the number of terms we'll need. We'll generally need significantly more. The limit of $\frac{1}{n}$ as $n$ approaches $\infty$ is 0. It follows that, for any fixed $\eta > 0$, the limit of $\eta^{1/n}$ as $n$ approaches $\infty$ is $\eta^0$ = 1. Thus as $n$ increases, the floor on the number of terms we'll need becomes asymptotically closer to

$$
xe \leq n
$$
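The following sketch (my own; the tolerance and sample values are arbitrary, and the $n^{\text{th}}$-term bound is used as the stopping criterion) finds the smallest truncation point empirically and compares it with $\lceil xe \rceil$:

import math

def terms_needed(x, eta):
    # Smallest n with x^n / n! <= eta, tracking the term iteratively to avoid overflow.
    term, n = 1.0, 0
    while term > eta:
        n += 1
        term *= x / n
    return n

for x in (1, 10, 100, 500):
    print(x, terms_needed(x, 1e-16), math.ceil(x * math.e))

The empirical counts sit above the floor, consistent with the observation that we'll generally need more terms than $xe$.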

With this preliminary analysis in hand, we can move on to implementation.

Implementation

Our precision analysis gives $xe$ Taylor terms as a rough lower bound. We seek an approximation of $e^x$ which is indistinguishable from the true value in double precision (or very nearly so). This is to say that we'd like to target a relative error of machine epsilon, $\eta = \epsilon_{M}$. We can't achieve $\epsilon_{M}$ across the entire interval because of the slow convergence of the Taylor series and the sparsity of floating point values far from the origin. However, we can get within one digit of precision almost everywhere. To do so, we'll need more than $xe$ Taylor terms.

Heuristically, multiplying $xe$ by 12 gives a sweet spot for the number of terms to use in double precision. Using a multiplicative constant less than 12 quickly reduces the amount of accuracy you can obtain as you approach the boundaries of $[-709, 709]$. On the other hand, the Taylor series of $e^x$ converges so slowly that a constant greater than 12 provides essentially no additional benefit. In fact, if we use a constant greater than 12, the extra terms of the Taylor series are so small that they vanish below the double precision floor, so at best they produce no change in the calculation while making our algorithm less efficient, and at worst they contribute to a gradual loss of significance. Therefore we use an upper bound of 12$\lceil |x|e \rceil$ terms, where $\lceil |x|e \rceil$ is the ceiling of $|x|e$ (because we can only have a positive integer number of Taylor terms) and $|x|$ is used because $x$ may not be positive. By the symmetry $e^{-x}$ = $\frac{1}{e^{x}}$, when $x$ is negative we simply evaluate the reciprocal of $e^{|x|}$.

import numpy as np
import math

def taylor_exp(x):
    """
    Evaluates f(x) = e^x for any x in the interval [-709, 709].
    If x < -709 or x > 709, raises an assertion error because of
    underflow/overflow limits. Implemented using the Taylor series
    approximation of e^x truncated at ceil(|x| * e) * 12 terms, which
    achieves at least 14 and at most 16 digits of precision over the
    entire interval.
    Performance - There are exactly 36 * ceil(|x| * e) + 5
    operations; 69,413 in the worst case (x = 709 or -709):
    - (12 * ceil(|x| * e)) + 2 multiplications
    - (12 * ceil(|x| * e)) + 1 divisions
    - (12 * ceil(|x| * e)) additions
    - 1 rounding
    - 1 absolute value
    Accuracy - over a sample of 10,000 equi-spaced points in
    [-709, 709] we have the following error statistics:
    - Max relative error = 8.39803008291388e-15
    - Min relative error = 0.0
    - Avg relative error = 1.2504273150972772e-15
    - Med relative error = 9.45400364584735e-16
    - Var relative error = 1.2390105184708059e-30
    - 0.91 percent of the values have less than 15 digits of precision
    :param x: (int) or (float) power of e to evaluate
    :return: (float) approximation of e^x
    """
    # Check that x is a valid input.
    assert -709 <= x <= 709
    # When x = 0 we know e^x = 1, so we exit early.
    if x == 0:
        return 1
    # Normalize x to a non-negative value to take advantage of
    # reciprocal symmetry. But keep track of the original sign
    # in case we need to return the reciprocal later.
    x0 = np.abs(x)
    # First term of the Taylor expansion of e^x is 1.
    # Tn is the variable we will return for e^x, and its
    # value at any term is the sum of all currently evaluated
    # terms in the Taylor series.
    Tn = 1
    # Choose a truncation point for the Taylor series,
    # then work down from there evaluating the polynomial
    # using Horner's method.
    n = math.ceil(x0 * np.e) * 12
    for k in range(n, 0, -1):
        Tn = Tn * (x0 / k) + 1
    # If the original input is less than 0, we want the
    # reciprocal because e^-x = 1 / e^x
    if x < 0:
        # 1 division if original sign is negative
        Tn = 1 / Tn
    return Tn

#include <cmath>
#include <cassert>

double taylor_exp(double x) {
    /* Evaluates f(x) = e^x for any x in the interval [-709, 709].
     If x < -709 or x > 709, raises an assertion error. Implemented
     using the truncated Taylor series of e^x with ceil(|x| * e) * 12
     terms. Achieves at least 14 and at most 16 digits of precision
     over the entire interval.
     Performance - There are exactly 36 * ceil(|x| * e) + 5
     operations; 69,413 in the worst case (x = 709 or -709):
     - (12 * ceil(|x| * e)) + 2 multiplications
     - (12 * ceil(|x| * e)) + 1 divisions
     - (12 * ceil(|x| * e)) additions
     - 1 rounding
     - 1 absolute value
     Accuracy - Over a sample of 10,000 linearly spaced points in
     [-709, 709] we have the following error statistics:
     - Max relative error = 8.39803e-15
     - Min relative error = 0.0
     - Avg relative error = 0.0
     - Med relative error = 1.90746e-15
     - Var relative error = 0.0
     - 0.88 percent of the values have less than 15 digits of precision
     Args:
     - x: (double) power of e to evaluate
     Returns:
     - (double) approximation of e^x in double precision
     */
    // Check that x is a valid input.
    assert(-709 <= x && x <= 709);
    // When x = 0 we already know e^x = 1.
    if (x == 0) {
        return 1.0;
    }
    // Normalize x to a non-negative value to take advantage of
    // reciprocal symmetry. But keep track of the original sign
    // in case we need to return the reciprocal of e^x later.
    double x0 = std::abs(x);
    // First term of the Taylor expansion of e^x at a = 0 is 1.
    // tn is the variable we will return for e^x, and its
    // value at any time is the sum of all currently evaluated
    // Taylor terms thus far.
    double tn = 1.0;
    // Choose a truncation point for the Taylor series using the
    // heuristic bound 12 * ceil(|x| * e), then work down from there
    // using Horner's method.
    int n = (int)ceil(x0 * M_E) * 12;
    for (int i = n; i > 0; i--) {
        tn = tn * (x0 / i) + 1.0;
    }
    // If the original input x is less than 0, we want the reciprocal
    // of the e^x we calculated.
    if (x < 0) {
        tn = 1 / tn;
    }
    return tn;
}

Taylor series approximations provide arbitrary precision at the cost of inefficiency because they converge quite slowly. This means that while these functions calculate comparatively many terms for their approximations, despite using the same (double) precision for intermediate calculation and output we're able to obtain a minimum precision of nearly 15 digits. Likewise, less than 1% of all values in our sample exhibit less than 15 digits of precision. To get a better idea of the error distribution over the entire interval, let's plot the relative error of this implementation versus the benchmark exp(x) function accessed through NumPy at each value of $x$ in our sample. If we limit our plot to the sample of $x$ values in the interval $[-1, 1]$, we find that it performs extremely well.

The relative error of this implementation is less than $3.5 \times 10^{-16}$ everywhere on $[-1, 1]$. That means the two functions are practically indistinguishable at machine precision using double floats.

However, consider what happens when we plot the relative error across the entire interval $[-709, 709]$.

$\eta_{x}$ represents the relative error of the Taylor series implementation truncated at term $12\lceil |x|e\rceil$ and evaluated at $x$

This plot illustrates a common "wing" pattern evident in the variance of Taylor approximation relative error. The variance of the relative error is small near 0 and large towards the boundaries of $[-709, 709]$. Furthermore, the size of the relative error at $x$ is proportionate to the distance of $x$ from 0. This is because Taylor series are comparatively accurate near the center $a$ of approximation (in this case, the origin). As $x$ moves away from the origin, the Taylor polynomial requires more terms to achieve the same degree of accuracy. At the same time, the inherent sparsity of floating point values farther from the origin works against the approximation, which compounds the overall loss of precision. Thus the approximation becomes increasingly less efficient and less accurate as the absolute value of $x$ increases. The following plot illustrates the number of operations used to approximate $e^x$.

Our Taylor series implementation approximates $e^x$ in $36\lceil |x| e \rceil + 5$ operations

This algorithm exhibits linear time complexity: as the absolute value of $x$ increases, the number of operations grows proportionally, by roughly $36e \approx 100$ operations per unit increase in $|x|$. We also see that the amount of work the algorithm performs is commensurate with the variance of its relative error, which means that this implementation is both less efficient and less accurate away from the center of the interval.

Range reduction

We can significantly mitigate the aforementioned limitations by combining the core technique of Taylor series approximation with a method known as range reduction. In doing so, we reduce the Taylor approximation problem to a small neighborhood bounded around the origin, no matter the value of $x$. This trades off some precision in return for significant efficiency improvements. The basic idea is to decompose $e^x$ by factorizing it as the product of a very small power of $e$ and an integer power of 2, since both of these factors are highly efficient to compute. We use the identity

$$
e^x = e^r \cdot 2^k
$$

where $r$ is a real value satisfying $|r| \leq \frac{1}{2}\log 2$ and $k$ is a non-negative integer.

6: Incidentally, this is how the exp(x) function is implemented in the MPFR arbitrary precision library.

The new algorithm

  1. solves for a $k$ satisfying these constraints,

  2. computes $r$ using the value of $k$,

  3. evaluates a truncated Taylor polynomial of $e^r$, and

  4. left shifts the result by $k$ to obtain $e^x$.

While we generally don't have a general pow(a, b) function available when implementing exp(x), it's very likely we can efficiently calculate powers of 2 using either exp2(x) or left shifts. In particular, note that $e^x = e^r \cdot 2^k$ implies $x = r + k \log 2$, by taking the natural logarithm of both sides of the equality. Then by isolating $k$ we obtain the identity

$$
k = \left\lceil \frac{x}{\log 2} - \frac{1}{2} \right\rceil
$$

which gives us a formula for determining $k$. Likewise we can compute $r$ once $k$ is known by isolating $r$ in $x = r + k \log 2$:

$$
r = x - k \log 2
$$
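As a quick check of the decomposition (using the platform exp only for comparison, with an arbitrary test value), we can compute $k$ and $r$ and confirm that $e^r \cdot 2^k$ recovers $e^x$:

import math

x = 123.456
k = math.ceil(x / math.log(2) - 0.5)
r = x - k * math.log(2)
print(abs(r) <= 0.5 * math.log(2))        # True
print(math.exp(r) * 2 ** k, math.exp(x))  # both print the same value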

We only need 14 Taylor terms to evaluate $e^r$ to 16 digits of precision for any $|r| \leq \frac{1}{2}\log 2$, because $r$ is very close to the center $a = 0$. This is a very significant improvement over the $12 \lceil |x| e\rceil$ terms required by the previous implementation. That said, we can't expect the same precision characteristics as we got from the last algorithm. The range reduction step achieves a significant increase in efficiency by trading off a modest (single digit) decrease in precision attributable to the left shift step. To see this, let $\epsilon$ be the absolute error of $T_{n}$ versus $e^r$. Then by the foregoing identity, we have

$$
e^x = 2^k \cdot (e^r - \epsilon)
$$

which shows that whatever error we have in the initial approximation of $e^r$ will be compounded when we multiply by $2^k$. This motivates us to find the smallest satisfying $k$ we can. Our implementations are as follows. Given that we can expect higher relative error, we will assess how many values achieve less than 14 digits of precision instead of 15.

import numpy as np
import math

def reduced_taylor_exp(x):
    """
    Evaluates f(x) = e^x for any x in the interval [-709, 709].
    If x < -709 or x > 709, raises an assertion error because of
    underflow/overflow limits. Performs a range reduction step
    by applying the identity e^x = e^r * 2^k for |r| <= 1/2 log(2)
    and non-negative integer k. e^r is approximated using a Taylor
    series truncated at 14 terms, which is sufficient because r is
    in the interval [-1/2 log(2), 1/2 log(2)]. The result is left
    shifted by k to obtain e^r * 2^k = e^x.
    Performance: In the worst case we have 51 operations:
    - 16 multiplications
    - 16 divisions
    - 14 additions
    - 2 subtractions
    - 1 rounding
    - 1 left shift
    - 1 absolute value
    Accuracy: Over a sample of 10,000 equi-spaced points in
    [-709, 709] we have the following statistics:
    - Max relative error = 7.98411243625574e-14
    - Min relative error = 0.0
    - Avg relative error = 1.7165594254816366e-14
    - Med relative error = 1.140642685235478e-14
    - Var relative error = 2.964698541882666e-28
    - 6.29 percent of the values have less than 14 digits of precision
    :param x: (int) or (float) power of e to evaluate
    :return: (float) approximation of e^x
    """
    # If x is not a valid value, exit early
    assert -709 <= x <= 709
    # If x = 0, we know e^x = 1, so exit early
    if x == 0:
        return 1
    # Normalize x to a positive value since we can use reciprocal
    # symmetry to only work on positive values. Keep track of
    # the original sign for the return value
    x0 = np.abs(x)
    # Hard code the value of natural log(2) in double precision
    l = 0.6931471805599453
    # Solve for an integer k satisfying x = k log(2) + r
    k = math.ceil((x0 / l) - 0.5)
    # p = 2^k
    p = 1 << k
    # r is a value between -0.5 log(2) and 0.5 log(2), inclusive
    r = x0 - (k * l)
    # Setup the Taylor series to evaluate e^r after
    # range reduction x -> r. The Taylor series
    # only approximates on the interval [-log(2)/2, log(2)/2]
    Tn = 1
    # We need at most 14 terms to achieve 16 digits
    # of precision anywhere on the interval [-log(2)/2, log(2)/2]
    for i in range(14, 0, -1):
        Tn = Tn * (r / i) + 1
    # e^x = e^r * 2^k, so multiply p by Tn. This loses us
    # about two digits of precision because we compound the
    # relative error by 2^k.
    p *= Tn
    # If the original sign is negative, return the reciprocal
    if x < 0:
        p = 1 / p
    return p

#include <cmath>
#include <cassert>

double reduced_taylor_exp(double x) {
    /* Evaluates f(x) = e^x for any x in the interval [-709, 709].
     If x < -709 or x > 709, raises an assertion error. Performs a
     range reduction step by applying the identity e^x = e^r * 2^k
     for |r| <= 1/2 log(2) and non-negative integer k. e^r is evaluated
     using a Taylor series truncated at 14 terms, which is sufficient
     to achieve 16 digits of precision for r so close to 0. The
     result is left shifted by k to obtain e^r * 2^k = e^x.
     Performance: In the worst case we have 51 operations:
     - 16 multiplications
     - 16 divisions
     - 14 additions
     - 2 subtractions
     - 1 rounding
     - 1 left shift
     - 1 absolute value
     Accuracy: Over a sample of 10,000 linearly spaced points in
     [-709, 709], we have the following error statistics:
     - Max relative error = 7.99528e-14
     - Min relative error = 0
     - Avg relative error = 0
     - Med relative error = 2.27878e-14
     - Var relative error = 0
     - 6.4 percent of the values have less than 14 digits of precision.
     Args:
     - x: (double) power of e to evaluate
     Returns:
     - (double) approximation of e^x
     */
    // Make sure x is a valid input
    assert(-709 <= x && x <= 709);
    // When x is 0, we know e^x is 1
    if (x == 0) {
        return 1;
    }
    // Normalize x to a positive value to take advantage of
    // reciprocal symmetry, but keep track of the original sign
    double x0 = std::abs(x);
    // Solve for an integer k satisfying x = k log(2) + r, with |r| <= 1/2 log(2)
    int k = ceil((x0 / M_LN2) - 0.5);
    // Determine r using the k we computed
    double r = x0 - (k * M_LN2);
    // Setup the Taylor series to evaluate e^r after range reduction
    // x -> r. This only approximates over the interval [-1/2 log(2), 1/2 log(2)]
    double tk = 1.0;
    double tn = 1.0;
    // We only need 14 terms to achieve 16 digits of precision for e^r
    for (int i = 1; i < 14; i++) {
        tk *= r / i;
        tn += tk;
    };
    // e^x = e^r * 2^k, so scale the Taylor sum by 2^k
    tn = tn * exp2(k);
    if (x < 0) {
        return 1 / tn;
    }
    return tn;
}

This implementation is much faster. Our worst-case operation count is reduced by three orders of magnitude, from about 70,000 operations in the first algorithm to just 51 in this one. And as expected, we have lost one digit of precision on average. If we can tolerate that loss, this is an excellent performance improvement. We still achieve 13 digits of precision in the worst case, and 14 digits of precision on average. Only about 6% of values exhibit less than 14 digits of precision. Here is the plot of the error distribution across the interval $[-1, 1]$.

Likewise, here is the plot of the error distribution across the entire interval.

$\eta_{x}$ represents the relative error of the reduced Taylor series approximation of $e^r$ truncated at 14 terms and left shifted $k$ times

We can see that the relative error is highly symmetric and much more regular than in the previous plot. The variance of the relative error still clearly increases further from the origin. However, there are six distinctive and symmetric subintervals on either side of the origin, with jumps in variance between subintervals closer to the origin and subintervals closer to the outer boundaries. This illustrates another benefit of range reduction, which is that the relative error is much more predictable and easier to reason about. We can also apply range reduction to the other implementation methodologies.

Lagrange interpolation

Polynomial bases

Now we can move from Taylor series approximation to the more general realm of polynomial interpolation. By the Weierstrass approximation theorem, for any set of $n + 1$ values $y_1, y_2, \ldots, y_{n + 1}$ located at distinct points $x_1, x_2, \ldots, x_{n + 1}$, there exists a unique polynomial of degree $n$, denoted by $P_n$, which exactly passes through each of the $n + 1$ points. This polynomial is called an interpolant. Likewise functions can be interpolated: $P_n$ is the unique interpolant with degree $n$ of a function $f$ at points $x_1, x_2, \ldots, x_{n + 1}$ if $P_n(x_i) = f(x_i)$ for all $x_i$ with $1 \leq i \leq n + 1$.

The set of all polynomials with degree less than or equal to $n$, denoted by $\mathbb{F}[x]$, constitutes a vector space of dimension $n + 1$. This is to say that polynomials with degree at most $n$ are equivalently represented as vectors and form a linear structure under the operations of vector addition and scalar multiplication. It follows that every polynomial in the space $\mathbb{F}[x]$ is a linear combination of a basis of $\mathbb{F}[x]$. The standard basis of $\mathbb{F}[x]$ is the set of monomials

$$
\{1, x, x^{2}, \ldots, x^{n}\}
$$

This is usually called the monomial basis. When we use the monomial basis, every vector of $\mathbb{F}[x]$ is a polynomial with the representation

$$
P_n(x) = \sum_{k = 0}^{n}p_{k}x^{k} = p_0 + p_{1}x + p_{2}x^2 + \ldots + p_{n}x^n
$$

where $p_0, p_1, \ldots, p_n$ comprise the coordinate vector of $P_n$. This is to say that $p_0, p_1, \ldots, p_n$ are the unique scalar coefficients which define $P_n$ as a linear combination of the monomial basis. Given that the polynomials constitute a vector space, the problem of determining the unique interpolant $P_{n}$ for a function $f$ reduces to solving a system of linear equations. With $n + 1$ distinct points $x_0, x_1, \ldots, x_n$ and their corresponding values $f(x_0), f(x_1), \ldots, f(x_n)$, we have the system of $n + 1$ equations in $n + 1$ unknowns

$$
\begin{aligned}
P_{n}(x_0) &= p_0 + p_{1}x_0 + p_{2}x_{0}^{2} + \ldots + p_{n}x_{0}^{n} = f(x_0) \\
P_{n}(x_1) &= p_0 + p_{1}x_1 + p_{2}x_{1}^{2} + \ldots + p_{n}x_{1}^{n} = f(x_1) \\
P_{n}(x_2) &= p_0 + p_{1}x_2 + p_{2}x_{2}^{2} + \ldots + p_{n}x_{2}^{n} = f(x_2) \\
&\vdots \\
P_{n}(x_n) &= p_0 + p_{1}x_n + p_{2}x_{n}^{2} + \ldots + p_{n}x_{n}^{n} = f(x_n)
\end{aligned}
$$

Solving this system gives us the coordinate vector defining the unique interpolating polynomial $P_n$ of $f$ with respect to the monomial basis. To solve this system, we can represent the points $x_0, x_1, \ldots, x_n$ and their powers using the Vandermonde matrix

$$
\begin{bmatrix}
1 & x_0 & x_0^2 & \ldots & x_0^n \\
1 & x_1 & x_1^2 & \ldots & x_1^n \\
1 & x_2 & x_2^2 & \ldots & x_2^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \ldots & x_n^n
\end{bmatrix}
$$

which allows us to represent the whole system of equations using the matrix equation

$$
\begin{bmatrix}
1 & x_0 & x_0^2 & \ldots & x_0^n \\
1 & x_1 & x_1^2 & \ldots & x_1^n \\
1 & x_2 & x_2^2 & \ldots & x_2^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \ldots & x_n^n
\end{bmatrix}
\begin{bmatrix}
p_0 \\
p_1 \\
p_2 \\
\vdots \\
p_n
\end{bmatrix} =
\begin{bmatrix}
f(x_0) \\
f(x_1) \\
f(x_2) \\
\vdots \\
f(x_n)
\end{bmatrix}
$$

Solving this matrix equation is equivalent to inverting the Vandermonde matrix, because multiplying the inverse of the Vandermonde matrix by the vector of $f(x_i)$ values yields the coordinate vector of $p_i$ scalars.
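To make this route concrete, here is a minimal sketch (my own illustration, not one of the implementations benchmarked in this article) that builds the Vandermonde matrix with numpy and solves the system for the monomial coefficients of a small interpolant of $e^x$; the degree and sample points are arbitrary choices.

import numpy as np

# Hypothetical example: interpolate e^x at 5 distinct points with a degree 4
# polynomial by solving the Vandermonde system V p = f for the coordinate
# vector p = [p_0, ..., p_4] in the monomial basis.
xi = np.linspace(-0.5, 0.5, 5)         # distinct interpolation points x_i
fi = np.exp(xi)                        # corresponding values f(x_i)
V = np.vander(xi, increasing=True)     # rows are [1, x_i, x_i^2, x_i^3, x_i^4]
p = np.linalg.solve(V, fi)             # solves the system without forming V^-1

# Evaluate the interpolant at a test point and compare against the target
x = 0.3
Pn = sum(c * x**k for k, c in enumerate(p))
print(Pn, np.exp(x))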

The choice of basis has a material impact on how efficiently we can find the interpolating polynomial. If we want to avoid doing matrix multiplication and inversion to find the interpolating polynomial, we can instead use the Lagrange basis of $\mathbb{F}[x]$. Relative to the points $x_0, x_1, \ldots, x_n$, each vector in the Lagrange basis is a polynomial of the form

$$
\ell_{i}(x) = \frac{x - x_0}{x_i - x_0} \cdot \ldots \cdot \frac{x - x_{i - 1}}{x_i - x_{i - 1}} \cdot \frac{x - x_{i + 1}}{x_i - x_{i + 1}} \cdot \ldots \cdot \frac{x - x_n}{x_i - x_n}
$$

or, more succinctly,

$$
\ell_{i}(x) = \prod_{j = 0, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}
$$

The Lagrange basis comes with a very important advantage: the value of each basis polynomial $\ell_{i}$ at a point $x_j$ is equal to the Kronecker delta

$$
\delta_{i, j} = \begin{cases}
0, & i \neq j \\
1, & i = j
\end{cases}
$$

It follows that the $(n + 1) \times (n + 1)$ matrix involved in the matrix equation will be the identity matrix instead of the Vandermonde matrix. This is simpler to compute because we don't have to perform an inversion step - the identity matrix is its own inverse. This also simplifies the representation of the interpolant, since the Lagrange form is a linear combination of the Lagrange basis polynomials $\ell_i(x)$ and the corresponding values $f(x_i)$.

$$
P_n(x) = \sum_{i = 0}^{n}f(x_i)\ell_i(x) = f(x_0)\ell_0(x) + f(x_1)\ell_1(x) + \ldots + f(x_n)\ell_n(x)
$$

Implementation

Let's take a look at a range reduced implementation.

import math

import numpy as np

class LagrangeInterpolation:
    """
    Calculates the Lagrange interpolant of a function f parameterized by n distinct
    x values and corresponding f(x) values. Stores the point data, then approximates
    the value of f at x by calculating the Lagrange basis polynomials with respect
    to x and taking their linear combination with respect to f(x_0), ..., f(x_n).
    """
    def __init__(self, xi, yi):
        self.xi = xi
        self.yi = yi
    def interpolate(self, x):
        """
        :param x: (double float) point at which to approximate f
        :return: (double float) approximation of f(x)
        """
        # Initialize the basis as a vector of 1s with the same length as the n points
        basis = np.ones(self.xi.shape)
        # Double for loop is O(n^2), where n is the number of distinct x points
        for i in range(len(self.xi)):
            for j in range(len(self.xi)):
                # If the i index is equal to the j index, skip this
                # We're running through this loop to calculate
                # l_i(x) = (x - x_0) / (x_i - x_0) ... (x - x_n) / (x_i - x_n)
                if i == j:
                    continue
                basis[i] *= (x - self.xi[j]) / (self.xi[i] - self.xi[j])
        # Return the linear combination of the basis and f(x_0), ..., f(x_n) values
        return sum(basis[i] * self.yi[i] for i in range(len(self.xi)))

def lagrange_exp(x):
    """
    Approximates f(x) = e^x for any x on the interval [-709, 709]
    using Lagrange interpolation.
    If x < -709 or x > 709, raises an assertion error because of
    double precision underflow/overflow limits. Performs a
    range reduction step by applying the identity e^x = e^r * 2^k
    for |r| <= 1/2 log(2) and integer k. The Lagrange interpolant
    approximates the value of e^r, then the result is multiplied
    by 2^k to obtain the final approximation for e^x.
    Performance: We sample 15 distinct points of e^x and their values to construct
    a Lagrange basis consisting of 15 polynomials. Since the Lagrange basis is
    computed for every approximation f(x), we have a floor of
    - 15^2 multiplications
    - 15^2 divisions
    - 2 * 15^2 subtractions
    for approximately 4 * 15^2 operations overall. The linear combination step adds
    another 15 additions and 15 multiplications by running through the length of
    the basis once more, and the interpolation additionally entails only
    - 1 absolute value
    - 1 rounding
    - 1 division
    - 2 subtractions
    - 1 left shift
    - 2 multiplications
    for another 8 operations. Therefore there are (4 * 15^2) + 38 = 938 operations
    total.
    Accuracy: Over a sample of 10,000 equi-spaced points in
    [-709, 709], we have the following error statistics:
    Max relative error = 8.014646895154806e-14
    Min relative error = 0.0
    Avg relative error = 1.716737407758425e-14
    Med relative error = 1.131501762186489e-14
    Var relative error = 2.963335086848574e-28
    6.329 percent of the values have less than 14 digits of precision
    :param x: (int) or (float) power of e to evaluate
    :return: (float) approximation of e^x
    """
    assert(-709 <= x <= 709)
    if x == 0:
        return 1
    x0 = np.abs(x)
    l = 0.6931471805599453
    k = math.ceil((x0 / l) - 0.5)
    p = 1 << k
    r = x0 - (k * l)
    # lagrange is an instance of the LagrangeInterpolation class
    Pn = lagrange.interpolate(r) * p
    return Pn if x > 0 else 1 / Pn

class LagrangeInterpolation {
    std::vector<double> xi;
    std::vector<double> yi;
public:
    LagrangeInterpolation(std::vector<double> x_points, std::vector<double> y_points) {
        xi = x_points;
        yi = y_points;
    }
    double interpolate(double x) {
        /* Calculates the Lagrange interpolant of a function f parameterized by n distinct
         x values and corresponding f(x) values. Stores the point data, then approximates
         the value of f at x by calculating the Lagrange basis polynomials with respect
         to x and taking their linear combination with respect to f(x_0), ..., f(x_n).
         Args:
         - x (double float) point at which to approximate f
         Returns:
         - (double float) approximation of f(x)
         */
        // Initialize the basis as a vector of 1s with the same length as the n points
        std::vector<double> basis(xi.size(), 1);
        double val = 0;
        // Double for loop is O(n^2), where n is the number of distinct x points
        for (int i = 0; i < xi.size(); i++) {
            for (int j = 0; j < xi.size(); j++) {
                // If the i index is equal to the j index, skip this
                // We're running through this loop to calculate
                // l_i(x) = (x - x_0) / (x_i - x_0) ... (x - x_n) / (x_i - x_n)
                if (i == j) {
                    continue;
                }
                basis[i] *= (x - xi[j]) / (xi[i] - xi[j]);
            }
        }
        // Return the linear combination of the basis and f(x_0), ..., f(x_n) values
        for (int i = 0; i < basis.size(); i++) {
            val += (basis[i] * yi[i]);
        }
        return val;
    }
};

double lagrange_exp(double x) {
    /* Approximates f(x) = e^x for any x on the interval [-709, 709]
     using Lagrange interpolation.
     If x < -709 or x > 709, raises an assertion error because of
     double precision underflow/overflow limits. Performs a
     range reduction step by applying the identity e^x = e^r * 2^k
     for |r| <= 1/2 log(2) and integer k. The Lagrange interpolant
     approximates the value of e^r, then the result is multiplied
     by 2^k to obtain the final approximation for e^x.
     Performance: We sample 15 distinct points of e^x and their values to construct
     a Lagrange basis consisting of 15 polynomials. Since the Lagrange basis is
     computed for every approximation f(x), we have a floor of
     - 15^2 multiplications
     - 15^2 divisions
     - 2 * 15^2 subtractions
     for approximately 4 * 15^2 operations overall. The linear combination step adds
     another 15 additions and 15 multiplications by running through the length of
     the basis once more, and the interpolation additionally entails only
     - 1 absolute value
     - 1 rounding
     - 1 division
     - 2 subtractions
     - 1 left shift
     - 2 multiplications
     for another 8 operations. Therefore there are (4 * 15^2) + 38 = 938 operations
     total.
     Accuracy: Over a sample of 10,000 equi-spaced points in
     [-709, 709], we have the following error statistics:
     Max relative error: 8.00308e-14
     Min relative error: 0
     Avg relative error: 0
     Med relative error: 2.26284e-14
     Var relative error: 0
     6.44 percent of the values have less than 14 digits of precision.
     Args:
     - x (double float) power of e to approximate
     Returns:
     - (double float) approximation of e^x
     */
    assert(-709 <= x && x <= 709);
    if (x == 0) {
        return 1;
    }
    double x0 = abs(x);
    int k = ceil((x0 / M_LN2) - 0.5);
    double r = x0 - (k * M_LN2);
    // lagrange is an instance of the LagrangeInterpolation class
    double pn = lagrange.interpolate(r);
    pn = pn * exp2(k);
    if (x < 0) {
        return 1 / pn;
    }
    return pn;
}

First consider the error distribution of the Lagrange interpolation class versus the benchmark function over $[-1, 1]$.

Over the interval $[-1, 1]$, the Lagrange interpolant sampled over 15 points only achieves 13 digits of precision.

This is clearly not as good as the non-reduced Taylor series approximation, which was able to achieve 16 digits of precision everywhere on $[-1, 1]$. But when we plot the relative error distribution of this implementation, we find that it's remarkably similar to that of the reduced Taylor series approximation.

Lagrange interpolation of $e^x$ over $[-709, 709]$ with range reduction is very similar to reduced Taylor series approximation.

The accuracy characteristics are extremely similar, since the range reduction step dominates any idiosyncrasies of the Lagrange interpolant. We still have a worst-case precision of 13 significant digits, and all but approximately 6.5% of the values have 14 or more significant digits of precision. But this algorithm is about 18.8 times less efficient than the reduced Taylor series approximation method, because calculating the Lagrange basis has O($n^2$) time complexity and cannot be memoized. Since we must recalculate the Lagrange basis for every input $x$, there are over 900 operations required to approximate $e^x$ anywhere on the interval.

Error analysis

Let $P_n$ be the interpolant of $e^x$ with degree $n$ on the interval $[a, b]$, and let $\epsilon_n$ be the absolute error between $P_n$ and $e^x$. Then by the Lagrange remainder theorem, we have the identity

$$
e^x = P_n(x) + \epsilon_n(x)
$$

where the error term is given by

$$
\epsilon_n(x) = \frac{f^{(n + 1)}(c)}{(n + 1)!}(x - x_0)(x - x_1)\ldots(x - x_n)
$$

for some $c \in [a, b]$, where $f^{(n + 1)}$ denotes the $(n + 1)^{\text{th}}$ derivative of the interpolated function. Note the similarity with the error term in Taylor series approximation. Both have $\frac{f^{(n + 1)}(c)}{(n + 1)!}$ as a factor. But the absolute error of the Taylor series approximation multiplies this term by $(x - a)^{n + 1}$, where $a$ is the center of expansion. The Lagrange absolute error is slightly different, because it instead multiplies that factor by

$$
\prod_{i = 0}^{n}(x - x_i)
$$

This is to say that the behavior of the Taylor series error is implicit in the center of expansion, whereas for Lagrange interpolation it depends on the $n + 1$ distinct points $x_0, x_1, \ldots, x_n$. We can simplify this error in a similar way as the Taylor series error, because $e^x$ is its own derivative.

$$
\epsilon_n(x) = e^c \frac{(x - x_0)(x - x_1)\ldots(x - x_n)}{(n + 1)!}
$$

Likewise we can once again exploit the fact that $e^x$ is an increasing function and $x \in [a, b]$ to obtain the inequality

$$
\epsilon_n(x) \leq e^x \frac{(x - x_0)(x - x_1)\ldots(x - x_n)}{(n + 1)!}
$$

which allows us to bound the relative error at $x$ like so:

$$
\eta_n(x) = \left| \frac{\epsilon_n(x)}{e^x}\right| \leq \left|\frac{(x - x_0)(x - x_1)\ldots(x - x_n)}{(n + 1)!}\right|
$$

Clearly we would like the maximum relative error on the whole interval $[a, b]$ to be as small as possible. We can see that $(x - x_0)(x - x_1)\ldots(x - x_n)$ is itself a factorized polynomial and that the distinct points $x_0, x_1, \ldots, x_n$ are its roots. Thus the problem of reducing the relative error is equivalent to the problem of choosing roots which minimize the local maxima. As it turns out, an equi-spaced selection of points $x_0, x_1, \ldots, x_n$ is not the optimal choice for minimizing the relative error over the whole interval. This leads us to a very important limitation of Lagrange interpolation when the points are chosen in such a way.
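To get a feel for this bound, here is a small sketch (my own, with an arbitrary degree and evaluation grid) that evaluates the factor $|(x - x_0)(x - x_1)\ldots(x - x_n)| / (n + 1)!$ for equi-spaced points on $[-1, 1]$ and reports where it peaks.

import numpy as np
from math import factorial

n = 14                                  # degree, i.e. 15 interpolation points
nodes = np.linspace(-1, 1, n + 1)       # equi-spaced roots x_0, ..., x_n
grid = np.linspace(-1, 1, 10001)        # evaluation grid for the bound

# |(x - x_0)(x - x_1)...(x - x_n)| / (n + 1)! at every grid point
node_poly = np.prod([grid - x_i for x_i in nodes], axis=0)
bound = np.abs(node_poly) / factorial(n + 1)

print(bound.max())                      # maximum of the relative error bound
print(grid[bound.argmax()])             # attained near the interval boundary

The maximum is attained near the endpoints of the interval, which previews the problem described next.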

Runge's phenomenon

When the $n + 1$ distinct points $x_0, x_1, \ldots, x_n$ sampled for interpolation are equi-spaced, the interpolation is prone to numerical instability. Functions whose Taylor series do not have an infinite radius of convergence will exhibit increasingly large error perturbations under interpolation as the degree $n$ of the interpolant increases. In other words, suppose a function $f$ is infinitely differentiable at some $a$. There exists an interval $|a - c| \leq x$ such that $f$ absolutely converges at any $c$ within that interval. If this interval is infinite, then $f$ has an infinite radius of convergence and interpolants of $f$ will converge to $f$ uniformly regardless of their degree. But if the interval is finite, there will be a degree at which interpolants of $f$ actually become increasingly inaccurate within particular subintervals. This is called Runge's phenomenon.

7: Closely related is the Gibbs phenomenon in the approximation of functions using Fourier series.

Fortunately this issue does not affect approximations of $e^x$, because $e^x$ absolutely converges everywhere. Therefore $e^x$ can be interpolated with a polynomial of arbitrarily high degree without suffering from numerical instability (though in practice there is no benefit to using more than a degree 15 polynomial in double precision). But to illustrate this issue more clearly, we can look at Lagrange interpolation of the sine function. Here is a plot of the degree 10 Lagrange interpolant of sine against its true value on the interval $[-1, 1]$.

The blue dotted line indicates the true value of $\sin(x)$, and the black dots denote the sampling points for interpolation. The value returned by the interpolant is depicted in red.

Now look at what happens when we increase the degree of the interpolant to 60.

The interpolant shows noticeable deviation from the true value.

We can see that the value returned by the interpolant has started to noticeably deviate from the true value of $\sin(x)$ near the boundaries of $[-1, 1]$. This is despite the fact that this interpolant is of a higher degree than the first one. If we zoom in on $[-1, -0.9]$ we can see this behavior more clearly.

The interpolant is extremely noisy at the boundary.

This illustrates a very important point of caution when working with Lagrange interpolation: in general, approximations of a function over an interval do not uniformly improve everywhere on that interval when the sampling points of interpolation are equi-spaced on the interval. See what happens when we push the degree of interpolation up to 100.

Increasing the degree of the interpolant has (catastrophically) increased the absolute error at the tails of the approximation.

The absolute error of the interpolant at the boundaries of the interval continues to increase commensurate with the degree of the interpolant. Now the error visually dominates the true value of the sine function, such that we can't distinguish it from a flat line.

The approximation is off by many orders of magnitude near the boundaries of interpolation.

The absolute error of the degree 100 interpolant is so extreme that it's off from the true value of $\sin(x)$ by over 1,000,000,000! The counterintuitive result is that approximation results are not necessarily improved by higher degree interpolants. In fact, they may even become (dramatically) worse.
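Plots like the ones above can be reproduced with a short script. The following sketch (my own parameter choices, reusing the LagrangeInterpolation class from earlier) interpolates the sine function at equi-spaced points and prints the maximum absolute error over a dense grid, which grows as the degree increases.

import numpy as np

grid = np.linspace(-1, 1, 2001)
for degree in (10, 60):
    xi = np.linspace(-1, 1, degree + 1)     # equi-spaced sample points
    yi = np.sin(xi)
    interp = LagrangeInterpolation(xi, yi)
    err = max(abs(interp.interpolate(x) - np.sin(x)) for x in grid)
    print(degree, err)                      # the max error grows with the degree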

Barycentric algorithm

We can leverage a few properties of the Lagrange basis polynomials to make the interpolation algorithm more efficient. In particular, we want to find a way to interpolate arbitrary values of $f(x)$ which doesn't require us to recalculate the basis for every $x$. It would be better if we could pre-calculate the basis just once, because it's an O($n^2$) operation. To explore this further, suppose we wanted to find the sum of all Lagrange basis polynomials, irrespective of the value of $x$. Then we would calculate

$$
\sum_{i = 0}^{n} \ell_i(x) = \ell_0(x) + \ell_1(x) + \ldots + \ell_n(x)
$$

Without loss of generality, we can represent this as a sum with coefficients of 1 at each term:

$$
1\ell_0(x) + 1\ell_1(x) + \ldots + 1\ell_n(x)
$$

Then by the definition of Lagrange interpolation, this is equivalent to

$$
f(x_0)\ell_0(x) + f(x_1)\ell_1(x) + \ldots + f(x_n)\ell_n(x)
$$

where $f(x_0) = f(x_1) = \ldots = f(x_n) = 1$. In other words, the sum of Lagrange basis polynomials for arbitrary $x$ is given by interpolating the constant function $f(x) = 1$. Therefore, letting $f(x) = 1$, we have

$$
1 = f(x) = \sum_{i = 0}^{n}f(x_i)\ell_i(x) = \sum_{i = 0}^{n}1\ell_i(x) = \sum_{i = 0}^{n}\ell_i(x)
$$

from which we see that the sum of Lagrange basis polynomials is always equal to 1. Now we'll decompose each basis polynomial. We define a series of weights $\lambda_i$ such that

$$
\lambda_i = \frac{1}{\prod_{j \neq i}^{n}(x_i - x_j)}
$$

which is to say that each weight $\lambda_i$ is the reciprocal of the product in the denominator of $\ell_i(x)$, and does not depend on the value of $x$. Next we observe that the product in the numerator is the same for each basis polynomial $\ell_j(x)$, save for the factor $(x - x_j)$. Therefore we define a convenient shorthand for this numerator

$$
\mathcal{L}(x) = (x - x_0)(x - x_1) \ldots (x - x_n) = \prod_{i = 0}^{n}(x - x_i)
$$

Now we can represent each basis polynomial in decomposed form

$$
\ell_i(x) = \frac{\mathcal{L}(x)\lambda_i}{x - x_i}
$$

which gives us the new Lagrange interpolation identity

$$
P_n(x) = \sum_{i = 0}^{n}\frac{\mathcal{L}(x)\lambda_i}{x - x_i}f(x_i) = \mathcal{L}(x)\sum_{i = 0}^{n}\frac{\lambda_i}{x - x_i}f(x_i)
$$

since $\mathcal{L}(x)$ does not depend on $i$ and can be factored out of the summation. Further note that

$$
\sum_{i = 0}^{n}\ell_i(x) = \sum_{i = 0}^{n}\frac{\mathcal{L}(x)\lambda_i}{x - x_i} = \mathcal{L}(x)\sum_{i = 0}^{n}\frac{\lambda_i}{x - x_i}
$$

Since we've already shown the sum of all Lagrange basis polynomials is equal to 1, we can divide the new Lagrange interpolation identity by the sum of basis polynomials without changing the result. Doing so allows us to cancel $\mathcal{L}(x)$. Then we obtain the Barycentric interpolation algorithm

$$
\begin{aligned}
P_n(x) &= \mathcal{L}(x)\sum_{i = 0}^{n}\frac{\lambda_i}{x - x_i}f(x_i) \bigg/ 1 \\
&= \mathcal{L}(x)\sum_{i = 0}^{n}\frac{\lambda_i}{x - x_i}f(x_i) \bigg/ \sum_{i = 0}^{n}\ell_i(x) \\
&= \mathcal{L}(x)\sum_{i = 0}^{n}\frac{\lambda_i}{x - x_i}f(x_i) \bigg/ \mathcal{L}(x)\sum_{i = 0}^{n}\frac{\lambda_i}{x - x_i} \\
&= \sum_{i = 0}^{n}\frac{\lambda_i f(x_i)}{x - x_i} \bigg/ \sum_{i = 0}^{n}\frac{\lambda_i}{x - x_i}
\end{aligned}
$$

Let's take a look at an implementation of this formula.

class BarycentricInterpolation:
    """
    Interpolates a function using the Barycentric algorithm. __init__
    precalculates the weights, which takes O(n^2) time in the number
    n + 1 of distinct interpolation points. The weights are the same for
    all values of x, so this is calculated once then stored. The interpolate
    member function uses the weights and yi = f(x_i) values to approximate
    the function. With the weights pre-calculated, every evaluation of x
    takes only O(n) time. As a small complexity tradeoff, the values
    of f(x_i) multiplied by the weights are also pre-calculated because
    they do not depend on the input x.
    """
    def __init__(self, x_points, y_points):
        self.xi = x_points
        self.yi = y_points
        self.weights = []
        self.wi = []
        self.n = len(self.xi)
        for j in range(self.n):
            w = 1
            for k in range(self.n):
                if k == j:
                    continue
                w *= (self.xi[j] - self.xi[k])
            self.weights.append(1 / w)
        for i in range(self.n):
            self.wi.append(self.weights[i] * self.yi[i])
    def interpolate(self, x):
        """
        Interpolates f(x) using the Barycentric algorithm with the n + 1
        distinct points x_0, x_1, ..., x_n, the pre-calculated weights
        lambda_0, lambda_1, ..., lambda_n, and the pre-calculated terms
        f(x_0)lambda_0, f(x_1)lambda_1, ..., f(x_n)lambda_n
        Performance: With the weights lambda_i and f(x_i)lambda_i
        pre-calculated, we have:
        - n + 1 subtractions
        - 2 * (n + 1) additions
        - 2 * (n + 1) + 1 divisions
        which is 5(n + 1) + 1 operations total. Assuming we interpolate
        using 15 points, that is equal to 81 operations.
        :param x: (double float) point of f to interpolate
        :return: (double float) approximation of f(x)
        """
        # We accumulate the final numerator and denominator separately, and
        # initially they equal 0.
        p_num = 0
        p_den = 0
        # Do one loop through each of the n + 1 points x_0, ..., x_n
        for j in range(self.n):
            # We have a special case that allows us to break out of the
            # loop, when the currently considered x_j is equal to the
            # input x. Then we simply return the corresponding f(x_j).
            if self.xi[j] == x:
                return self.yi[j]
            # If x_j is not equal to the input x, we store the value of
            # x - x_j, because this value is needed twice.
            d = x - self.xi[j]
            # Add the value of f(x_j)lambda_j / (x - x_j) to the numerator
            p_num += self.wi[j] / d
            # Add the value of lambda_j / (x - x_j) to the denominator
            p_den += self.weights[j] / d
        # Now we just return the numerator over the denominator
        return p_num / p_den

class BarycentricInterpolation {
    std::vector<double> weights;
    std::vector<double> xi;
    std::vector<double> yi;
    std::vector<double> wi;
public:
    BarycentricInterpolation(std::vector<double> x_points, std::vector<double> y_points) {
        /* Interpolates a function using the Barycentric algorithm. The constructor
         precalculates the weights, which takes O(n^2) time in the number
         n + 1 of distinct interpolation points. The weights are the same for
         all values of x, so this is calculated once then stored. The interpolate
         member function uses the weights and yi = f(x_i) values to approximate
         the function. With the weights pre-calculated, every evaluation of x
         takes only O(n) time. As a small complexity tradeoff, the values
         of f(x_i) multiplied by the weights are also pre-calculated because
         they do not depend on the input x.
         */
        xi = x_points;
        yi = y_points;
        for (int j = 0; j < x_points.size(); j++) {
            double w = 1;
            for (int k = 0; k < x_points.size(); k++) {
                if (k == j) {
                    continue;
                }
                w *= x_points[j] - x_points[k];
            }
            weights.push_back(1 / w);
        }
        for (int i = 0; i < x_points.size(); i++) {
            wi.push_back(weights[i] * y_points[i]);
        }
    }
    double interpolate(double x) {
        /* Interpolates f(x) using the Barycentric algorithm with the n + 1
         distinct points x_0, x_1, ..., x_n, the pre-calculated weights
         lambda_0, lambda_1, ..., lambda_n, and the pre-calculated terms
         f(x_0)lambda_0, f(x_1)lambda_1, ..., f(x_n)lambda_n
         Performance: With the weights lambda_i and f(x_i)lambda_i
         pre-calculated, we have:
         - n + 1 subtractions
         - 2 * (n + 1) additions
         - 2 * (n + 1) + 1 divisions
         which is 5(n + 1) + 1 operations total. Assuming we interpolate
         using 15 points, that is equal to 81 operations.
         Args:
         - x (double float) point of f to interpolate
         Returns:
         - (double float) approximation of f(x)
         */
        // We accumulate the final numerator and denominator separately, and
        // initially they equal 0.
        double p_num = 0;
        double p_den = 0;
        // Do one loop through each of the n + 1 points x_0, ..., x_n
        for (int j = 0; j < xi.size(); j++) {
            // We have a special case that allows us to break out of the
            // loop, when the currently considered x_j is equal to the
            // input x. Then we simply return the corresponding f(x_j).
            if (xi[j] == x) {
                return yi[j];
            }
            // If x_j is not equal to the input x, we store the value of
            // x - x_j, because this value is needed twice.
            double d = x - xi[j];
            // Add the value of f(x_j)lambda_j / (x - x_j) to the numerator
            p_num += wi[j] / d;
            // Add the value of lambda_j / (x - x_j) to the denominator
            p_den += weights[j] / d;
        }
        // Now return the numerator over the denominator
        return (p_num / p_den);
    }
};

This algorithm is a significant performance improvement. Whereas the previous implementation of Lagrange interpolation took O($n^2$) time on every evaluation of $x$, this one requires only O($n$) time. We are able to pre-calculate not only the weights $\lambda_0, \lambda_1, \ldots, \lambda_n$ upon initialization, but also the values $f(x_0)\lambda_0, f(x_1)\lambda_1, \ldots, f(x_n)\lambda_n$. Assuming we interpolate using 15 points, we see a tenfold reduction in operation count from the first implementation. The accuracy characteristics are actually somewhat better compared to the previous algorithm, so this implementation is strictly better overall. Consider the error distribution of this algorithm over $[-1, 1]$.

The peaks of the error distribution are visually similar compared to naive Lagrange interpolation, but this implementation is in fact more accurate.

Visually the error distribution is the same. However, the Barycentric algorithm achieves 14 digits of precision nearly everywhere over $[-1, 1]$, which is an improvement of about a full digit of precision. As with the previous implementations, including the range reduction step actually compresses the error distribution such that it's more accurate everywhere, including $[-1, 1]$. Here is the error distribution of our lagrange_exp function with the Barycentric interpolation algorithm used for approximating $e^r$ instead of the original Lagrange interpolation method.

So we see that it exhibits the same relative error distribution as the previous implementations when range reduction is used. Next we will consider better methods of point selection that improve on an equi-spaced selection of values $x_0, x_1, \ldots, x_n$.

Chebyshev interpolation

We've seen that the unique degree $n$ interpolant $P_n$ of a function $f$ is a vector in the space $\mathbb{F}[x]$. Furthermore, there are different ways of representing $P_n$ depending on the chosen basis of $\mathbb{F}[x]$. Recall that the error of the Lagrange interpolant depends on the (factorized) polynomial

$$
(x - x_0)(x - x_1)\ldots(x - x_n)
$$

If we minimize the extrema of this polynomial by choosing appropriate roots $x_0, x_1, \ldots, x_n$, we likewise minimize the error of the interpolant $P_n$ versus $f$. This coincides with the problem of choosing optimal interpolation points $x_0, x_1, \ldots, x_n$, and determining a polynomial with such optimal roots reduces to a particular differential equation. The solutions to that equation are called the Chebyshev polynomials, and the respective roots of each polynomial are called the Chebyshev nodes.

8: Technically there are two polynomial sequences commonly referred to with the term "Chebyshev." The Chebyshev polynomials depicted here are specifically the Chebyshev polynomials "of the first kind." There are also Chebyshev polynomials "of the second kind," which we won't use.

The first and second Chebyshev polynomials are defined as $T_0(x) = 1$ and $T_1(x) = x$, respectively. Further Chebyshev polynomials can be determined using the recurrence relation

$$
T_n(x) = 2xT_{n - 1}(x) - T_{n - 2}(x)
$$

Furthermore, the $n^{\text{th}}$ Chebyshev polynomial can also be characterized by the identity

$$
T_n(\cos \theta) = \cos(n \theta)
$$

To avoid Runge's phenomenon, we can perform Lagrange interpolation using the roots of these polynomials as the selection of interpolation points. But we can also directly use the Chebyshev polynomials as a basis for interpolation.
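As a quick sanity check on the recurrence and the cosine identity, here is a small sketch (my own, not part of the benchmarked implementations) that evaluates $T_n(x)$ iteratively and compares it against $\cos(n \arccos x)$.

import numpy as np

def chebyshev_t(n, x):
    # Evaluate T_n(x) with the recurrence T_n(x) = 2x T_{n-1}(x) - T_{n-2}(x)
    ti, tj = 1.0, x        # T_0(x) and T_1(x)
    if n == 0:
        return ti
    for _ in range(2, n + 1):
        ti, tj = tj, 2 * x * tj - ti
    return tj

x = 0.37                   # arbitrary test point in [-1, 1]
for n in range(6):
    print(n, chebyshev_t(n, x), np.cos(n * np.arccos(x)))   # the columns agree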

Orthogonal basis

Orthogonality is a generalization of perpendicularity to higher dimensions. Whereas perpendicularity occurs when all vectors meet at right angles, orthogonality is defined using the inner product. In turn, an inner product provides a rigorous notion of the distance and angle of two vectors with respect to one another. A vector space equipped with the additional structure of an inner product is called an inner product space. Two vectors $x, y$ are orthogonal if and only if their inner product is $\langle x, y \rangle = 0$. Orthogonality also implies linear independence, from which it follows that an orthogonal set of $n$ vectors constitutes an orthogonal basis of an $n$-dimensional vector space. We will show that any set of $n$ Chebyshev polynomials is orthogonal, and is thus a basis set. Let

$$
\{T_0(x), T_1(x), \ldots, T_n(x)\}
$$

be a set of Chebyshev polynomials. By definition, these are vectors in the space $\mathbb{F}[x]$ of all polynomials with degree less than or equal to $n$. There are many possible inner products, and a set of vectors can be orthogonal with respect to one but not orthogonal with respect to another. One of the most common inner products is the $L^2$ inner product, defined on an interval $[a, b]$:

$$
\langle f, g \rangle = \int_{a}^{b} f(x)g(x)dx
$$

Chebyshev polynomials are not orthogonal with respect to the $L^2$ inner product. However, we can define a new inner product from any other inner product by scaling it. So given any weight $w$, we have the new inner product

$$
\langle f, g \rangle = \int_{a}^{b} f(x)g(x)w(x)dx
$$

In this case, we will show that the Chebyshev polynomials are orthogonal on the interval $[-1, 1]$ with respect to the weight

$$
w(x) = \frac{1}{\sqrt{1 - x^2}}
$$

To prove this, we must show

$$
\int_{-1}^{1}T_{j}(x)T_{k}(x)\frac{1}{\sqrt{1 - x^2}}dx = \begin{cases}
0, & j \neq k \\
a, & j = k
\end{cases}
$$

for some nonzero constant $a$. Using the identity $T_n(\cos \theta) = \cos(n \theta)$, we can apply a change of variable to obtain a new definite integral with the weight implicit:

$$
\langle T_j, T_k \rangle = \int_{0}^{\pi}\cos(j x)\cos(k x)dx
$$

Now suppose $j = k = 0$. By applying the trigonometric identity $\cos(0 x) = \cos(0) = 1$, we have

$$
\int_{0}^{\pi}\cos(j x)\cos(k x)dx = \int_{0}^{\pi}1 \cdot 1 \, dx = \pi
$$

Next suppose $j = k \neq 0$. Substitute $u = jx = kx$, and $s = 2u$, $ds = 2du$. Using the trigonometric identity $\sin(\pi) = 0$ with these variable substitutions, we have

$$
\int_{0}^{\pi} \cos^2(u)du = \int_{0}^{\pi} \left(\frac{1}{2}\cos(2u) + \frac{1}{2}\right)du = \frac{\pi}{2}
$$

Finally suppose $j \neq k$. Substitute $u = x - \frac{\pi}{2}$ and $du = dx$. Then we have

$$
\int_{0}^{\pi}\cos(j x)\cos(k x) dx = \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\cos\left(j\left(u + \frac{\pi}{2}\right)\right)\cos\left(k\left(u + \frac{\pi}{2}\right)\right)du
$$

Given that $f(x) = \cos\left(j\left(u + \frac{\pi}{2}\right)\right)\cos\left(k\left(u + \frac{\pi}{2}\right)\right)$ is an odd function and the interval $\left[-\frac{\pi}{2}, \frac{\pi}{2} \right]$ is symmetric about the origin, we have

$$
\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}f(x)dx = 0
$$

Therefore we see that the Chebyshev polynomials satisfy orthogonality with respect to the weight $\frac{1}{\sqrt{1 - x^2}}$ because

$$
\langle T_j, T_k \rangle = \int_{-1}^{1}T_{j}(x)T_{k}(x)\frac{1}{\sqrt{1 - x^2}}dx = \begin{cases}
0, & j \neq k \\
\pi, & j = k = 0 \\
\frac{\pi}{2}, & j = k \neq 0
\end{cases}
$$

Thus the Chebyshev polynomials form an orthogonal basis for functions which satisfy certain continuity constraints on $[-1, 1]$. This range may seem limited, but given that we're typically performing range reduction into a smaller interval than $[-1, 1]$ anyway, it doesn't present much of an obstacle. It's more important for us to show that candidate functions for Chebyshev interpolation have unique representations as linear combinations of the Chebyshev basis polynomials. The orthogonality of the Chebyshev polynomials also allows us to efficiently compute the scalar coefficients of these representations.
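These inner products are easy to check numerically. The following sketch (my own) approximates $\langle T_j, T_k \rangle$ with Gauss-Chebyshev quadrature, which uses the nodes $\cos(\pi(i + 1/2)/N)$ and equal weights $\pi/N$; the node count is an arbitrary choice.

import numpy as np

def cheb_inner_product(j, k, N=200):
    # Approximate the integral of T_j(x) T_k(x) / sqrt(1 - x^2) over [-1, 1]
    # using N-point Gauss-Chebyshev quadrature. Since T_n(cos(theta)) equals
    # cos(n theta), the integrand reduces to cos(j theta) cos(k theta).
    theta = np.pi * (np.arange(N) + 0.5) / N
    return np.sum(np.cos(j * theta) * np.cos(k * theta)) * np.pi / N

print(cheb_inner_product(3, 5))    # ~0      (j != k)
print(cheb_inner_product(0, 0))    # ~pi     (j = k = 0)
print(cheb_inner_product(4, 4))    # ~pi/2   (j = k != 0)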

Lipschitz continuity

A function $f$ is called Lipschitz continuous, or simply Lipschitz, on a domain $[a, b]$ if there exists a constant $L$ such that

$$
|f(x) - f(y)| \leq L|x - y|
$$

for any $x, y \in [a, b]$. Then $L$ is called the Lipschitz constant of $f$. Stated more plainly, a function which is Lipschitz continuous on a particular domain has a strict bound on how quickly it can change within that domain. The function is globally Lipschitz if it's Lipschitz continuous on the domain $(-\infty, \infty) = \mathbb{R}$, and locally Lipschitz if it's Lipschitz continuous on a bounded subset $[a, b] \subset \mathbb{R}$. Functions that are Lipschitz continuous on $[-1, 1]$ are important because they exist in a Sobolev space spanned by an orthogonal basis consisting of Chebyshev polynomials.

9: Sobolev spaces are frequently used in numerical analysis and approximation theory. For more information, refer to these notes.

Therefore these functions have a unique representation as a Chebyshev series

$$
f(x) = \sum_{k = 0}^{\infty}a_kT_k(x) = a_0 + a_1x + a_2T_2(x) + \ldots
$$

where, by the orthogonality of the polynomials, each coefficient $a_k$ is defined

$$
a_k = \frac{2}{\pi}\int_{-1}^{1}\frac{f(x)T_k(x)}{\sqrt{1 - x^2}}dx
$$

with the exception of the first coefficient $a_0$, which has the factor $\frac{1}{\pi}$ instead of $\frac{2}{\pi}$.

10: A proof of these facts is given in Trefethen's Approximation Theory and Approximation Practice, Chapter 3, Theorem 3.1.

Despite the fact that $e^x$ is analytic, it is not globally Lipschitz. This is because $e^x$ is an increasing function and the value of $e^x$ is divergent and grows arbitrarily large as $x$ approaches $\infty$. However, it is locally Lipschitz on the domain $[-1, 1]$.

11: In fact, $e^x$ is Lipschitz on the domain $[-\infty, 1]$.

On the other hand, the sine function is globally Lipschitz because its first derivative, the cosine function, is bounded: $|\cos(x)| \leq 1$ for all $x \in \mathbb{R}$.

Uniform norm

Whereas an inner product defines a notion of vector distance and angle, a norm defined on a space formalizes the notion of vector length, and is denoted by $||\cdot||$. One of the ways we can rigorously bound the absolute error of an approximation is by minimizing the norm of the error. For example, the uniform norm, or sup norm, of a function $f$ on an interval $[a, b]$ is defined as

$$
||f||_\infty = \max\{|f(x)|\}, \quad x \in [a, b]
$$

Then for any function $f$ defined on $[a, b]$ with approximation $F$, the norm of the absolute error is defined

$$
||\epsilon||_\infty = \max\{|f(x) - F(x)|\}, \quad x \in [a, b]
$$

Therefore minimizing the uniform norm of the absolute error minimizes the maximum absolute error. For any function $f$ defined on $[a, b]$, there exists a unique polynomial $P_n$ which is the "best" approximation of $f$, in the sense that the norm of $|P_n(x) - f(x)|$ is the minimum possible for a polynomial with degree $n$ on the domain. Then $P_n$ is called the minimax polynomial of $f$. Minimax polynomials are difficult to compute and do not have a simple representation. But Chebyshev interpolants are very close to best approximation polynomials while being much easier to compute.

Implementation

Our first implementation will approximate $e^x$ using our familiar range reduction method with the identity $e^x = e^r \cdot 2^k$. We will first approximate $e^r$ on the interval $[-1, 1]$ by determining the coordinate vector of $e^r$ with respect to the Chebyshev polynomial basis in dimension 14. Then we will take the linear combination of that coordinate vector and the Chebyshev basis, and multiply the result by $2^k$. Since each coefficient $a_k$ is a definite integral on $[-1, 1]$, the coefficients do not depend on the value of $x$ and we can pre-calculate them. Therefore we calculate each coefficient

12: For a concrete example of how to do that in Mathematica, see this blog post on Chebyshev approximation of the sine function.

$$
a_k = \frac{2}{\pi}\int_{-1}^{1} \frac{e^x T_{k}(x)}{\sqrt{1 - x^2}}dx
$$

for $0 \leq k \leq n$, and find that the coefficients are

1.266065877752008335598244625214717537923,
1.130318207984970054415392055219726613610,
0.2714953395340765623657051399899818507081,
0.04433684984866380495257149525979922986386,
0.00547424044209373265027616843118645948703,
0.000542926311913943750362147810307554678760,
0.00004497732295429514665469032811091269841937,
3.198436462401990505863872976602295688795e-6,
1.992124806672795725961064384805589035648e-7,
1.103677172551734432616996091335324170860e-8,
5.50589607967374725047142040200552692791e-10,
2.497956616984982522712010934218766985311e-11,
1.039152230678570050499634672423840849837e-12,
3.991263356414401512887720401532162026594e-14
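For reference, here is one way to reproduce these constants numerically, as a sketch of my own using the same change of variable $x = \cos\theta$ as in the orthogonality section rather than the symbolic integration mentioned in the footnote.

import numpy as np

def cheb_coefficients(f, n, N=400):
    # Approximate the Chebyshev series coefficients a_0, ..., a_n of f on
    # [-1, 1] by evaluating the defining integrals with an N-point
    # Gauss-Chebyshev quadrature rule after substituting x = cos(theta).
    theta = np.pi * (np.arange(N) + 0.5) / N
    fx = f(np.cos(theta))
    coeffs = []
    for k in range(n + 1):
        a_k = (2.0 / N) * np.sum(fx * np.cos(k * theta))
        if k == 0:
            a_k /= 2       # a_0 carries the factor 1/pi instead of 2/pi
        coeffs.append(a_k)
    return coeffs

print(cheb_coefficients(np.exp, 13)[:3])   # matches the first constants above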

This gives us what we need to write the rest of our implementation. We'll apply the usual range reduction method and use Chebyshev approximation to evaluate $e^r$. But we'll also consider the error distribution of the non-reduced Chebyshev approximation over $[-1, 1]$.

def cheb_series_exp(x):
    """
    Approximates f(x) = e^x for any x on the interval [-709, 709] using
    Chebyshev interpolation in the Chebyshev basis with degree 13.
    If x < -709 or x > 709, raises an assertion error because of
    double precision underflow/overflow limits. Performs a
    range reduction step by applying the identity e^x = e^r * 2^k
    for |r| <= 1/2 log(2) and integer k. The Chebyshev interpolant
    approximates the value of e^r, then the result is multiplied
    by 2^k to obtain the final approximation for e^x.
    Performance: In the worst case we have 65 operations:
    - 34 multiplications
    - 13 subtractions
    - 12 additions
    - 2 divisions
    - 1 rounding
    - 1 left shift
    - 1 absolute value
    Accuracy: Over a sample of 10,000 equi-spaced points in
    [-709, 709], we have the following error statistics.
    - Max relative error = 8.133024023260273e-14
    - Min relative error = 0.0
    - Avg relative error = 1.7128494413598806e-14
    - Med relative error = 1.127445546004485e-14
    - Var relative error = 2.938702723948832e-28
    - 6.31 percent of the values have less than 14 digits of precision
    :param x: (int) or (float) power of e to evaluate
    :return: (float) approximation of e^x
    """
    assert(-709 <= x <= 709)
    if x == 0:
        return 1
    # Scalar coefficients of the e^x representation as a linear
    # combination of the Chebyshev polynomial basis
    coeffs = [
        1.266065877752008335598244625214717537923,
        1.130318207984970054415392055219726613610,
        0.2714953395340765623657051399899818507081,
        0.04433684984866380495257149525979922986386,
        0.00547424044209373265027616843118645948703,
        0.000542926311913943750362147810307554678760,
        0.00004497732295429514665469032811091269841937,
        3.198436462401990505863872976602295688795e-6,
        1.992124806672795725961064384805589035648e-7,
        1.103677172551734432616996091335324170860e-8,
        5.50589607967374725047142040200552692791e-10,
        2.497956616984982522712010934218766985311e-11,
        1.039152230678570050499634672423840849837e-12,
        3.991263356414401512887720401532162026594e-14
    ]
    x0 = np.abs(x)
    l = 0.6931471805599453
    k = math.ceil((x0 / l) - 0.5)
    r = x0 - (k * l)
    # We need the first two Chebyshev polynomials to
    # determine the others
    ti, tj = 1, r
    # Use the first two Chebyshev coefficients to
    # initialize the linear combination sum
    p = coeffs[0] + (coeffs[1] * r)
    # Iterate through the coefficients once
    # We implement the Chebyshev polynomial recurrence
    # relation iteratively because it's more stable
    for i in range(2, 14):
        tk = (2 * r * tj) - ti
        p += coeffs[i] * tk
        ti, tj = tj, tk
    # Back out of the range reduction using e^x = e^r * 2^k
    p *= 1 << k
    if x < 0:
        p = 1 / p
    return p

double cheb_series_exp(double x) {
    /* Approximates f(x) = e^x for any x on the interval [-709, 709] using
     Chebyshev interpolation in the Chebyshev basis with degree 13.
     If x < -709 or x > 709, raises an assertion error because of
     double precision underflow/overflow limits. Performs a
     range reduction step by applying the identity e^x = e^r * 2^k
     for |r| <= 1/2 log(2) and integer k. The Chebyshev interpolant
     approximates the value of e^r, then the result is multiplied
     by 2^k to obtain the final approximation for e^x.
     Performance: In the worst case we have 65 operations:
     - 34 multiplications
     - 13 subtractions
     - 12 additions
     - 2 divisions
     - 1 rounding
     - 1 left shift
     - 1 absolute value
     Accuracy: Over a sample of 10,000 equi-spaced points in
     [-709, 709], we have the following error statistics.
     - Max relative error: 8.13302e-14
     - Min relative error: 0
     - Avg relative error: 0
     - Med relative error: 2.25423e-14
     - Var relative error: 0
     - 6.31 percent of the values have less than 14 digits of precision.
     Args:
     - x (double float) power of e to approximate
     Returns:
     - (double float) approximation of e^x
     */
    assert(-709 <= x && x <= 709);
    if (x == 0) {
        return 1;
    }
    double x0 = abs(x);
    int k = ceil((x0 / M_LN2) - 0.5);
    double r = x0 - (k * M_LN2);
    // Scalar coefficients of the e^x representation as a linear
    // combination of the Chebyshev polynomial basis
    std::vector<double> coeffs = {
            1.266065877752008335598244625214717537923,
            1.130318207984970054415392055219726613610,
            0.2714953395340765623657051399899818507081,
            0.04433684984866380495257149525979922986386,
            0.00547424044209373265027616843118645948703,
            0.000542926311913943750362147810307554678760,
            0.00004497732295429514665469032811091269841937,
            3.198436462401990505863872976602295688795e-6,
            1.992124806672795725961064384805589035648e-7,
            1.103677172551734432616996091335324170860e-8,
            5.50589607967374725047142040200552692791e-10,
            2.497956616984982522712010934218766985311e-11,
            1.039152230678570050499634672423840849837e-12,
            3.991263356414401512887720401532162026594e-14
    };
    // We need the first two Chebyshev polynomials to
    // determine the others
    double ti = 1;
    double tj = r;
    // Use the first two coefficients to initialize the
    // linear combination sum
    double p = coeffs[0] + (coeffs[1] * r);
    // Iterate through the coefficients once
    // We implement the Chebyshev polynomial recurrence
    // relation iteratively because it's more stable
    for (int i = 2; i < 14; i++) {
        double tk = (2 * r * tj) - ti;
        p += coeffs[i] * tk;
        ti = tj;
        tj = tk;
    }
    // Back out of the range reduction using e^x = e^r * 2^k
    p *= exp2(k);
    if (x < 0) {
        p = 1 / p;
    }
    return p;
}

Plotting the relative error of the non-reduced implementation over $[-1, 1]$, we find that it's significantly more accurate than Lagrange interpolation, including with the Barycentric algorithm. We also see a distinct symmetry which is not exhibited by the Lagrange interpolant; this is expected by the equioscillation theorem, since Chebyshev interpolants closely approximate optimal interpolants of the same degree.

The non-reduced Chebyshev series approximation achieves at least 15 digits of precision everywhere on $[-1, 1]$, and has nearly symmetrical peaks.

Then when we plot the relative error of the range-reduced implementation over $[-709, 709]$, we see the same relative error distribution as those of the previous implementations.

The operation count is constant, like the Barycentric interpolation scheme, but about 25% smaller. We can further reduce the operation count if we apply a change of basis from Chebyshev to monomial. Since the bases are the same size and correspond to the same vector space, there exists an invertible linear map $T: P_{n} \rightarrow P_{n}$ which transforms vectors represented in the Chebyshev basis into their unique representation with respect to the monomial basis. This requires us to left multiply the coordinate vector $[a_0, a_1, \ldots, a_n]$ by the change of basis matrix $[T]$, which is the matrix representation of the map $T$. When we do this with the coordinate vector $[a_0, a_1, \ldots, a_n]$, we obtain

0.99999999999999999710,
1.0000000000000000304,
0.50000000000000176860,
0.16666666666666168651,
0.041666666666492764626,
0.0083333333335592763712,
0.0013888888951224622134,
0.00019841269432671705526,
0.000024801486520816039662,
2.7557622535824850520e-6,
2.7632293485109074120e-7,
2.4994303701340372837e-8,
2.0862193241800986288e-9
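If you would rather not construct the change of basis matrix $[T]$ by hand, numpy's polynomial module can perform the conversion. Here is a minimal sketch (my own) that reuses the Chebyshev coefficients listed in the previous implementation and converts them to the monomial basis.

import numpy as np
from numpy.polynomial import chebyshev

# Chebyshev-basis coefficients a_0, ..., a_13 of e^r from the earlier listing
cheb_coeffs = [
    1.266065877752008335598244625214717537923,
    1.130318207984970054415392055219726613610,
    0.2714953395340765623657051399899818507081,
    0.04433684984866380495257149525979922986386,
    0.00547424044209373265027616843118645948703,
    0.000542926311913943750362147810307554678760,
    0.00004497732295429514665469032811091269841937,
    3.198436462401990505863872976602295688795e-6,
    1.992124806672795725961064384805589035648e-7,
    1.103677172551734432616996091335324170860e-8,
    5.50589607967374725047142040200552692791e-10,
    2.497956616984982522712010934218766985311e-11,
    1.039152230678570050499634672423840849837e-12,
    3.991263356414401512887720401532162026594e-14,
]

# cheb2poly applies the change of basis from the Chebyshev basis to the
# monomial basis; the result is close to the coefficient list above
print(chebyshev.cheb2poly(cheb_coeffs))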

This allows us to implement the Chebyshev interpolation algorithm using fewer operations.

def cheb_mono_exp(x):
    """
    Approximates f(x) = e^x for any x on the interval [-709, 709] using
    Chebyshev interpolation in the monomial basis with degree 13.
    If x < -709 or x > 709, raises an assertion error because of
    double precision underflow/overflow limits. Performs a
    range reduction step by applying the identity e^x = e^r * 2^k
    for |r| <= 1/2 log(2) and integer k. The Chebyshev interpolant
    approximates the value of e^r, then the result is multiplied
    by 2^k to obtain the final approximation for e^x.
    Performance: In the worst case we have 31 operations.
    - 14 multiplications
    - 2 divisions
    - 12 additions
    - 2 subtractions
    - 1 rounding
    - 1 left shift
    - 1 absolute value
    Accuracy: Over a sample of 10,000 equi-spaced points in
    [-709, 709], we have the following error statistics.
    - Max relative error = 8.197045651378647e-14
    - Min relative error = 0.0
    - Avg relative error = 1.7408224033601725e-14
    - Med relative error = 1.149513869636177e-14
    - Var relative error = 3.005578974516822e-28
    - 6.41 percent of the values have less than 14 digits of precision
    :param x: (int) or (float) power of e to evaluate
    :return: (float) approximation of e^x
    """
    assert(-709 <= x <= 709)
    if x == 0:
        return 1
    poly = [0.99999999999999999710,
            1.0000000000000000304,
            0.50000000000000176860,
            0.16666666666666168651,
            0.041666666666492764626,
            0.0083333333335592763712,
            0.0013888888951224622134,
            0.00019841269432671705526,
            0.000024801486520816039662,
            2.7557622535824850520e-6,
            2.7632293485109074120e-7,
            2.4994303701340372837e-8,
            2.0862193241800986288e-9]
    x0 = np.abs(x)
    l = 0.6931471805599453
    k = math.ceil((x0 / l) - 0.5)
    p = 1 << k
    r = x0 - (k * l)
    # Evaluate the monomial-basis polynomial with Horner's method,
    # starting from the highest-degree coefficient
    Pn = poly[-1]
    for i in range(len(poly) - 2, -1, -1):
        Pn = Pn * r + poly[i]
    Pn *= p
    if x < 0:
        Pn = 1 / Pn
    return Pn

double cheb_mono_exp(double x) {
    /* Approximates f(x) = e^x for any x on the interval [-709, 709] using
     Chebyshev interpolation in the monomial basis with degree 13.
     If x < -709 or x > 709, raises an assertion error because of
     double precision underflow/overflow limits. Performs a
     range reduction step by applying the identity e^x = e^r * 2^k
     for |r| <= 1/2 log(2) and integer k. The Chebyshev interpolant
     approximates the value of e^r, then the result is multiplied
     by 2^k to obtain the final approximation for e^x.
     Performance: In the worst case we have 31 operations.
     - 14 multiplications
     - 2 divisions
     - 12 additions
     - 2 subtractions
     - 1 rounding
     - 1 left shift
     - 1 absolute value
     Accuracy: Over a sample of 10,000 equi-spaced points in [-709, 709],
     we have the following error statistics.
     - Max relative error = 8.01465e-14
     - Min relative error = 0
     - Avg relative error = 0
     - Med relative error = 2.27191e-14
     - Var relative error = 0
     - 6.38 percent of the values have less than 14 digits of precision.
     Args:
     - x (double float) power of e to evaluate
     Returns:
     - (double float) approximation of e^x
     */
    assert(-709 <= x && x <= 709);
    if (x == 0) {
        return 1;
    }
    double x0 = abs(x);
    int k = ceil((x0 / M_LN2) - 0.5);
    double r = x0 - (k * M_LN2);
    std::vector<double> coeffs = {
            1.000000000000000,
            1.000000000000000,
            0.500000000000002,
            0.166666666666680,
            0.041666666666727,
            0.008333333333342,
            0.001388888888388,
            1.984126978734782e-4,
            2.480158866546844e-5,
            2.755734045527853e-6,
            2.755715675968011e-7,
            2.504861486483735e-8,
            2.088459690899721e-9,
            1.632461784798319e-10
    };
    // Horner evaluation: pn is seeded with the highest-degree coefficient
    double pn = 1.143364767943110e-11;
    for (auto i = coeffs.rbegin(); i != coeffs.rend(); i++) {
        pn = pn * r + *i;
    }
    pn *= exp2(k);
    if (x < 0) {
        return 1 / pn;
    }
    return pn;
}

This implementation has constant time complexity, with 31 operations required to approximate $e^x$ for any $x \in [-709, 709]$. This makes it the most efficient algorithm so far. At the same time, its accuracy matches every other implementation using range reduction: it achieves at least 13 digits of precision everywhere on the interval, and at least 14 digits of precision for approximately 94% of the sampled values. That said, there are two caveats worth noting. First, the smoothness characteristics of $e^x$ are such that the Chebyshev and Taylor approximations are asymptotically equivalent and, more importantly, practically identical in double precision. That is, we could implement the range-reduced, truncated Taylor series approximation explicitly and it would be virtually identical to this one in appearance, efficiency and accuracy. While this holds for $e^x$, it is not true for all functions.
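
To make that point concrete, here is a minimal sketch of what an explicit range-reduced, truncated Taylor implementation might look like. The degree-14 truncation and the helper name taylor_exp are illustrative choices of mine, not code from elsewhere in this post.

import math

def taylor_exp(x, n = 14):
    # Range reduction: write x = k * ln(2) + r with |r| <= 1/2 log(2),
    # so e^x = 2^k * e^r, and approximate e^r with a truncated
    # Maclaurin series of degree n evaluated by Horner's scheme.
    assert -709 <= x <= 709
    x0 = abs(x)
    k = math.ceil((x0 / math.log(2)) - 0.5)
    r = x0 - (k * math.log(2))
    pn = 1.0 / math.factorial(n)
    for i in range(n - 1, -1, -1):
        pn = pn * r + 1.0 / math.factorial(i)
    pn *= 2 ** k
    return 1 / pn if x < 0 else pn

In double precision its output should be essentially indistinguishable from the Chebyshev version above, which is exactly the point: for a function as smooth as $e^x$, the two sets of coefficients agree to nearly full precision.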

Second, we see a (very) minor reduction in accuracy going from the algorithm using the Chebyshev basis to the algorithm using the monomial basis, as well as a degradation of the equioscillation property. Consider the error distribution over $[-1, 1]$ when we use the monomial basis instead of the Chebyshev basis.

The overall relative error is similar, but the peaks have clustered together and are higher than those of the interpolant under the Chebyshev basis.

This is because the conditioning of the monomial basis is worse than that of the Chebyshev basis. For the function $e^x$ the difference in conditioning is negligible, but for other functions it may be worth investigating explicitly, because there can be real numerical penalties to changing the basis like this. In the case of $e^x$, the loss in precision due to the change of basis is so insignificant that it does not meaningfully alter the accuracy characteristics of the resulting algorithm.
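
One rough way to quantify that conditioning gap is to compare the condition numbers of the basis matrices built on the same set of nodes. The sketch below uses numpy's Vandermonde helpers and is purely illustrative; the exact numbers depend on the degree and the node set.

import numpy as np

deg = 13
nodes = np.cos(np.arange(deg + 1) * np.pi / deg)   # Chebyshev points on [-1, 1]

# Basis matrices: monomial (Vandermonde) versus Chebyshev
V_mono = np.vander(nodes, deg + 1, increasing = True)
V_cheb = np.polynomial.chebyshev.chebvander(nodes, deg)

print("cond(monomial basis):  %.3e" % np.linalg.cond(V_mono))
print("cond(Chebyshev basis): %.3e" % np.linalg.cond(V_cheb))

The monomial Vandermonde matrix typically has a far larger condition number, which is the numerical penalty referred to above; for $e^x$ after range reduction the effect is too small to matter.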

Given how well-behaved $e^x$ is, let's push Chebyshev interpolation further by showing how it performs on the sine function. To test this we'll use the Barycentric interpolation algorithm with points $x_0, x_1, \ldots, x_n$ sampled at the Chebyshev nodes. Recall that these are the roots of the Chebyshev polynomials, and by the foregoing error analysis in the section on Lagrange interpolation, they should approximately minimize the error of the interpolant. First we will need a function for computing the nodes.

import math
import numpy as np

def cheb_nodes(n, a = -1, b = 1):
    nodes = np.zeros(n)
    for k in range(n):
        nodes[k] = ((a + b) / 2) + (((b - a) / 2) * math.cos((k * math.pi) / n))
    return nodes

xi = cheb_nodes(15)
yi = np.sin(2 * np.pi * xi)
bary_cheb = BarycentricInterpolation(xi, yi)

When we plot the interpolant bary_cheb against the true value of $\sin(x)$ for $x \in [-1, 1]$ with 10 Chebyshev nodes, we see very similar interpolation characteristics compared to the Lagrange interpolation with 10 equi-spaced points.

By visual comparison, the degree 10 Chebyshev interpolation is somewhat more accurate than the Lagrange interpolation with equi-spaced points.

But in contrast to the Lagrange interpolation with equi-spaced points, when we increase the number of interpolation points to 100 we do not see any catastrophic increase in error at the boundaries of the interval.

The interpolation does not degrade at the boundaries as the degree of the interpolant increases.

The Barycentric interpolation algorithm is not doing the heavy lifting here. What is happening is that the Chebyshev nodes provide a better approximation near the boundaries of the interval under consideration because of how they cluster. Whereas standard Lagrange interpolation samples equi-spaced points, the Chebyshev nodes on an interval $[a, b]$ are naturally clustered closer to the endpoints. This gives tighter control over the approximation, which reduces both the absolute and relative error of the interpolant at every value of $x$.
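
The clustering is easy to see numerically: the gaps between consecutive Chebyshev nodes shrink toward the endpoints, whereas equi-spaced points have constant gaps by construction. A small sketch using the cheb_nodes helper defined above:

import numpy as np

n = 10
cheb = np.sort(cheb_nodes(n))
equi = np.linspace(-1, 1, n)

# Consecutive gaps: the Chebyshev gaps are smallest near the endpoints
print("Chebyshev gaps:  ", np.round(np.diff(cheb), 3))
print("Equi-spaced gaps:", np.round(np.diff(equi), 3))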

Carathéodory-Fejer approximation

Minimax approximation

Closing thoughts

References

Published: March 8, 2020

Updated: June 27, 2020

Status: Unfinished

Did I make a mistake? If so, please consider contacting me so I can fix it. I will also include you in the Acknowledgements if you would like.
