Lesson 9-II: Exponential Least Squares

In Lesson 9-I, we used Excel's built-in model fitting to find exponential models for given data. But what's Excel really doing?

Exponential Least Squares

As in Lesson 2, Excel isn't just guessing. Maybe, as it did there, Excel is minimizing a sum of squares? Let's see!

Recall the Raleigh population data:

years since 1900    Raleigh population
 0                  13643
10                  19218
20                  24418
30                  37379
40                  46879
50                  65679
60                  93931
70                  122830
80                  150255

We expect (because this is a growing population) that these data should basically fit a function of the form

P(t)=P_0e^{rt}

Like we did in Lesson 2, we could decide that the first data point (for the year 1900) is somehow especially correct, and just declare that P_0=13643. That means we'd only need to figure out what r is. But there's really no reason to think the census in the year 1900 was any more correct than the others, or that nothing weird happened in 1900 to make the population lower or higher than it "should" have been otherwise.

As in Lesson 2, we'll consider the sum of squares quantity. Use P_k to denote the actual measured population corresponding to t_k, and P(t_k) to denote the population predicted by the model P(t)=P_0e^{rt}. Then the sum of squares should be:

 S = \sum_k \left(P_k-P(t_k)\right)^2=\sum_k \left(P_k-P_0e^{rt_k}\right)^2

which for us works out to:

    \begin{align*}S =& \left(13643-P_0e^{0r}\right)^2+\left(19218-P_0e^{10r}\right)^2+\left(24418-P_0e^{20r}\right)^2\\&+\left(37379-P_0e^{30r}\right)^2+\left(46879-P_0e^{40r}\right)^2+\left(65679-P_0e^{50r}\right)^2\\&+\left(93931-P_0e^{60r}\right)^2+\left(122830-P_0e^{70r}\right)^2 +\left(150255-P_0e^{80r}\right)^2\end{align*}

Observe that this is just a function of P_0 and r, so we can minimize it using the two-variable optimization technique from Lesson 4.

A screencast of this Lesson is available.

First, we enter S(P_0,r). To do this, let's tell Maple about our model in general:
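One way to do this in 1-D Maple input is sketched below; the names tdata, Pdata, and P are choices made for this sketch, not necessarily what the screencast uses.

[>tdata := [0, 10, 20, 30, 40, 50, 60, 70, 80]    # years since 1900
[>Pdata := [13643, 19218, 24418, 37379, 46879, 65679, 93931, 122830, 150255]    # measured populations
[>P := (t, P0, r) -> P0*exp(r*t)    # the general exponential model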

Then we can use this to enter S(P_0,r):
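A sketch of this step: build the sum of squared differences with add, then turn it into a function of P_0 and r with unapply.

[>S := unapply(add((Pdata[k] - P(tdata[k], P0, r))^2, k = 1 .. 9), P0, r)    # sum of squares as a function of P0 and r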

Let's verify that that's correct:
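For instance, evaluating S at the symbolic names P0 and r should print the nine-term sum written out above, so we can eyeball it against the formula.

[>S(P0, r)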

Now we need to solve the system

\begin{cases}\frac{\partial S}{\partial P_0}=0\\\frac{\partial S}{\partial r}=0\end{cases}

or in Maple-speak:

[>solve({diff(S(P0,r),P0)=0,diff(S(P0,r),r)=0})

which returns a mess too great to even screenshot.

We're interested in a numerical value, so let's try that again with an evalf:
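A sketch of one way to do this (the screencast may instead apply evalf to the previous output using %):

[>evalf(solve({diff(S(P0,r),P0)=0, diff(S(P0,r),r)=0}))    # numerical values of the critical points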

There are two solutions. The first one has P_0=0, which isn't reasonable -- then we'd have  P(t)=0. It also has r given as a complex number (as indicated by the capital I). So probably the first solution isn't what we're interested in. The second solution is

P_0=15789.60744,\ \ \ r=0.02862048031

which means a model of

P(t)=15789.607e^{0.02862 t}

(after a bit of rounding). Recall from Lesson 9-I that the model Excel fitted for these data was

P(t)=13696e^{.0307t}

as reported by Trendline, and

P(t)=13969.0577 e^{.0307t}

as reported by the log-trick. To summarize in table form:

method               trendline               log trick                     least squares                  actual data
model                P(t)=13696e^{.0307t}    P(t)=13969.0577e^{.0307t}     P(t)=15789.607e^{0.02862t}     -
growth rate          3.07%                   3.07%                         2.862%                         -
initial population   13696                   13969                         15789                          13643

Here's a plot of all three models, together with the given data. The least-squares model is in red; Trendline's model is in blue; the log-trick model is in green.

This plot was created in Maple; the code is:

[>with(plots)
[>leastsquares := t -> 15789.60744*exp(0.02862048031*t)
[>trendline := t -> 13696*exp(0.0307*t)
[>logtrick := t -> 13969*exp(0.0307*t)
[>A := plot({leastsquares(t), logtrick(t), trendline(t)}, t = 0 .. 80)
[>B := pointplot([0, 10, 20, 30, 40, 50, 60, 70, 80], [13643, 19218, 24418, 37379, 46879, 65679, 93931, 122830, 150255])
[>plots[display]({A, B})

We can see that, whatever Excel is doing, it's not using the least-squares method to compute its exponential curve of best fit.

Some Other Least Squares Fits

We have decent reason to think these data are growing exponentially, but at first glance we might imagine the curve that the data points trace is a parabola. Which parabola would fit best? Let's ask Maple!

A parabola is the graph of a quadratic function

 P(t)=at^2+bt+c

so there are now three parameters to fit. Let's update Maple on this:
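A sketch of the corresponding Maple input, reusing the tdata and Pdata lists from before and calling the quadratic model Q (a name chosen for this sketch):

[>Q := (t, a, b, c) -> a*t^2 + b*t + c    # the general quadratic model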

Then the sum of squares is now a function of a,b,c:
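A sketch of this step, with the new sum named S2 so it doesn't overwrite the exponential version of S:

[>S2 := unapply(add((Pdata[k] - Q(tdata[k], a, b, c))^2, k = 1 .. 9), a, b, c)    # sum of squares as a function of a, b, c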

so we need to solve a system involving three variables,

\begin{cases}\frac{\partial S}{\partial a}=0\\\frac{\partial S}{\partial b}=0\\\frac{\partial S}{\partial c}=0\end{cases}

which Maple can easily do:
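For example (a sketch): since the quadratic model is linear in a, b, and c, solve returns an exact solution, and evalf converts it to decimals.

[>sol := solve({diff(S2(a,b,c), a) = 0, diff(S2(a,b,c), b) = 0, diff(S2(a,b,c), c) = 0})
[>evalf(sol)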

The resulting model doesn't look too different from the exponential least-squares model in the years 1900-1980 (the exponential least-squares model is in red; the parabolic model is in blue):

but you can see that they make very different predictions if we look out to the year 2020:
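One possible way to reproduce this comparison plot, assuming sol from the solve step above and the leastsquares and B objects from the earlier plotting code:

[>parabola := unapply(eval(a*t^2 + b*t + c, sol), t)    # substitute the fitted a, b, c
[>C := plot({leastsquares(t), parabola(t)}, t = 0 .. 120)    # 1900 through 2020
[>plots[display]({B, C})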

Takeaways/Deliverables

To say you've successfully completed this lesson, you should be able to do the following:

  1. Use Maple to find a least-squares exponential to fit given data.
  2. Interpret the parameters of an exponential fitting given data.
  3. Use Maple to find a least-squares quadratic to fit given data.
  4. Use Maple to plot your least-squares exponential and least-squares quadratic models on the same axes as the given data for comparison.