Often we measure some observable as a function of some independent variable, $x$, and we have some theoretical function that we expect the observations to obey. The function may contain parameters that we need to determine. The process of determining the parameter values that best fit the function to the given data is known as regression.
Consider the Arrhenius equation, commonly used to describe the rate constant in chemistry,
$$k = A \exp\left(-\frac{E_a}{R T}\right),$$
where
- $k$ is the rate constant of the chemical reaction,
- $A$ is the pre-exponential factor,
- $E_a$ is the activation energy,
- $R$ is the gas constant,
- $T$ is the temperature.
If we measured $k$ as a function of $T$, we could determine $A$ and $E_a$ using regression.
Linear Least Squares Regression
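Linear least squares regression can be applied if we can write our function in polynomial form, $y = a_0 + a_1 x + a_2 x^2 + \cdots + a_p x^p$. If we can do this, then we are looking for the $p+1$ coefficients of the polynomial, $a_j$, given $n$ observations $(x_i, y_i)$. We can write this as
$$y_i = a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_p x_i^p, \quad i = 1, \ldots, n,$$
or, in matrix form,
$$\underbrace{\begin{bmatrix} 1 & x_1 & \cdots & x_1^p \\ 1 & x_2 & \cdots & x_2^p \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^p \end{bmatrix}}_{\mathbf{A}} \underbrace{\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_p \end{bmatrix}}_{\mathbf{a}} = \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{\mathbf{b}}.$$
Regression applies when $n > p + 1$. In other words, when we have more equations than we have unknowns (typically many more). In this case, the least-squares solution is written as
$$\mathbf{A}^\mathsf{T} \mathbf{A}\, \mathbf{a} = \mathbf{A}^\mathsf{T} \mathbf{b}.$$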
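As a minimal sketch of the least-squares solution for a polynomial, the snippet below builds the coefficient matrix and solves the normal equations $\mathbf{A}^\mathsf{T}\mathbf{A}\,\mathbf{a} = \mathbf{A}^\mathsf{T}\mathbf{b}$ with NumPy. The function name `polyfit_normal_equations` and the data are hypothetical and chosen only for illustration.

```python
import numpy as np

def polyfit_normal_equations(x, y, degree):
    """Least-squares polynomial coefficients a_0..a_degree via the normal equations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Each row of A is [1, x_i, x_i**2, ..., x_i**degree].
    A = np.vander(x, degree + 1, increasing=True)
    # Solve (A^T A) a = A^T y for the coefficient vector a.
    return np.linalg.solve(A.T @ A, A.T @ y)

# Hypothetical data: samples of y = 1 + 2x - 0.5x^2 with a small perturbation.
x = np.linspace(0.0, 4.0, 9)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.01 * np.cos(7.0 * x)
coeffs = polyfit_normal_equations(x, y, degree=2)
print(coeffs)  # coefficients ordered a_0, a_1, a_2
```

For poorly conditioned problems, `np.linalg.lstsq` is usually a safer choice than forming $\mathbf{A}^\mathsf{T}\mathbf{A}$ explicitly. The normal equations are used here only because they mirror the text.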
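Let's illustrate linear least squares regression with our example above. First, we must rewrite the problem by taking the natural logarithm of both sides of the Arrhenius equation:
$$\ln k = \ln A - \frac{E_a}{R} \frac{1}{T}.$$
Note that this now looks like a polynomial, $y = a_0 + a_1 x$, where $y = \ln k$, $a_0 = \ln A$, $a_1 = -E_a/R$, and $x = 1/T$.
If we write these equations for each of our $n$ data points in matrix form we obtain
$$\underbrace{\begin{bmatrix} 1 & 1/T_1 \\ 1 & 1/T_2 \\ \vdots & \vdots \\ 1 & 1/T_n \end{bmatrix}}_{\mathbf{A}} \underbrace{\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}}_{\mathbf{a}} = \underbrace{\begin{bmatrix} \ln k_1 \\ \ln k_2 \\ \vdots \\ \ln k_n \end{bmatrix}}_{\mathbf{b}},$$
and the equations to be solved are
$$\mathbf{A}^\mathsf{T} \mathbf{A}\, \mathbf{a} = \mathbf{A}^\mathsf{T} \mathbf{b}, \quad \text{i.e.} \quad \begin{bmatrix} n & \sum_i 1/T_i \\ \sum_i 1/T_i & \sum_i 1/T_i^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum_i \ln k_i \\ \sum_i (\ln k_i)/T_i \end{bmatrix}.$$
Note that since the transposed coefficient matrix $\mathbf{A}^\mathsf{T}$ is a $2 \times n$ matrix, $\mathbf{A}$ is an $n \times 2$ matrix, and $\mathbf{b}$ is an $n \times 1$ vector, we have a 2x2 system of equations to solve (see notes on matrix multiplication).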
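The linearized Arrhenius fit can be sketched in a few lines of NumPy. The data below are hypothetical and noise-free, generated from assumed values of $A$ and $E_a$ so that the fit can be checked against them; they are not measurements. The coefficient matrix is called `M` in the code to avoid a clash with the pre-exponential factor $A$.

```python
import numpy as np

R = 8.314  # gas constant, J/(mol K)

# Hypothetical, noise-free data generated from assumed parameters (not measurements).
A_true, Ea_true = 2.0e7, 5.0e4            # pre-exponential factor, activation energy (J/mol)
T = np.linspace(300.0, 500.0, 5)          # temperatures, K
k = A_true * np.exp(-Ea_true / (R * T))   # rate constants from the Arrhenius equation

# Linearized model: ln k = a0 + a1 * (1/T), with a0 = ln A and a1 = -Ea/R.
M = np.column_stack([np.ones_like(T), 1.0 / T])  # n x 2 coefficient matrix
b = np.log(k)                                    # right-hand side, length n

# 2x2 normal equations: (M^T M) a = M^T b.
a0, a1 = np.linalg.solve(M.T @ M, M.T @ b)

A_fit = np.exp(a0)   # recover A from a0 = ln A
Ea_fit = -a1 * R     # recover Ea from a1 = -Ea/R
print(A_fit, Ea_fit)
```

Because the fit is done on $\ln k$, it minimizes the squared error in $\ln k$ rather than in $k$ itself, which matters once the data contain noise.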
Let's consider another example, where we want to determine the value of the acceleration due to gravity, $g$, via regression. To do this, we could drop an object from a set height, $y_0$, at time $t = 0$ and measure its height, $y$, as a function of time. Neglecting air resistance, we have
$$y = y_0 + v_0 t + \tfrac{1}{2} g t^2,$$
where $g \approx -9.8$ m/s² is the acceleration due to gravity (negative because $y$ is measured upward). Assume that the initial height $y_0$ (in meters) is known precisely, and that since we drop the ball from rest, $v_0 = 0$ m/s. The data we collect is given in the table below:
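Let's go through this problem by hand:
- First, we rearrange our equation (using $v_0 = 0$) as follows: $y - y_0 = \tfrac{1}{2} g t^2$.
- Write down the equations in matrix form, one equation for each of the $n$ data points: $\mathbf{A} g = \mathbf{b}$, where $\mathbf{A}$ is the $n \times 1$ column with entries $\tfrac{1}{2} t_i^2$ and $\mathbf{b}$ is the $n \times 1$ column with entries $y_i - y_0$.
- Forming the normal equations, we have $\mathbf{A}^\mathsf{T} \mathbf{A}\, g = \mathbf{A}^\mathsf{T} \mathbf{b}$.
- Using the rules of matrix multiplication, we can rewrite this as $g \sum_i \tfrac{1}{4} t_i^4 = \sum_i \tfrac{1}{2} t_i^2 (y_i - y_0)$, so that $g = 2 \sum_i t_i^2 (y_i - y_0) / \sum_i t_i^4$.
- Substituting numbers in from our above table, we obtain $g = -9.75$ m/s².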
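The hand calculation above can be sketched in plain Python. With a single unknown, the normal equations reduce to one scalar equation, $g \sum_i a_i^2 = \sum_i a_i b_i$ with $a_i = \tfrac{1}{2} t_i^2$ and $b_i = y_i - y_0$. The function name `fit_gravity`, the drop height, and the readings are hypothetical; they are not the values from the table in the text.

```python
def fit_gravity(times, heights, y0):
    """Least-squares estimate of g in y - y0 = 0.5 * g * t**2 (one unknown)."""
    # Single-unknown normal equation: g * sum(a_i**2) = sum(a_i * b_i),
    # with a_i = 0.5 * t_i**2 and b_i = y_i - y0.
    a = [0.5 * t**2 for t in times]
    b = [y - y0 for y in heights]
    return sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)

# Hypothetical readings (NOT the table from the text), assuming a drop from y0 = 100 m.
times = [1.0, 2.0, 3.0, 4.0]        # s
heights = [95.0, 80.5, 56.0, 21.5]  # m
g_est = fit_gravity(times, heights, y0=100.0)
print(g_est)  # estimate of g in m/s^2 (negative, since height decreases)
```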
Converting a Nonlinear Problem Into a Linear One
The R-Squared Value
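The $R^2$ value is a common way to measure how well a predicted value matches the observed data. It is calculated as
$$R^2 = 1 - \frac{\sum_i (y_i - f_i)^2}{\sum_i (y_i - \bar{y})^2},$$
where $f_i$ is a predicted value of $y_i$, $y_i$ is the observed value, and $\bar{y}$ is the average of the observed values of $y$.
The $R^2$ value never exceeds unity, $R^2 \le 1$, with $R^2 = 1$ corresponding to a perfect match.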
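A minimal sketch of the calculation in plain Python, using $R^2 = 1 - \sum_i (y_i - f_i)^2 / \sum_i (y_i - \bar{y})^2$ with $y_i$ the observed values and $f_i$ the predictions. The function name `r_squared` and the numbers are hypothetical and for illustration only.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot for paired observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    # Sum of squared residuals between observations and predictions.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    # Total sum of squares of the observations about their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
print(r2)
```

A perfect prediction gives 1, predicting the mean of the observations everywhere gives 0, and predictions worse than the mean give negative values.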
Nonlinear Least Squares Regression
The discussion and examples above focused on linear least squares regression, where the problem could be recast as a polynomial and we were solving for the coefficients of that polynomial. The Arrhenius equation, for example, is nonlinear in its parameters as originally written, but we rearranged it so that it could be solved using linear regression.
Occasionally, it is not possible to reduce a problem to a linear one in order to perform regression. In such cases, we must use nonlinear regression, which involves solving a system of nonlinear equations for the unknown parameters.
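One common way to approach such problems is Gauss-Newton iteration, which repeatedly linearizes the model about the current parameter estimate and solves a linear least-squares problem for the update. This is a sketch of that technique, not necessarily the method these notes have in mind. The model $y = a e^{b x} + c$, the starting guess, and the noise-free data are all hypothetical, and a simple step-halving safeguard is added so that each iteration does not increase the sum of squared residuals.

```python
import numpy as np

def model(p, x):
    a, b, c = p
    return a * np.exp(b * x) + c

def jacobian(p, x):
    a, b, _ = p
    e = np.exp(b * x)
    # Columns are d(model)/da, d(model)/db, d(model)/dc.
    return np.column_stack([e, a * x * e, np.ones_like(x)])

def gauss_newton(x, y, p0, max_iter=50, tol=1e-12):
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        r = y - model(p, x)
        # Linear least-squares subproblem for the update: J dp = r (in the least-squares sense).
        dp = np.linalg.lstsq(jacobian(p, x), r, rcond=None)[0]
        # Halve the step while it would increase the sum of squared residuals.
        step = 1.0
        while step > 1e-8 and np.sum((y - model(p + step * dp, x)) ** 2) > np.sum(r ** 2):
            step *= 0.5
        p = p + step * dp
        if np.linalg.norm(step * dp) < tol * (1.0 + np.linalg.norm(p)):
            break
    return p

# Hypothetical noise-free data generated from assumed parameters (a, b, c) = (2, -1.5, 0.5).
x = np.linspace(0.0, 3.0, 10)
y = model((2.0, -1.5, 0.5), x)
p_fit = gauss_newton(x, y, p0=(1.5, -1.0, 0.3))
print(p_fit)
```

Unlike the linear case, the result depends on the starting guess, and a poor guess may converge slowly or to a different local minimum.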
Derivation of the Linear Least Squares Equations