A guide to linear regression for environmental sciences

The purpose of this document is to provide practical material to guide researchers in the geosciences, and in atmospheric science in particular, in the use of multiple linear regression. The statistical world is full of texts that can be useful for linear regression, but it can be difficult to find one that works well for you, because everyone comes to the subject from a different starting point. It may help to remember that statisticians are mathematicians: textbooks and papers are often about proving theorems for sets of data which have particular properties. If that doesn't sound like a useful starting point for you, statistics books built around worked examples, or designed for business majors, may be more helpful.

This front page is divided into two sections. The first section considers the topic of linear regression at a higher level, providing the background information needed to guide a number of decisions that you will have to make. The second section is more prescriptive in nature, providing a step-by-step recipe for getting your regression model results as quickly and painlessly as possible (but without sacrificing statistical rigor). In many cases the two sections will link to the same underlying material, but we have presented the material in these two different ways in the hope that we can serve the needs of the widest possible audience.

Understanding your data
Understanding the driving physics/chemistry
Developing the statistical model
Testing the assumptions
Interpreting the output

If you are looking for a more pedagogical, step-by-step approach to applying a regression model to your data, follow the steps below. The goal of the following section is to provide a 'fast track' to help you to get the job done, with secondary information being provided for non-standard situations or for when you need to know more. Much of the recipe below is required for correctly determining the uncertainties on the regression model fit coefficients.

Step 1: Construct your regression model
Step 2: Fit the basis functions to the data
Step 3: Derive regression model fit coefficients
Step 4: Calculate the autocorrelation in the residuals
Step 5: Recalculate the uncertainties on your data
Step 6: Transform your data and basis functions to account for autocorrelation
Step 7: Repeat the regression with the transformed data and basis functions
Step 8: Extract the regression coefficients and their uncertainties

And that's it. You're done. Not too difficult, was it?
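To make the recipe above a little more concrete, here is a minimal sketch in Python using NumPy. It is an illustration under stated assumptions, not necessarily the exact procedure described on the linked pages: the residuals are assumed to follow a first-order autoregressive, AR(1), process, the transformation used is the standard "subtract rho times the previous value" form, and all measurements are assumed to have equal uncertainties, so Step 5 is not treated separately. The function names (`ols`, `lag1_autocorrelation`, `fit_with_ar1_correction`) and the synthetic monthly data are made up for the example.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: coefficients, 1-sigma uncertainties, residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]           # degrees of freedom
    sigma2 = resid @ resid / dof            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)   # coefficient covariance matrix
    return beta, np.sqrt(np.diag(cov)), resid

def lag1_autocorrelation(resid):
    """Lag-1 autocorrelation coefficient of the residuals (Step 4)."""
    r = resid - resid.mean()
    return (r[1:] @ r[:-1]) / (r @ r)

def fit_with_ar1_correction(X, y):
    """Steps 2 to 8, assuming AR(1) residuals (an assumption of this sketch)."""
    _, _, resid = ols(X, y)                 # Steps 2-3: first-pass fit
    rho = lag1_autocorrelation(resid)       # Step 4
    # Step 6: transform data and basis functions; the first point is dropped.
    y_t = y[1:] - rho * y[:-1]
    X_t = X[1:] - rho * X[:-1]
    beta, se, _ = ols(X_t, y_t)             # Steps 7-8: refit
    return beta, se, rho

# Step 1: a hypothetical model with offset, linear trend and annual cycle,
# applied to 20 years of synthetic monthly data with AR(1) noise.
rng = np.random.default_rng(0)
n = 240
t = np.arange(n) / 12.0                     # time in years
X = np.column_stack([np.ones(n), t,
                     np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
eps = rng.normal(0.0, 0.5, n)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.6 * noise[i - 1] + eps[i]
y = X @ np.array([1.0, 0.2, 0.8, -0.3]) + noise

beta, se, rho = fit_with_ar1_correction(X, y)
for name, b, s in zip(["offset", "trend", "sin", "cos"], beta, se):
    print(f"{name:>6s}: {b:+.3f} +/- {s:.3f}")
print(f"lag-1 autocorrelation of first-pass residuals: {rho:.2f}")
```

Because the same transformation is applied to the data and to every basis function, the refitted coefficients are still in the units of the original model; what changes is mainly their uncertainties, which are typically larger than the first-pass values when the residuals are positively autocorrelated.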

Actually, there is an alternative method for estimating uncertainties on the regression model fit parameters based on 'bootstrapping'. Click here for more information on that.
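As a rough illustration of the bootstrapping idea, the sketch below resamples the residuals of a fitted model, adds them back to the fitted values, and refits many times; the spread of the refitted coefficients is then taken as their uncertainty. This is one common variant only, and may differ from the approach on the linked page. Residuals are resampled in contiguous blocks so that short-range autocorrelation is roughly preserved; the function name `block_bootstrap_se`, the block length of 12 points and the number of resamples are arbitrary choices made for this example.

```python
import numpy as np

def block_bootstrap_se(X, y, n_boot=500, block_length=12, seed=0):
    """Moving-block residual bootstrap of regression coefficient uncertainties.

    With block_length=1 this reduces to an ordinary residual bootstrap,
    which assumes independent residuals.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta_hat
    resid = y - fitted
    n_blocks = int(np.ceil(n / block_length))
    offsets = np.arange(block_length)
    betas = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        # Draw random block start positions, then expand each to a full block.
        starts = rng.integers(0, n - block_length + 1, size=n_blocks)
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        y_star = fitted + resid[idx]        # synthetic data set
        betas[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]
    return beta_hat, betas.std(axis=0, ddof=1)

# Hypothetical example: offset plus linear trend with AR(1) noise.
rng = np.random.default_rng(1)
n = 240
t = np.arange(n) / 12.0
X = np.column_stack([np.ones(n), t])
eps = rng.normal(0.0, 0.5, n)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.6 * noise[i - 1] + eps[i]
y = 1.0 + 0.2 * t + noise

beta_hat, boot_se = block_bootstrap_se(X, y)
print("coefficients:          ", beta_hat)
print("bootstrap uncertainties:", boot_se)
```

The block length is a judgment call: it should be long compared with the timescale over which the residuals are correlated, but short compared with the length of the record.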