Fitting Basis Function Time Series To Data

A statistical formula contains much of the information that the scientist believes is relevant for explaining the data. For instance, if boundary layer temperature is affected by surface aerosols and one is interested in the temperature behavior that is not driven by aerosols, then one can use aerosols as a basis function to explain, and so set aside, that part of the observed variability.

A common example: Removing seasonality
A common feature that one often wants to fit as a basis function is seasonality or diurnal effects. These can add considerably to the range and variability observed in a time series, but may not be the feature that one wants to isolate. If this is the case, correctly accounting for seasonality and diurnal effects can allow other effects to appear more clearly. If, as is common with surface temperatures, one wants to analyze monthly averaged temperature values, one of the largest sources of variability is the seasonal cycle. Two methods exist for removing the seasonal cycle:

The first method is to calculate the average temperature for each calendar month and subtract it from every monthly mean for that month (for example, subtract the average January temperature from all January values) to arrive at a time series of anomalies. The advantage of this method is that it is easy to explain to colleagues and does not introduce any long-term bias. A problem can occur if the time series is very short, for instance in the first ten years of a new network. The "monthly average" method can then be biased in a manner that is not geophysical in nature: a few unusual years can produce a very jagged seasonal cycle when one examines the average January, average February, and average March values. In this case, the "monthly average" method can assign too much of the observed variability to the seasonal cycle. As a result, true effects, such as the impact of land use change on local temperatures, can be obscured.
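The "monthly average" method can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a reference implementation: the function name monthly_anomalies, the synthetic temperature series, and its noise level are all assumptions made for the example, and the sketch assumes every calendar month appears at least once in the record.

```python
import numpy as np

def monthly_anomalies(values, months):
    """Subtract each calendar month's mean from the values in that month.

    values: monthly mean temperatures; months: calendar month (1-12) of each value.
    Assumes every calendar month occurs at least once.
    """
    values = np.asarray(values, dtype=float)
    months = np.asarray(months, dtype=int)
    # One average per calendar month: the estimated seasonal cycle.
    climatology = np.array([values[months == m].mean() for m in range(1, 13)])
    # Look up the matching monthly average for every observation and remove it.
    anomalies = values - climatology[months - 1]
    return anomalies, climatology

# Synthetic data, invented for illustration: ten years of monthly means.
rng = np.random.default_rng(0)
n_years = 10
months = np.tile(np.arange(1, 13), n_years)
seasonal = 10.0 * np.cos(2 * np.pi * (months - 7) / 12)
temps = 15.0 + seasonal + rng.normal(0.0, 1.0, size=months.size)

anomalies, climatology = monthly_anomalies(temps, months)
```

Because each of the twelve averages is estimated from only as many values as there are years, noise in a short record passes directly into the estimated cycle, which is the jaggedness described above.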

Alternatively, if one understands that the time series is rather short (less than ten or twenty years), and also understands that the seasonal cycle should be smooth, then it may be better to fit the data with a few smooth functions. Often four sines and four cosines work well to fit most seasonal cycles. This technique was first noted in papers by Greg Reinsel and has been adopted by many other researchers for its usefulness. Because only eight well-defined functions are used, the seasonal cycle is forced to be smooth and is therefore less likely to absorb too much of the observed variability.
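A minimal sketch of the smooth-function approach follows, using ordinary least squares in NumPy. Several choices here are assumptions rather than details from the text: the sines and cosines are taken at one to four cycles per year, a constant term is added so that the mean is fitted alongside the eight functions, and the names harmonic_design and fit_seasonal_cycle and the synthetic series are invented for the example.

```python
import numpy as np

def harmonic_design(t_years, n_harmonics=4):
    """Design matrix: a constant column, then cos/sin pairs at 1..n_harmonics cycles per year."""
    t_years = np.asarray(t_years, dtype=float)
    cols = [np.ones_like(t_years)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t_years))
        cols.append(np.sin(2 * np.pi * k * t_years))
    return np.column_stack(cols)

def fit_seasonal_cycle(values, t_years, n_harmonics=4):
    """Least-squares fit of the harmonics; returns residuals, fitted values, coefficients."""
    values = np.asarray(values, dtype=float)
    X = harmonic_design(t_years, n_harmonics)
    coef, *_ = np.linalg.lstsq(X, values, rcond=None)
    fitted = X @ coef
    return values - fitted, fitted, coef

# Synthetic data, invented for illustration: five years of monthly means,
# with time measured in years at the middle of each month.
rng = np.random.default_rng(0)
n_months = 60
t_years = (np.arange(n_months) + 0.5) / 12.0
true_cycle = 10.0 * np.cos(2 * np.pi * t_years) + 3.0 * np.sin(4 * np.pi * t_years)
temps = 15.0 + true_cycle + rng.normal(0.0, 1.0, size=n_months)

anomalies, fitted_cycle, coef = fit_seasonal_cycle(temps, t_years)
```

The fit spends nine parameters (eight harmonics plus the constant) on the whole record, rather than twelve independent monthly averages, so it cannot follow month-to-month jumps that a short record would otherwise imprint on the seasonal cycle.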

Physical Meaning of Basis Functions
Care must be taken that the basis functions represent the best of our understanding of the data. It is always possible for circular thinking to slide into statistical analysis. For instance, low pressure systems can, in some situations, allow a buildup of pollution and at the same time be associated with higher temperatures and higher humidity. Which of these correlated factors should be used as a basis function? The answer depends on the scientific inquiry. Is one looking for the influence of temperature on pollution levels? Or for the influence of temperature on pollution levels once pressure has been accounted for? Separate questions dictate separate statistical approaches.
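The difference between the two questions can be seen in a small synthetic experiment. In the sketch below, which is an invented illustration and not a description of real data, pressure drives both temperature and pollution, and temperature has no direct effect on pollution. All variable names, coefficients, and noise levels are assumptions chosen for the example.

```python
import numpy as np

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic, standardized variables invented for illustration.
rng = np.random.default_rng(3)
n = 500
pressure = rng.normal(0.0, 1.0, size=n)
# Low pressure goes with higher temperature and with more pollution.
temperature = -0.8 * pressure + rng.normal(0.0, 0.6, size=n)
pollution = -1.0 * pressure + rng.normal(0.0, 0.5, size=n)

# Question 1: influence of temperature on pollution, ignoring pressure.
slope_alone = ols([temperature], pollution)[1]

# Question 2: influence of temperature once pressure has been accounted for.
slope_given_pressure = ols([temperature, pressure], pollution)[1]
```

In this construction the first slope is clearly positive while the second is close to zero. Neither number is wrong; they answer different questions, which is why the choice of basis functions has to follow from the scientific inquiry.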

Are the factors linear?
Many basis functions represent what we understand about the physics and chemistry of our system. But just because something may serve as an explanatory variable does not mean that the statistical formula is easy to write. Is tropospheric ozone a function of temperature, or is it a function of temperature**2? If one believes the relationship is linear, then one can set up the statistical relationship as ozone = alpha * temperature + Noise. However, if the true relationship is not linear but instead behaves as temperature**2, the results will be very different: the linear fit will remove less of the variability than a more appropriate functional form of the basis function would.
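A short synthetic comparison shows how much the functional form can matter. In this sketch the "true" relationship is quadratic by construction, and both candidate fits include an intercept, which the formula above omits; the intercept, the coefficient 0.05, the noise level, and the helper name residual_variance are all assumptions made for the illustration.

```python
import numpy as np

def residual_variance(X, y):
    """Variance left in y after a least-squares fit on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ coef)

# Synthetic data, invented for illustration: ozone depends on temperature**2.
rng = np.random.default_rng(1)
temperature = rng.uniform(0.0, 30.0, size=200)
ozone = 0.05 * temperature**2 + rng.normal(0.0, 2.0, size=200)

ones = np.ones_like(temperature)
X_linear = np.column_stack([ones, temperature])        # ozone = a + alpha * temperature
X_quadratic = np.column_stack([ones, temperature**2])  # ozone = a + alpha * temperature**2

var_linear = residual_variance(X_linear, ozone)
var_quadratic = residual_variance(X_quadratic, ozone)
```

Here var_quadratic comes out well below var_linear: the linear basis function leaves behind variability that the squared form removes. With real data the true form is unknown, so a comparison like this is a diagnostic to be weighed against physical understanding, not a proof.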

Are the factors additive?
A researcher may identify a few factors that affect their data; for instance, both temperature and chlorine affect polar ozone levels. If one sets up a statistical fit of the form ozone = alpha * temperature + beta * chlorine + Noise, then one is only allowing for additive effects of temperature and chlorine on ozone. If one does not believe that the relationship is additive, then the statistical formulation is not justified and will likely give meaningless results.
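One simple way to probe additivity is to add a product term to the fit and see whether it is needed. The sketch below is a hedged illustration on invented data: the synthetic ozone series is built with a temperature * chlorine interaction, the product term is only one of many possible non-additive forms, and the names fit and gamma_hat and all coefficients are assumptions for the example.

```python
import numpy as np

def fit(X, y):
    """Least-squares coefficients and residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)

# Synthetic, standardized data invented for illustration.
rng = np.random.default_rng(2)
n = 300
temperature = rng.normal(0.0, 1.0, size=n)
chlorine = rng.normal(0.0, 1.0, size=n)
ozone = (-1.0 * temperature - 2.0 * chlorine
         + 1.5 * temperature * chlorine          # non-additive part of the synthetic truth
         + rng.normal(0.0, 0.5, size=n))

ones = np.ones(n)
# Additive model: ozone = a + alpha * temperature + beta * chlorine + Noise
X_additive = np.column_stack([ones, temperature, chlorine])
# Same model plus a product term: gamma * temperature * chlorine
X_interaction = np.column_stack([X_additive, temperature * chlorine])

coef_additive, rss_additive = fit(X_additive, ozone)
coef_interaction, rss_interaction = fit(X_interaction, ozone)
gamma_hat = coef_interaction[3]
```

A large drop from rss_additive to rss_interaction, together with a gamma_hat well away from zero, suggests the additive form is missing something. A formal significance test would be needed before drawing that conclusion from real data.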

A great strength of statistics is that it can help determine, from data, whether various factors have an effect and whether they behave in an additive or linear way. Many tests exist that can help tease these relationships from the data. The important point is to allow the statistics to work for us by letting the statistical formulation contain all of the information that we understand about the physical system.
