How the Use of Moving Averages Can Create the Appearance of Confirmation of Theories Where None Exists

San José State University

applet-magic.com Thayer Watkins Silicon Valley & Tornado Alley USA


It is well known that the use of moving averages can create the appearance of cycles in data where none exists. The purpose of this page is to show how this phenomenon can result in the apparent but spurious confirmation of theories.
Suppose a theory concludes that variables x(t) and y(t) should be correlated. If both variables have a linear trend, this would lead to the appearance of a correlation even if no causal relationship exists. If the two variables are expressed in different units, then scales for the two variables can be chosen such that the trend lines for the two variables coincide. Although this would be visually impressive, only the naive would believe it meant anything.
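The point about choosing scales can be made concrete with a small sketch. It is only an illustration under stated assumptions: the two series, their slopes and noise ranges, and the helper name fit_line are all made up here, and the trend lines are fitted by ordinary least squares, which the text above does not specify.

```python
import random

def fit_line(values):
    """Least-squares intercept and slope of values against t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    return v_mean - slope * t_mean, slope

rng = random.Random(1)  # arbitrary seed, for repeatability only
# Two unrelated, made-up series in different "units": a trend plus noise each.
x = [10.0 + 0.5 * t + rng.uniform(-2, 2) for t in range(50)]
y = [300.0 + 7.0 * t + rng.uniform(-40, 40) for t in range(50)]

ax, bx = fit_line(x)
ay, by = fit_line(y)

# Change of units for y (a shift and a stretch) that maps its fitted
# trend line exactly onto the fitted trend line of x.
y_rescaled = [ax + (bx / by) * (v - ay) for v in y]

ay2, by2 = fit_line(y_rescaled)
print(f"x trend:          intercept={ax:.3f}, slope={bx:.3f}")
print(f"rescaled y trend: intercept={ay2:.3f}, slope={by2:.3f}")
```

The rescaling works for any pair of series whose fitted slopes are nonzero, which is why the coincidence of trend lines carries no information about a relationship.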
Now suppose that x(t) has random fluctuations about a linear trend. It is commonplace for researchers to compute a moving average for a variable to be able to discern the trend. This computation of a moving average is not an innocent and innocuous adjustment of the data. As stated above, the use of such data smoothing techniques as moving averages can create the appearance of cycles. For an analysis of such matters see Spectral Analysis. For an illustration of the phenomenon consider the graph below.

The data for this graph were created by taking random values uniformly distributed between 0 and 1, computing a four-period moving average of them, and then computing a four-period moving average of those moving averages. The cycles are not perfect, but there is the appearance of some sort of cyclic pattern.
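A minimal sketch of the construction just described follows. The series length, the seed, and the choice of an equal-weight average over four consecutive values are assumptions, since the text does not say how the window was aligned. The lag-1 autocorrelation is added here only as a rough numerical stand-in for the visual impression of smooth, wave-like runs.

```python
import random

def moving_average(values, window):
    """Equal-weight moving average; the output is shorter by window - 1."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def lag1_autocorrelation(values):
    """Correlation between each value and the value that follows it."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

rng = random.Random(0)  # arbitrary seed, for repeatability only
raw = [rng.random() for _ in range(2000)]  # uniform on [0, 1), no cycles
once = moving_average(raw, 4)
twice = moving_average(once, 4)  # moving average of the moving averages

raw_r1 = lag1_autocorrelation(raw)
twice_r1 = lag1_autocorrelation(twice)
print(f"lag-1 autocorrelation, raw data:        {raw_r1:.3f}")
print(f"lag-1 autocorrelation, doubly smoothed: {twice_r1:.3f}")
```

The raw values are independent, so their lag-1 autocorrelation should be close to zero, while the doubly smoothed values are strongly tied to their neighbors. Successive points that move together in this way are what the eye reads as lobes of a cycle.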
Now suppose the variable y(t) does have a cyclic pattern but the variable x(t) is a linear trend with random fluctuations about that linear trend. The identification of a significant correlation between x(t) and y(t) would require a correlation between the deviations of these two variables from their linear trends. Suppose that no such correlation exists but the original x data is subjected to the computation of a moving average of a moving average such as was illustrated previously. The result would be as shown below.

Now suppose this moving averaged data is plotted in the same graph with the cyclic y(t) variable as shown below.

While the correspondence is not perfect, the graph gives the appearance of some imperfect correlation. The meaningless correspondence of both variables having a linear trend is made to seem meaningful by the appearance of cyclicality in both variables.
Below is an illustration for two unrelated variables, both having random fluctuations about a linear trend. Because they have different units, the scales have been chosen to bring their trend lines into coincidence. The data for the variables, although unrelated, have been subjected to the same moving average of the moving averages. The result is the appearance that the data have corresponding cyclical behavior which sometimes coincides and at other times seems to be out of phase. It would be easy for observers of this graph to conclude that the variables are definitely related but that the relationship is a bit more complicated than a simple one-to-one pattern. In fact there is no relationship whatsoever between the variables. The appearance of a relationship comes primarily from the variables both having a positive linear trend. The remaining appearance of a relationship comes solely from their having been subjected to the same smoothing operations.

The motivation for this webpage came from reading Kerry Emanuel's article in Nature of August 4, 2005 ("Increasing destructiveness of tropical cyclones over the past 30 years"), in which he purported to have confirmed his 1986 theoretical analysis that an increase in sea surface temperature would increase the intensity of hurricanes. Kerry Emanuel argued that a hurricane functioned like a heat engine and that an increase in the temperature at the bottom would result in more energy dissipation. A search for confirmation of the theory over a twenty-year period failed. There was no statistical evidence of an increase in wind speed or in the number of hurricanes. The original content of the analysis was therefore not confirmed.
Kerry Emanuel, convinced in his heart that his theory was correct, tried a different ploy. He defined the power dissipation (PD) of a cyclone at the surface to be
$$PD = \int_0^{\tau} \int_0^{R} C_D \, \rho \, |V(r)|^3 \, (2\pi r) \, dr \, dt$$

where r is the distance from the cyclone center, |V| is wind speed, ρ is air density at the surface, C_D is a surface drag coefficient reflecting the amount of vertical surface per unit horizontal surface, R is the outer radius of the cyclone and τ is the lifetime of the cyclone. This is a legitimate extension of his original analysis, but it is a redefinition of the notion of the intensity of a hurricane. His implementation of this new formulation is a bit on the crude side. Since he does not have empirical values of the surface drag coefficient C_D for sea surface versus land surfaces of different forms, he assumes that C_D is the same for all surfaces encountered by hurricanes and is therefore a constant that can be ignored. He does not have the distribution of wind speeds V(r) for all past hurricanes, so he assumes that all wind speeds in a hurricane are proportional to the maximum wind speed and that the maximum wind speed observed is the actual maximum for the hurricane. For each hurricane he took the cube of the maximum wind speed as a measure of its potential destructiveness. (For a justification of the use of the cube of wind speed as the measure of power dissipation see Hurricane Power.)
Kerry Emanuel did not have the spatial dimensions of past hurricanes so he assumed the outer radii of all hurricanes are the same. He also assumed the surface air density is the same through the lifetime of the hurricane and therefore could be ignored. He then integrated this wind speed cubed variable over the life of the hurricane. This means that his power dissipation for a hurricane is reduced to an index of the form
$$PDI = \int_0^{\tau} |V_{max}|^3 \, dt$$
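The reduced index can be approximated by a sum when the maximum wind speed is recorded at regular intervals. The sketch below is illustrative only: the function name, the six-hour spacing, and the wind speeds are hypothetical and are not taken from any storm record or from the article under discussion.

```python
def power_dissipation_index(max_winds, dt_seconds):
    """Riemann-sum approximation of the integral of |V_max|^3 dt for one storm."""
    return sum(abs(v) ** 3 for v in max_winds) * dt_seconds

SIX_HOURS = 6 * 3600  # assumed spacing between observations, in seconds

# Hypothetical maximum wind speeds in m/s; these are not real storm records.
storm_a = [20.0, 30.0, 45.0, 50.0, 40.0, 25.0]
storm_b = [18.0, 25.0, 33.0, 28.0]

pdi_a = power_dissipation_index(storm_a, SIX_HOURS)
pdi_b = power_dissipation_index(storm_b, SIX_HOURS)
annual_index = pdi_a + pdi_b  # a yearly index sums over that year's storms

# Because of the cube, raising every wind speed by 10% raises the
# index by about 33% (1.1 cubed is 1.331).
pdi_a_plus10 = power_dissipation_index([1.1 * v for v in storm_a], SIX_HOURS)
ratio = pdi_a_plus10 / pdi_a

print(f"storm A index: {pdi_a:.3e}")
print(f"annual index:  {annual_index:.3e}")
print(f"effect of +10% wind speed: x{ratio:.3f}")
```

The cube makes the index very sensitive to the reported maximum wind speed, and the sum over time makes it sensitive to how long a storm is tracked, so any trend in measurement practice feeds directly into the index.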

He then added up the integrated wind speed cubed values for all of the hurricanes in each year and called it the Power Dissipation Index for the year. It is in the nature of a Potential Destruction Index. This was the y(t) variable. The x(t) variable, the causative agent, was the sea surface temperature (SST) in the tropical cyclone-spawning region. This variable showed an upward trend. Apparently the raw Power Dissipation Index y(t) did not show an impressive correlation with the x(t) variable, the sea surface temperature, so he computed a moving average of both variables with the scheme
$$z_i' = 0.25\,z_{i-1} + 0.5\,z_i + 0.25\,z_{i+1}$$

Apparently those variables were not impressively correlated with each other so he computed a moving average of the moving average. That variable showed an apparent correlation with the sea surface temperature and Kerry Emanuel declared his theory confirmed.
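Applying the three-point scheme twice is the same as applying a single five-point weighted average with weights (1, 4, 6, 4, 1)/16, so each doubly smoothed value mixes five consecutive years. The sketch below checks this on arbitrary numbers. Dropping the endpoints is an assumption made here for simplicity; how the end values were actually treated is not stated above.

```python
def smooth_121(z):
    """z_i' = 0.25 z_{i-1} + 0.5 z_i + 0.25 z_{i+1}; the two endpoints are dropped."""
    return [0.25 * z[i - 1] + 0.5 * z[i] + 0.25 * z[i + 1]
            for i in range(1, len(z) - 1)]

def smooth_14641(z):
    """Single five-point average with weights (1, 4, 6, 4, 1) / 16."""
    w = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
    return [sum(w[k] * z[i - 2 + k] for k in range(5))
            for i in range(2, len(z) - 2)]

z = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]  # arbitrary numbers

twice = smooth_121(smooth_121(z))  # moving average of the moving average
direct = smooth_14641(z)           # equivalent one-pass filter

for a, b in zip(twice, direct):
    print(f"{a:.4f}  {b:.4f}")
```

Seen this way, the double smoothing is a deliberate blending of each year with its two neighbors on either side, which is what produces lobes of similar width in any series it is applied to.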
It is plausible that the increased sea surface temperature manifested its effect on hurricanes in a combination of higher wind speed and longer duration. However, Kerry Emanuel did not in his article give a justification for the double smoothing operation of taking a moving average of the moving average of his potential destruction index, nor did he explain how he came to choose the particular weights he used in the computation of the moving averages. This lack of transparency leaves his results open to question.
Here are two examples of what Kerry Emanuel's double smoothing operation would do to two variables which have random fluctuations about a linear trend.

Remember that the scales of two variables with positive linear trends can always be chosen so that the trend lines coincide.
The measured wind speed of hurricanes could be subject to a trend simply because in the early days of data collection for hurricanes it was difficult to measure wind speed. Observers would naturally tend to understate the wind speed. As better measurement techniques became available, the higher wind speeds could be reported with confidence. Also, in the early days the sample of wind speed measurements was smaller and the chance of catching the peak wind speeds was lower. In other words, the observed sample maximum is a function of the sample size, so a trend in the number of wind measurements for a hurricane will tend to induce a trend in maximum observed wind speeds. For more on this point see Sample Maximums. Also, with more comprehensive monitoring of potential hurricanes, the earlier identification of hurricanes would have induced an upward trend in the duration of hurricanes. Thus the observed intensity and duration of hurricanes could have an increasing trend without there being any real trend in the hurricane phenomena.
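The dependence of the observed maximum on the sample size can be illustrated with a small simulation. It is only a sketch: the draws are uniform on the interval from 0 to 1 rather than wind speeds, and the sample sizes 4 and 40, the number of trials, and the function name are arbitrary choices made here. For this distribution the expected maximum of n draws is n/(n+1).

```python
import random

def mean_observed_max(sample_size, trials, rng):
    """Average, over many trials, of the largest of sample_size independent draws."""
    total = 0.0
    for _ in range(trials):
        total += max(rng.random() for _ in range(sample_size))
    return total / trials

rng = random.Random(2)  # arbitrary seed, for repeatability only

# Same underlying distribution in both cases; only the number of
# measurements per "storm" differs.
few = mean_observed_max(4, 5000, rng)
many = mean_observed_max(40, 5000, rng)

print(f"mean observed maximum with  4 measurements: {few:.3f}")
print(f"mean observed maximum with 40 measurements: {many:.3f}")
```

Nothing about the underlying distribution changes between the two cases, yet the recorded maximum is higher when more measurements are taken. Cubing that maximum would magnify the difference.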
Here are Kerry Emanuel's graphs.

The approximate coincidence of the lines is merely a result of scaling and is of no empirical significance. Kerry Emanuel notes this, but casual viewers may not be conscious of this fact and may take the approximate coincidence as impressive confirmation of his theory. The confirmation of the theory would come only from a correlation of the up and down movements of both variables. There are some lobes of the pseudo-cycles in which the variables go up together and down together. There are others in which they do not. The widths of the lobes are approximately the same for the two variables because they have been subjected to the same double smoothing operation.
Because the time pattern in the Atlantic hurricane basin is not a linear trend, the results look more impressive. But Kerry Emanuel is looking for the confirmation of a universal principle. It has to work in all basins. If it only works in one basin then the result is likely a fluke. Given the way the double smoothing operation distorts the analysis, there is no telling what is involved.
The upswing at the end of the interval is possibly due to the impact of the El Niño event of 1998. This may have influenced the sea surface temperature and the number of hurricanes and typhoons. The double smoothing operation smudged the El Niño effects over a number of years at the end of the observation period.
The visual correlation in these graphs is no greater than that in the example of correlation induced in purely random series by Kerry Emanuel's double smoothing operation. Thus Kerry Emanuel's graphs do not constitute confirmation of his theories.
It would have been helpful if Kerry Emanuel had displayed scatter diagrams of his power dissipation index versus the sea surface temperature. However that in itself would not have been decisive. Observe what the data for the doubly smoothed random variables with trends looks like.

This still looks like there is a relationship between the two variables. The only relationship comes from both having a trend and both being subjected to the double smoothing process. The true state of the relationship between the two variables is revealed only when the deviations from the trends are displayed, as below.
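The contrast between the two scatter diagrams can also be expressed numerically, as in the following sketch. Everything in it is an assumption made for illustration: the series are simulated with a common slope and independent uniform noise, the trends are removed by least squares, and the helper names are made up. It is not a reconstruction of the data behind the graphs on this page.

```python
import random

def smooth_121(z):
    """Three-point weighted average (0.25, 0.5, 0.25); endpoints are dropped."""
    return [0.25 * z[i - 1] + 0.5 * z[i] + 0.25 * z[i + 1]
            for i in range(1, len(z) - 1)]

def detrend(values):
    """Deviations of values from their least-squares line against t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    return [v - (v_mean + slope * (t - t_mean)) for t, v in enumerate(values)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(3)  # arbitrary seed, for repeatability only
n = 200
# Two independent series: same upward trend, unrelated noise.
x = [0.05 * t + rng.uniform(-1, 1) for t in range(n)]
y = [0.05 * t + rng.uniform(-1, 1) for t in range(n)]

xs = smooth_121(smooth_121(x))  # double smoothing
ys = smooth_121(smooth_121(y))

level_corr = pearson(xs, ys)
deviation_corr = pearson(detrend(xs), detrend(ys))
print(f"correlation of smoothed levels:          {level_corr:.3f}")
print(f"correlation of deviations from trend:    {deviation_corr:.3f}")
```

The correlation of the levels is dominated by the shared trend. The correlation of the deviations is the quantity that bears on whether the fluctuations of one variable have anything to do with those of the other, and even it should be read with caution for short series, because smoothing leaves fewer effectively independent points.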

The conclusion is that Kerry Emanuel misled the general public as to the confirmation of his theory by several aspects of the form of its presentation. Unfortunately the general public treats any presentation by an ordained scientist as the gospel truth. Such is not the case. Kerry Emanuel's analysis in the late 1980s, implying an increase in storm intensity, meaning wind speed, with an increase in sea surface temperature, has not been confirmed. An extension of the notion of storm intensity to include duration is a legitimate area for empirical research, but it is doubtful that storm duration can be tied to Kerry Emanuel's original analysis.
(To be continued.)
