San José State University

applet-magic.com
Thayer Watkins
Silicon Valley
& Tornado Alley
USA

The Inappropriateness of Applying Normal Statistics
to Weather Phenomena in Assessing Normal Variations

Normal Variable Statistics and the Assessment of Variation

One of the big issues in the policy debate concerning the impact of human activities on climate is how to assess the extent of variation in weather conditions under the existing climate. Everyone accepts that weather is variable and that even if there is no change in the climate there will be surprises, even disastrous surprises, from time to time. The question is how far out of line an event can be and still lie within the normal variation of the weather.

The usual approach is to compile a record of the past values of the condition in question and compute their mean and standard deviation. Those statistics are then used to express the event in question as a number of standard deviation units away from the mean. Assuming a normal bell-shaped distribution, the probability of getting an event that far or farther from the mean can be computed. If that probability is very, very low then the usual conclusion is that the event was not due to normal variation but was the result of some change in the climate; i.e., in the underlying distribution of the weather variables.
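
As a minimal sketch of this procedure, assuming Python with numpy and scipy is available (the record values below are made up purely for illustration):

    import numpy as np
    from scipy.stats import norm

    # A made-up record of some weather condition (illustration only).
    record = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0, 1.4, 1.6])
    event = 3.5                        # the unusual observation to be assessed

    mean = record.mean()
    sd = record.std(ddof=1)            # sample standard deviation
    z = (event - mean) / sd            # distance from the mean in standard units
    p = norm.sf(z)                     # tail probability under a normal model

    print(f"z = {z:.1f} standard deviations, tail probability = {p:.2e}")
    # A very small p is usually read as evidence that the event lies outside
    # normal variation -- the inference this article calls into question.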

There is often good reason to assume normal bell-shaped distributions because of the Central Limit Theorem. The Central Limit Theorem is a powerful, marvelous mathematical result that says that sums of statistically independent random variables tend toward normal distributions. The larger the number of independent variables, the closer the distribution of the sum is to a normal distribution. The Central Limit Theorem is applied to random sampling. Usually if the sample size is 30 or greater it is presumed that the sample means can safely be taken to have a normal distribution. A normal distribution is completely characterized by two parameters: the mean and the standard deviation. Once those two parameters are known the probabilities are completely determined. Normal statistics, meaning standard statistics, is founded on the normal bell-shaped distribution. The point made here is that such normal statistics cannot be safely applied to statistics of weather conditions.
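
A small simulation, offered only as a sketch of the theorem at work for finite-variance variables, shows that sums of thirty independent uniform variables are already close to normally distributed:

    import numpy as np
    from scipy.stats import skew, kurtosis

    rng = np.random.default_rng(0)
    n_terms, n_sums = 30, 100_000

    # Each sum combines 30 independent uniform variables (finite variance).
    sums = rng.uniform(0.0, 1.0, size=(n_sums, n_terms)).sum(axis=1)

    # Both quantities are zero for an exactly normal distribution.
    print("skewness       :", round(skew(sums), 3))
    print("excess kurtosis:", round(kurtosis(sums), 3))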

The Central Limit Theorem (CLT) was discovered in the 19th century after normal distributions were found in empirical investigations. Statisticians thought the CLT was of universal application. By the 20th century mathematicians were discovering limitations on its applicability. There was a hidden assumption: the independent variables involved in the sum had to have finite variance.

The Stable Distributions

One significant property of normally distributed variables is that sums of such variables also have a normal distribution. The French mathematician Paul Lévy investigated this property. He defined a distribution as stable if, whenever two independent variables have that distribution, their sum has a distribution of the same type. He found that there is a family of stable distributions characterized by four parameters. One parameter represents the central tendency of the distribution; for a normal distribution this is the mean value. Another represents the dispersiveness of the distribution; for a normal distribution the dispersion parameter is proportional to its standard deviation. Another parameter represents skewness; normal distributions are symmetrical about their means and their skewness is equal to zero. The fourth parameter represents the shape of the distribution and is usually called the α parameter. For normal distributions this shape parameter α is equal to 2; for all other stable distributions α is less than 2. The influence of α on the shape of the distribution is illustrated below.
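
As a rough sketch of that influence, assuming scipy's levy_stable distribution in its symmetric form (β = 0) and α values chosen only for illustration:

    from scipy.stats import levy_stable

    # Symmetric stable densities (beta = 0) for a few values of alpha;
    # alpha = 2 is the normal case and alpha = 1 is the Cauchy case.
    for alpha in (2.0, 1.5, 1.0):
        centre = levy_stable.pdf(0.0, alpha, 0.0)
        far_out = levy_stable.pdf(4.0, alpha, 0.0)
        print(f"alpha = {alpha}: density at 0 = {centre:.4f}, "
              f"density at 4 = {far_out:.5f}")
    # As alpha falls below 2 the density becomes more sharply peaked at the
    # centre while its tails become much heavier.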

The Fat-Tailed Distributions

A decade or so before Paul Lévy's theoretical work, an economist named Wesley Clair Mitchell discovered that rates of return in stock markets did not truly have normal distributions, even though their distributions looked more or less normal.

Because a higher proportion of the probability lies in the tails of such a distribution than in the tails of a normal distribution, these distributions were called fat-tailed. They were also given a name based upon Greek: leptokurtic.

As seen above, the leptokurtic distributions deviate from normal distributions not only in being fat-tailed but also in being more peaked. This means the leptokurtic distributions have not only more extreme deviations from the mean but also more cases of small deviations from the mean. They differ from normal distributions in having fewer moderate deviations from the mean. The excess of small deviations tends to lead observers to underestimate the volatility of such variables, at least until a very large deviation comes along.
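
The way this misleads an observer can be sketched with a simulation; the stable law with α = 1.5 is an arbitrary stand-in for a leptokurtic weather variable:

    import numpy as np
    from scipy.stats import levy_stable

    rng = np.random.default_rng(1)

    # A long record drawn from a leptokurtic stable law (alpha = 1.5).
    data = levy_stable.rvs(1.5, 0.0, size=2_000, random_state=rng)

    # An observer who sees only the first 100 values estimates the volatility...
    early_mean, early_sd = data[:100].mean(), data[:100].std(ddof=1)

    # ...and later encounters deviations that dwarf that estimate.
    largest = np.abs(data[100:] - early_mean).max()
    print(f"sd estimated from the early record: {early_sd:.2f}")
    print(f"largest later deviation: {largest:.1f} "
          f"({largest / early_sd:.1f} estimated sd units)")
    # Under a normal model a deviation of more than five or six sd units is
    # essentially impossible; here far larger deviations routinely occur.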

The leptokurtic distributions are just special cases of the Lévy stable distributions. Furthermore, there is a generalization of the Central Limit Theorem which says that the (suitably normalized) sum of a large number of independent random variables tends toward a stable distribution, even when the variances are infinite. Thus if some phenomenon, such as changes in stock prices or rain from a storm, is the result of a large number of independent influences, then the distribution would be expected to be a stable distribution but not necessarily a normal distribution.
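
A sketch of this contrast, with the Pareto tail index 1.5 chosen only to provide an infinite-variance example:

    import numpy as np
    from scipy.stats import kurtosis

    rng = np.random.default_rng(2)
    n_terms, n_sums = 500, 10_000

    # Sums of finite-variance terms (uniform) versus sums of heavy-tailed terms
    # (Pareto with tail index 1.5, whose variance is infinite).
    finite_var_sums = rng.uniform(-1.0, 1.0, size=(n_sums, n_terms)).sum(axis=1)
    heavy_tail_sums = rng.pareto(1.5, size=(n_sums, n_terms)).sum(axis=1)

    # Excess kurtosis is zero for a normal distribution.
    print("finite-variance sums:", round(kurtosis(finite_var_sums), 2))
    print("heavy-tailed sums   :", round(kurtosis(heavy_tail_sums), 2))
    # The first is essentially zero; the second remains large no matter how many
    # terms are summed, reflecting a non-normal stable limit.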

If a distribution is fat-tailed then that fact would account for unexpected extreme cases and consequently for large changes in the variables, the sort of occurrences associated with catastrophes.

An Application of Stable Distributions

The notion of Lévy stable distributions was applied to the rainfall statistics for the San Jose, California area. The resulting estimates of the parameters of the stable distributions for the monthly statistics are found at San Jose Rainfall and Parameter Estimates. San Jose does not ordinarily have extreme events, but in September of 1918 there was one. September is normally a low-rainfall month; the rainy season does not normally start until November. The estimated values of the parameter α for most months were in the range 1.7 to 1.9, but for September the value of α was 1.1. September is therefore a prime candidate for an extreme weather event.
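
As an illustration of how such parameter estimates can be produced, the sketch below fits a stable law to a synthetic sample by maximum likelihood with scipy's levy_stable. It is not the estimation procedure behind the linked San Jose results, and the data are generated rather than observed:

    import numpy as np
    from scipy.stats import levy_stable

    rng = np.random.default_rng(3)

    # A synthetic heavy-tailed sample standing in for one month's rainfall record.
    sample = levy_stable.rvs(1.1, 0.9, loc=0.5, scale=0.4, size=100,
                             random_state=rng)

    # Maximum-likelihood fit of (alpha, beta, loc, scale).  This can be slow
    # because the stable density has no closed form and is computed numerically.
    alpha, beta, loc, scale = levy_stable.fit(sample)
    print(f"estimated alpha = {alpha:.2f} "
          f"(the sample was generated with alpha = 1.1)")
    # An alpha near 1.1, as reported for September, signals very heavy tails and
    # hence a real possibility of rainfall far outside the recorded range.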

Extreme Weather Events

In 1918 the major industry of what is now called Silicon Valley was drying plums to make prunes. The Santa Clara Valley was the prune capital of the world. In September the plums were laid out in wooden flats to dry in the sunshine. On September 11th through 13th about six inches of rain fell. The whole prune crop for the year was ruined.

Although that September deluge was a disaster for the San Jose area, it could not compare to the disaster that occurred in north central China in August of 1975. There were two major dams and about sixty smaller dams built on the river systems of north central China. The two major dams were each built to handle a maximum of about a half meter of rainfall over a three-day period. In early August a typhoon moved over southern China and traveled north to where its warm, humid air encountered the cold air of the north. On the very first day, August 5th, the area received a half meter of rainfall, and the storm continued, raining for another 13 hours on the second day and 16 hours on the third. Because of planning errors and operational policy errors the dams could not hold the water or even pass it through. Instead the two major dams collapsed, along with about sixty of the smaller dams on the river system. It was a colossal disaster in which about eighty-five thousand people were killed outright and about eleven million people were severely affected.

Clearly a major source of the disaster was the underestimate of the chances of a severe storm. The major dams were built to withstand storms with annual probabilities of only 1/500 and 1/1000. These were drastic underestimates based upon a limited record and theoretical distributions which did not take into account the probabilities of storms of a severity not yet seen. The stable distributions allow the shape of the distribution fitted to the past record to provide information about the probabilities of storms so severe that they have not yet occurred in the record.
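
A sketch of the kind of extrapolation involved: with the same centre and scale, a normal model and a right-skewed stable model (α = 1.5, β = 0.9, all values chosen only for illustration) assign wildly different probabilities to an extreme that has never been observed:

    from scipy.stats import levy_stable, norm

    threshold = 8.0     # an extreme three-day rainfall, in arbitrary units

    # Normal model: the alpha = 2 member of the stable family with scale 1,
    # i.e. standard deviation sqrt(2) ~ 1.414, centred at 1.0.
    p_normal = norm.sf(threshold, loc=1.0, scale=1.414)

    # Heavy-tailed, right-skewed stable model with the same centre and scale.
    p_stable = levy_stable.sf(threshold, 1.5, 0.9, loc=1.0, scale=1.0)

    print(f"normal model: P(exceedance) = {p_normal:.1e}, "
          f"return period ~ {1 / p_normal:,.0f}")
    print(f"stable model: P(exceedance) = {p_stable:.1e}, "
          f"return period ~ {1 / p_stable:,.0f}")
    # The normal model treats the event as essentially impossible; the stable
    # model treats it as rare but well within the realm of possibility.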

The Standard Deviation as a Measure of Variation

Normal statistics uses the standard deviation as the measure of variability, and this is appropriate when the distributions involved are normal bell-shaped ones. For the non-normal stable distributions the standard deviation is infinite. The standard deviation computed from a finite sample never settles down to a single value and is basically meaningless. The dispersion parameter of a non-normal stable distribution is finite, but it is not the same as the standard deviation. Therefore it is meaningless to evaluate an unusual event in terms of the number of standard deviation units away from the mean.
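
This failure of the sample standard deviation to settle down can be seen in a short simulation (α = 1.5 is again an arbitrary non-normal stable example):

    import numpy as np
    from scipy.stats import levy_stable, norm

    rng = np.random.default_rng(5)
    normal_data = norm.rvs(size=100_000, random_state=rng)
    stable_data = levy_stable.rvs(1.5, 0.0, size=100_000, random_state=rng)

    for n in (1_000, 10_000, 100_000):
        print(f"n = {n:>7}: sd of normal sample = {normal_data[:n].std():.2f}, "
              f"sd of stable sample = {stable_data[:n].std():.2f}")
    # The normal column stays near 1; the stable column never settles down,
    # because the population value it is trying to estimate is infinite.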

It should be noted that there are many perfectly legitimate probability distributions for which the standard deviation is infinite. The standard deviation of a distribution is finite only if the probability of a deviation larger than x goes to zero faster than 1/x², where x is the size of the deviation from the mean value.
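
A quick numerical check of this condition, using Pareto densities f(x) = a/x^(a+1) on [1, ∞), for which the tail probability is P(X > x) = 1/x^a:

    from scipy.integrate import quad

    # Tail probability ~ 1/x**2.5 (faster than 1/x**2) versus ~ 1/x**1.5 (slower).
    for a in (2.5, 1.5):
        for cutoff in (10, 1_000, 100_000):
            moment, _ = quad(lambda x, a=a: x**2 * a * x**-(a + 1), 1, cutoff)
            print(f"a = {a}: integral of x^2 f(x) up to {cutoff:>7} = {moment:8.1f}")
    # For a = 2.5 the truncated second moment settles toward a limit (5.0);
    # for a = 1.5 it grows without bound, so the standard deviation is infinite.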

Sample Means

Where the Central Limit Theorem applies, the mean of a random sample is an unbiased estimate of the population mean. For a sufficiently large sample size the distribution of the sample mean can be taken to be a normal distribution. The standard deviation of that normal distribution of sample means can be estimated from the sample values.

If the data have a non-normal stable distribution then the sample means will also have a non-normal stable distribution, and that holds true no matter how large the sample size. In this case the sample standard deviation is meaningless.
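
A sketch of this point, comparing how much averaging one hundred values narrows the spread of normal data versus data from a stable law with α = 1.5 (the interquartile range is used as the measure of spread, since the standard deviation is not meaningful here):

    import numpy as np
    from scipy.stats import levy_stable, norm, iqr

    rng = np.random.default_rng(4)
    n, m = 100, 5_000                  # sample size, number of samples

    normal_draws = norm.rvs(size=(m, n), random_state=rng)
    stable_draws = levy_stable.rvs(1.5, 0.0, size=(m, n), random_state=rng)

    print("IQR of single values:",
          round(iqr(normal_draws[:, 0]), 2), round(iqr(stable_draws[:, 0]), 2))
    print("IQR of means of 100 :",
          round(iqr(normal_draws.mean(axis=1)), 2),
          round(iqr(stable_draws.mean(axis=1)), 2))
    # For normal data averaging 100 values shrinks the spread by a factor of 10
    # (the square root of 100); for alpha = 1.5 it shrinks only by about
    # 100**(1/3) ~ 4.6, and the means still follow a heavy-tailed stable law.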

Conclusions

Some weather/climate statistics have been shown to have non-normal stable distributions. Therefore it can never be safely assumed that any arbitrary weather statistic will have a normal distribution. This applies to sample means as well as to the original variables. Therefore statistical tests based upon the standard deviation cannot be validly applied. This means that statistical tests purporting to show that some observation is beyond the normal variation are unlikely to be valid.

(To be continued.)

