San José State University
Department of Economics

applet-magic.com
Thayer Watkins
Silicon Valley
& Tornado Alley
USA

The Nature of the Dependence
of the Expected Value of
the Sample Maximum
on Sample Size

Let x be a stochastic variable with probability density function p(x) and cumulative probability distribution function P(x). Thus the probability of an observation being less than or equal to x is P(x). Where p(x)>0 the function P(x) is strictly increasing and therefore has an inverse function P^(-1)(z). Note that P^(-1)(½)=x_median. Let x_max be defined to be the lowest value of x such that P(x)=1. Likewise x_min is the largest x such that P(x)=0. Note that x_max and x_min may or may not be finite.

The probability density function q_n(x) for the maximum of a sample of size n comes from requiring (n−1) observations to be less than or equal to x and one observation to be at x. The observation at x can occur at any one of the n places in the sample. Thus the probability density is


q_n(x) = n[P(x)]^(n-1) p(x)
 

Note that


q_n(x)dx = n[P(x)]^(n-1) p(x)dx = d([P(x)]^n)
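As a numerical sanity check (a minimal sketch, assuming NumPy and SciPy are available; the variable names are illustrative), the density above can be compared against a Monte Carlo histogram of sample maxima for the standard normal distribution:

    import numpy as np
    from scipy.stats import norm

    n = 10            # sample size
    trials = 100_000  # number of simulated samples

    rng = np.random.default_rng(0)
    maxima = rng.standard_normal((trials, n)).max(axis=1)

    # Empirical density of the simulated maxima
    hist, edges = np.histogram(maxima, bins=80, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Theoretical density q_n(x) = n [P(x)]^(n-1) p(x) at the bin centers
    q_theory = n * norm.cdf(centers) ** (n - 1) * norm.pdf(centers)

    print(f"max abs deviation: {np.abs(hist - q_theory).max():.4f}")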
 

The expected value of the sample maximum is


M_n = ∫_−∞^+∞ x q_n(x) dx
 

Let z=[P(x)]^n so that x=P^(-1)(z^(1/n)). Then changing the variable of integration in the above expression to z results in


M_n = ∫_0^1 P^(-1)(z^(1/n)) dz
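As an illustration with a known closed form (a minimal sketch, assuming SciPy; the exponential distribution is chosen because the expected maximum of n unit exponentials is exactly the harmonic number H_n = 1 + 1/2 + … + 1/n):

    import numpy as np
    from scipy.integrate import quad

    def M(n):
        # M_n = ∫_0^1 P^(-1)(z^(1/n)) dz with P^(-1)(z) = -ln(1 - z)
        value, _ = quad(lambda z: -np.log(1.0 - z ** (1.0 / n)), 0.0, 1.0)
        return value

    for n in (1, 2, 5, 10, 100):
        H_n = sum(1.0 / k for k in range(1, n + 1))  # exact expected maximum
        print(f"n={n:4d}  quadrature={M(n):.6f}  H_n={H_n:.6f}")

For n=10, for instance, both figures agree on H_10 ≈ 2.928968 to the integrator's tolerance.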
 

When the distribution of x is bounded, i.e., P^(-1)(1)=x_max is finite, the limit of M_n as n increases without bound is this same finite value. Since M_1=x_mean, the dependence of the expected value of the sample maximum on the sample size starts at x_mean for n=1 and asymptotically approaches x_max as n→∞. That case is relatively simple.
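For example, for the uniform distribution on [0,1], P^(-1)(z)=z and therefore

M_n = ∫_0^1 z^(1/n) dz = n/(n+1)

which starts at M_1 = 1/2 = x_mean for n=1 and asymptotically approaches x_max = 1.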

Now the dependence of M_n upon n when the distribution of x has no upper bound will be examined. For mathematical and typographic convenience let the above formula be expressed as


M(α) = ∫_0^1 R(z^α)dz = ∫_0^1 R(e^(α·ln(z)))dz
 

where R(z)=P^(-1)(z) and α=1/n. Consider α as a continuous variable and determine


∂M/∂α = ∫_0^1 R′(z^α)z^α·ln(z)dz
 

Consider


∂R(z^α)/∂z = R′(z^α)αz^(α−1)
which is the same as
∂R(z^α)/∂z = αR′(z^α)z^α/z
and thus
R′(z^α)z^α = (∂R(z^α)/∂z)(z/α)
 

This means that


∂M/∂α = (1/α)∫_0^1 (∂R(z^α)/∂z)(z·ln(z))dz
 

Integration by parts may be applied to this formula to obtain


∂M/∂α = (1/α)[R(z^α)z·ln(z)]_0^1 − (1/α)∫_0^1 R(z^α)[1+ln(z)]dz
The boundary term vanishes, at z=1 because ln(1)=0 and at z=0 provided z·ln(z)·R(z^α)→0 there, so this reduces to
∂M/∂α = −(1/α)[∫_0^1 R(z^α)dz + ∫_0^1 R(z^α)ln(z)dz]
and finally to
∂M/∂α = −(1/α)[M(α) + ∫_0^1 R(z^α)ln(z)dz]
 

Now consider n a continuous variable. For (∂M/∂n) note that


∂M/∂n = (∂M/∂α)(∂α/∂n) = (∂M/∂α)(−1/n²)
and since α=1/n, so that 1/α=n, thus
∂M/∂n = [M(α) + ∫_0^1 P^(-1)(z^(1/n))ln(z)dz]/n
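Because the chain-rule factor of α is easy to drop, this identity is worth checking numerically. The following sketch (assuming SciPy, and again using the exponential distribution with P^(-1)(z)=−ln(1−z); helper names are illustrative) compares a centered finite difference of M(n) with the formula's right-hand side:

    import numpy as np
    from scipy.integrate import quad

    def Rinv(z):
        # P^(-1) for the exponential distribution
        return -np.log(1.0 - z)

    def M(n):
        value, _ = quad(lambda z: Rinv(z ** (1.0 / n)), 0.0, 1.0)
        return value

    def rhs(n):
        # [M(α) + ∫_0^1 P^(-1)(z^(1/n)) ln(z) dz] / n
        integral, _ = quad(lambda z: Rinv(z ** (1.0 / n)) * np.log(z), 0.0, 1.0)
        return (M(n) + integral) / n

    for n in (5.0, 10.0, 50.0):
        h = 1e-4
        dM_dn = (M(n + h) - M(n - h)) / (2.0 * h)
        print(f"n={n:5.1f}  finite difference={dM_dn:.6f}  formula={rhs(n):.6f}")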
 

In establishing the behavior of (∂M/∂n) as n→∞ it is reasonable to speculate that


lim_(n→∞) [M(α) + ∫_0^1 P^(-1)(z^(1/n))ln(z)dz]
 

would be some finite value, say β. Thus for large values of n


∂M/∂n ≅ β/n
and thus, integrating from 1 to n,
M(n) − M(1) ≅ β·ln(n)
 

Note again that M(1) = M_1 = x_mean. Thus


E{x_max} = M_n ≅ x_mean + β·ln(n)
 

Below are shown the relationships between the expected sample maximum and the sample size, and between the expected sample maximum and the logarithm of the sample size, when the variable x has a standard normal distribution (mean of zero and unit standard deviation).
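The original charts are not reproduced in this text. A minimal simulation sketch (assuming NumPy; the sample sizes and trial counts are illustrative) that generates comparable estimates for the standard normal case, to be plotted against ln(n), is:

    import numpy as np

    rng = np.random.default_rng(0)
    trials = 20_000  # simulated samples per sample size

    for n in (1, 2, 5, 10, 20, 50, 100, 200, 500):
        # Expected maximum of a sample of n standard normal variates
        maxima = rng.standard_normal((trials, n)).max(axis=1)
        print(f"n={n:4d}  ln(n)={np.log(n):6.3f}  E[max] ≈ {maxima.mean():.4f}")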


The normal distribution has a finite variance. The case of a distribution with infinite variance can be quite different. Below are the corresponding relationships for the case of a Lévy stable distribution with α=1.2, β=0.0, μ=0.0 and ν=1.0.
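For the heavy-tailed case, comparable estimates can be generated with SciPy's levy_stable distribution. This is a sketch under the assumption that SciPy's (alpha, beta, loc, scale) parameters correspond to the (α, β, μ, ν) above; note that this β is the stable skewness parameter, not the slope coefficient β of the preceding section, and that sample-mean estimates for heavy-tailed maxima converge slowly:

    import numpy as np
    from scipy.stats import levy_stable

    rng = np.random.default_rng(0)
    trials = 2_000  # heavy tails make these estimates noisy

    for n in (10, 100, 1000):
        # Draw `trials` samples of size n from the stable distribution
        samples = levy_stable.rvs(1.2, 0.0, loc=0.0, scale=1.0,
                                  size=(trials, n), random_state=rng)
        maxima = samples.max(axis=1)
        print(f"n={n:5d}  E[max] ≈ {maxima.mean():.2f}")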


Clearly in this case the expected value of the sample maximum is more nearly a linear function of the sample size than a function proportional to the logarithm of the sample size.

Conclusions

The lesson of the analysis is that sample maximums (and minimums) can be very sensitive to the size of the sample. This applies to overt sampling, but it applies just as well to time series treated as samples. Time series would exhibit apparent trends in the maximums and minimums, and those apparent trends could be more pronounced than might be expected. It all depends upon the nature of the distribution of the underlying variable. It must be noted and emphasized that there may be trends in the maximum and minimum levels without there being any change in the mean value of the variable.

The normal distribution appears frequently in natural phenomena, but there is evidence that the other Lévy stable distributions also appear. For a case of rainfall statistics see San Jose rainfall.

The normal distribution is the only Lévy stable distribution that has a finite variance. The expected value of the sample maximum and minimum is extremely sensitive to sample size if the variance of the underlying variable is infinite. When unprecedented events occur, such as the catastrophic rains that led to the dam failures in north central China in August of 1975, they may indicate the unbounded nature of the underlying variables.

(To be continued.)

