San José State University
Department of Economics

applet-magic.com
Thayer Watkins
Silicon Valley
& Tornado Alley
USA

The Nature of the Dependence
of the Expected Value of
the Sample Maximum
on Sample Size

Let x be a stochastic variable with probability density function p(x) and cumulative probability distribution function P(x). Thus the probability of an observation being less than or equal to x is P(x). Where p(x)>0 the function P(x) is strictly increasing and therefore has an inverse function P^(-1)(z). Note that P^(-1)(½)=x_median. Let x_max be defined to be the lowest value of x such that P(x)=1. Likewise x_min is the largest x such that P(x)=0. Note that x_max and x_min may or may not be finite.

The probability density function q_n(x) for the maximum of a sample of size n comes from requiring (n−1) observations to be less than or equal to x and one observation to be at x. The observation at x can occur at any one of the n places in the sample. Thus the probability density is


q_n(x) = n[P(x)]^(n-1) p(x)
 

Note that


q_n(x)dx = n[P(x)]^(n-1) p(x)dx = d([P(x)]^n)
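As a numerical sanity check (a minimal sketch, assuming NumPy and SciPy are available; the variable names are illustrative), the density above can be compared against a Monte Carlo histogram of sample maxima for the standard normal distribution:

    import numpy as np
    from scipy.stats import norm

    n = 10            # sample size
    trials = 100_000  # number of simulated samples

    rng = np.random.default_rng(0)
    maxima = rng.standard_normal((trials, n)).max(axis=1)

    # Empirical density of the simulated maxima
    hist, edges = np.histogram(maxima, bins=80, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Theoretical density q_n(x) = n [P(x)]^(n-1) p(x) at the bin centers
    q_theory = n * norm.cdf(centers) ** (n - 1) * norm.pdf(centers)

    print(f"max abs deviation: {np.abs(hist - q_theory).max():.4f}")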
 

The expected value of the sample maximum is


M_n = ∫_−∞^+∞ x q_n(x) dx
 

Let z=[P(x)]^n so that x=P^(-1)(z^(1/n)). Then changing the variable of integration in the above expression to z results in


M_n = ∫_0^1 P^(-1)(z^(1/n)) dz
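As an illustration with a known closed form (a minimal sketch, assuming SciPy; the exponential distribution is chosen because the expected maximum of n unit exponentials is exactly the harmonic number H_n = 1 + 1/2 + … + 1/n):

    import numpy as np
    from scipy.integrate import quad

    def M(n):
        # M_n = ∫_0^1 P^(-1)(z^(1/n)) dz with P^(-1)(z) = -ln(1 - z)
        value, _ = quad(lambda z: -np.log(1.0 - z ** (1.0 / n)), 0.0, 1.0)
        return value

    for n in (1, 2, 5, 10, 100):
        H_n = sum(1.0 / k for k in range(1, n + 1))  # exact expected maximum
        print(f"n={n:4d}  quadrature={M(n):.6f}  H_n={H_n:.6f}")

For n=10, for instance, both figures agree on H_10 ≈ 2.928968 to the integrator's tolerance.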
 

When the distribution of x is bounded, i.e., P^(-1)(1)=x_max is finite, the limit of M_n as n increases without bound is this same finite value. Since M_1=x_mean, the dependence of the expected value of the sample maximum on the sample size starts at x_mean for n=1 and asymptotically approaches x_max as n→∞. That case is relatively simple.
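For example, for the uniform distribution on [0,1], P^(-1)(z)=z and therefore

M_n = ∫_0^1 z^(1/n) dz = n/(n+1)

which starts at M_1 = 1/2 = x_mean for n=1 and asymptotically approaches x_max = 1.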

Now the dependence of M_n upon n when the distribution of x has no upper bound will be examined. For mathematical and typographic convenience let the above formula be expressed as


M(α) = ∫_0^1 R(z^α)dz = ∫_0^1 R(e^(α·ln(z)))dz
 

where R(z)=P^(-1)(z) and α=1/n. Consider α as a continuous variable and determine


∂M/∂α = ∫_0^1 R′(z^α)z^α·ln(z)dz
 

Consider


∂R(z^α)/∂z = R′(z^α)αz^(α−1)
which is the same as
∂R(z^α)/∂z = αR′(z^α)z^α/z
and thus
R′(z^α)z^α = (∂R(z^α)/∂z)(z/α)
 

This means that


∂M/∂α = (1/α)∫_0^1 (∂R(z^α)/∂z)(z·ln(z))dz
 

Integration by parts may be applied to this formula to obtain


∂M/∂α = (1/α)[R(z^α)z·ln(z)]_0^1 − (1/α)∫_0^1 R(z^α)[1+ln(z)]dz
The boundary term vanishes, at z=1 because ln(1)=0 and at z=0 provided z·ln(z)·R(z^α)→0 there, so this reduces to
∂M/∂α = −(1/α)[∫_0^1 R(z^α)dz + ∫_0^1 R(z^α)ln(z)dz]
and finally to
∂M/∂α = −(1/α)[M(α) + ∫_0^1 R(z^α)ln(z)dz]
 

Now consider n a continuous variable. For (∂M/∂n) note that


∂M/∂n = (∂M/∂α)(∂α/∂n) = (∂M/∂α)(−1/n²)
and since α=1/n, so that 1/α=n, thus
∂M/∂n = [M(α) + ∫_0^1 P^(-1)(z^(1/n))ln(z)dz]/n
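Because the chain-rule factor of α is easy to drop, this identity is worth checking numerically. The following sketch (assuming SciPy, and again using the exponential distribution with P^(-1)(z)=−ln(1−z); helper names are illustrative) compares a centered finite difference of M(n) with the formula's right-hand side:

    import numpy as np
    from scipy.integrate import quad

    def Rinv(z):
        # P^(-1) for the exponential distribution
        return -np.log(1.0 - z)

    def M(n):
        value, _ = quad(lambda z: Rinv(z ** (1.0 / n)), 0.0, 1.0)
        return value

    def rhs(n):
        # [M(α) + ∫_0^1 P^(-1)(z^(1/n)) ln(z) dz] / n
        integral, _ = quad(lambda z: Rinv(z ** (1.0 / n)) * np.log(z), 0.0, 1.0)
        return (M(n) + integral) / n

    for n in (5.0, 10.0, 50.0):
        h = 1e-4
        dM_dn = (M(n + h) - M(n - h)) / (2.0 * h)
        print(f"n={n:5.1f}  finite difference={dM_dn:.6f}  formula={rhs(n):.6f}")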
 

In establishing the behavior of (∂M/∂n) as n→∞ it is reasonable to speculate that


lim_(n→∞) [M(α) + ∫_0^1 P^(-1)(z^(1/n))ln(z)dz]
 

would be some finite value, say β. Thus for large values of n


∂M/∂n ≅ β/n
and thus, integrating from 1 to n,
M(n) − M(1) ≅ β·ln(n)
 

Note again that M(1) = M_1 = x_mean. Thus


E{x_max} = M_n ≅ x_mean + β·ln(n)
 

Below are shown the relationships between the expected sample maximum and the sample size, and between the expected sample maximum and the logarithm of the sample size, when the variable x has a standard normal distribution (mean of zero and unit standard deviation).
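The original charts are not reproduced in this text. A minimal simulation sketch (assuming NumPy; the sample sizes and trial counts are illustrative) that generates comparable estimates for the standard normal case, to be plotted against ln(n), is:

    import numpy as np

    rng = np.random.default_rng(0)
    trials = 20_000  # simulated samples per sample size

    for n in (1, 2, 5, 10, 20, 50, 100, 200, 500):
        # Expected maximum of a sample of n standard normal variates
        maxima = rng.standard_normal((trials, n)).max(axis=1)
        print(f"n={n:4d}  ln(n)={np.log(n):6.3f}  E[max] ≈ {maxima.mean():.4f}")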


The normal distribution has a finite variance. The case of a distribution with infinite variance can be quite different. Below are the corresponding relationships for the case of a Lévy stable distribution with α=1.2, β=0.0, μ=0.0 and ν=1.0.
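For the heavy-tailed case, comparable estimates can be generated with SciPy's levy_stable distribution. This is a sketch under the assumption that SciPy's (alpha, beta, loc, scale) parameters correspond to the (α, β, μ, ν) above; note that this β is the stable skewness parameter, not the slope coefficient β of the preceding section, and that sample-mean estimates for heavy-tailed maxima converge slowly:

    import numpy as np
    from scipy.stats import levy_stable

    rng = np.random.default_rng(0)
    trials = 2_000  # heavy tails make these estimates noisy

    for n in (10, 100, 1000):
        # Draw `trials` samples of size n from the stable distribution
        samples = levy_stable.rvs(1.2, 0.0, loc=0.0, scale=1.0,
                                  size=(trials, n), random_state=rng)
        maxima = samples.max(axis=1)
        print(f"n={n:5d}  E[max] ≈ {maxima.mean():.2f}")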


Clearly in this case the expected value of the sample maximum is more nearly a linear function of the sample size than a function proportional to the logarithm of the sample size.

Conclusions

The lesson of the analysis is that sample maximums (and minimums) can be very sensitive to the size of the sample. This applies to overt sampling, but it applies just as well to time series treated as samples. Time series would exhibit apparent trends in the maximums and minimums, and those apparent trends could be more pronounced than might be expected. It all depends upon the nature of the distribution of the underlying variable. It must be noted and emphasized that there may be trends in the maximum and minimum levels without there being any change in the mean value of the variable.

The normal distribution appears frequently in natural phenomena, but there is evidence that the other Lévy stable distributions also appear. For a case of rainfall statistics see San Jose rainfall.

The normal distribution is the only Lévy stable distribution that has a finite variance. The expected value of the sample maximum and minimum is extremely sensitive to sample size if the variance of the underlying variable is infinite. When unprecedented events occur, such as the catastrophic rains that led to the dam failures in north central China in August of 1975, they may indicate the unbounded nature of the underlying variables.

(To be continued.)

