applet-magic.com
Thayer Watkins
Silicon Valley
& Tornado Alley
USA

The Scaling of Sample Extremes
as a Function of Sample Size

For the sample mean, the dispersion of its distribution is given by the rule


$$\sigma_n = \frac{\sigma_1}{\sqrt{n}}$$
 

where n is the sample size, $\sigma_1$ is the standard deviation of a single observation, and $\sigma_n$ is the standard deviation of the sample mean for samples of size n. Thus, as the sample size increases, the distribution of sample means becomes less dispersed. The material below analyzes the dispersion of the sample maximum and minimum as a function of sample size. The analysis for the two extremes is the same, but for definiteness the maximum will be used.

Statistical Simulations

Consider the distribution of sample maximums for samples of a random variable uniformly distributed between −0.5 and +0.5. For n = 1 the sample maximum is just the single sample value, so its distribution is the uniform distribution itself.
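
A minimal simulation sketch of this setup, assuming Python's standard `random` and `statistics` modules as the tools; the function name `sample_max_stats`, the 10,000 trials and the seed are illustrative choices, not anything from the original page. It draws many samples of size n from the uniform distribution on [−0.5, +0.5] and reports the mean and standard deviation of the sample maximum:

```python
import random
import statistics


def sample_max_stats(n, trials=10_000, seed=0):
    """Simulate the maximum of n draws from U(-0.5, +0.5).

    Returns (mean, standard deviation) of the simulated maxima.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    maxima = [max(rng.uniform(-0.5, 0.5) for _ in range(n)) for _ in range(trials)]
    return statistics.fmean(maxima), statistics.stdev(maxima)


for n in (1, 10, 100):
    mean, sd = sample_max_stats(n)
    print(f"n={n:3d}  mean of maximum={mean:+.4f}  sd of maximum={sd:.4f}")
```

The standard deviation of the maximum shrinks markedly as n goes from 1 to 100, while its mean moves toward +0.5.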

Analysis

If p(x) is the probability density function for a random variable x, let P(x) be the cumulative probability function; i.e.,


$$P(x) = \int_{-\infty}^{x} p(z)\,dz.$$
 

For a sample of n independent values, the probability density that the sample maximum equals x is given by


$$n[P(x)]^{n-1}\,p(x)$$
 

This is the probability density function q(x) for the sample maximum.

The quantity P(x) represents the probability that the random variable has a value less than or equal to x. The (n−1)-th power of P(x) is the probability that the other n−1 values of the sample are all less than or equal to x, and the factor p(x) is the density for the remaining value lying at x. The factor of n represents the fact that the maximum could be any one of the n sample values.

Let Q(x) represent the cumulative probability function for the sample maximum. Since the derivative of a cumulative probability function is the corresponding probability density function, the above relation can be written as


$$\frac{dQ(x)}{dx} = n[P(x)]^{n-1}\,\frac{dP(x)}{dx}$$
which, upon integration from −∞ to x (where Q(−∞) = P(−∞) = 0), gives
$$Q(x) = [P(x)]^n$$
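
The relation Q(x) = [P(x)]^n can be checked by simulation for a distribution other than the uniform one. The sketch below assumes the unit exponential distribution, for which P(x) = 1 − e^(−x); that choice, the helper names and the trial count are hypothetical, made only for illustration:

```python
import math
import random


def empirical_max_cdf(n, x, trials=20_000, seed=1):
    """Fraction of simulated samples of size n whose maximum is <= x.

    Draws come from the unit exponential distribution (an illustrative choice).
    """
    rng = random.Random(seed)
    hits = sum(
        max(rng.expovariate(1.0) for _ in range(n)) <= x for _ in range(trials)
    )
    return hits / trials


def theoretical_max_cdf(n, x):
    """Q(x) = [P(x)]^n with P(x) = 1 - exp(-x), the unit exponential CDF."""
    return (1.0 - math.exp(-x)) ** n


for x in (1.0, 2.0, 3.0):
    print(f"x={x}: simulated Q={empirical_max_cdf(5, x):.4f}, "
          f"[P(x)]^5={theoretical_max_cdf(5, x):.4f}")
```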
 

If the probability density function is nonzero only over a finite range, say [$x_{min}$, $x_{max}$], then P($x_{min}$) = 0.0 and P($x_{max}$) = 1.0. The cumulative probability function for the sample maximum has the same range; i.e., Q($x_{min}$) = 0.0 and Q($x_{max}$) = 1.0. The larger the sample size, the closer the median of the distribution of the sample maximum lies to $x_{max}$, and so the more concentrated its probability density function is near $x_{max}$. Let $x_{med}(n)$ denote the median of the probability distribution of the sample maximum for samples of size n. Then


$x_{med}(n)$ is such that
$$Q(x_{med}(n)) = 0.5$$
and therefore, since $Q(x) = [P(x)]^n$,
$$P(x_{med}(n)) = (0.5)^{1/n}$$
 

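For example, P($x_{med}(10)$) = 0.933 and P($x_{med}(100)$) = 0.9931. Since half of the probability for the sample maximum lies between $x_{med}(n)$ and $x_{max}$, and that interval shrinks as n grows, the dispersion of q(x) must decrease with the sample size.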
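
The two numbers above can be reproduced directly. As a sketch, assuming the uniform distribution on [−σ/2, +σ/2] (so that P(x) = (x + σ/2)/σ), the median of the sample maximum also has a closed form, $x_{med}(n) = \sigma(0.5)^{1/n} - \sigma/2$; the helper names are illustrative:

```python
def median_cum_prob(n):
    """P(x_med(n)) = 0.5**(1/n): cumulative probability at the median of the sample maximum."""
    return 0.5 ** (1.0 / n)


def uniform_max_median(n, sigma=1.0):
    """x_med(n) for p(x) uniform on [-sigma/2, +sigma/2], where P(x) = (x + sigma/2)/sigma."""
    return sigma * median_cum_prob(n) - sigma / 2


for n in (1, 10, 100):
    print(f"n={n:3d}  P(x_med)={median_cum_prob(n):.4f}  x_med={uniform_max_median(n):+.4f}")
```

For n = 10 and n = 100 the first column gives 0.9330 and 0.9931, and with σ = 1 the median of the maximum moves from 0 at n = 1 toward the upper limit +0.5.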

To see how the dispersion, as measured by the standard deviation, depends on the sample size, consider a simple probability density function p(x).


$$p(x) = \begin{cases} 1/\sigma & \text{for } -\sigma/2 \le x \le +\sigma/2 \\ 0 & \text{for all other values of } x \end{cases}$$
 

Then P(x) = (x − (−σ/2))/σ = (x + σ/2)/σ for −σ/2 ≤ x ≤ +σ/2. Thus the probability density function for the sample maximum is given by:


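$$q(x) = \begin{cases} 0 & \text{for } x < -\sigma/2 \\ n\left[\dfrac{x+\sigma/2}{\sigma}\right]^{n-1}\dfrac{1}{\sigma} & \text{for } -\sigma/2 \le x \le +\sigma/2 \\ 0 & \text{for } x > +\sigma/2 \end{cases}$$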
 

Let z=(x+σ/2). Then the probability density function for z is


$$q(z) = \begin{cases} 0 & \text{for } z < 0 \\ \dfrac{n z^{n-1}}{\sigma^n} & \text{for } 0 \le z \le \sigma \\ 0 & \text{for } z > \sigma \end{cases}$$
 

The expected value of z, E{z}, is given by


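$$E\{z\} = \int_0^\sigma z\,q(z)\,dz = \int_0^\sigma \frac{n z^n}{\sigma^n}\,dz = \frac{n}{\sigma^n}\left[\frac{z^{n+1}}{n+1}\right]_0^\sigma = \frac{n}{n+1}\,\sigma.$$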
 

Likewise the second moment, E{z²}, is given by


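$$E\{z^2\} = \int_0^\sigma z^2\,q(z)\,dz = \frac{n}{\sigma^n}\left[\frac{z^{n+2}}{n+2}\right]_0^\sigma = \frac{n}{n+2}\,\sigma^2$$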
 

The variance of z is given by


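$$\mathrm{Var}(z) = E\{z^2\} - (E\{z\})^2 = \frac{n}{n+2}\,\sigma^2 - \left(\frac{n}{n+1}\,\sigma\right)^2$$
which, because (n+1)² − n(n+2) = 1, reduces to
$$\mathrm{Var}(z) = \frac{n}{(n+2)(n+1)^2}\,\sigma^2$$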
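
The moments above can be double-checked numerically. This sketch swaps in a plain midpoint-rule integration of q(z) (a numerical check, not part of the original derivation) and compares the results with the closed forms for E{z}, E{z²} and Var(z); n = 10 and σ = 2 are arbitrary test values and the helper names are hypothetical:

```python
def q(z, n, sigma):
    """Density of z = x + sigma/2 for the sample maximum: n z^(n-1) / sigma^n on [0, sigma]."""
    return n * z ** (n - 1) / sigma ** n if 0.0 <= z <= sigma else 0.0


def moment(k, n, sigma, steps=100_000):
    """Midpoint-rule approximation of the k-th moment: integral of z^k q(z) over [0, sigma]."""
    h = sigma / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += z ** k * q(z, n, sigma)
    return total * h


n, sigma = 10, 2.0
m1, m2 = moment(1, n, sigma), moment(2, n, sigma)
print("E{z}:   ", m1, "vs", n / (n + 1) * sigma)
print("E{z^2}: ", m2, "vs", n / (n + 2) * sigma ** 2)
print("Var(z): ", m2 - m1 ** 2, "vs", n / ((n + 2) * (n + 1) ** 2) * sigma ** 2)
```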
 

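Finally, the standard deviation of z for a sample of size n, $\sigma_n$, which is also the standard deviation of the sample maximum x because z and x differ only by the constant σ/2, reduces to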


$$\sigma_n = \frac{\sigma}{(n+1)(1+2/n)^{1/2}}$$
 

The standard deviation for the probability density function p(x) is not the parameter σ. Instead


$$\sigma_1 = \frac{\sigma}{2\sqrt{3}}$$
 

Thus


$$\sigma_n = \sigma_1\,\frac{2\sqrt{3}}{(n+1)(1+2/n)^{1/2}}$$
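
To compare the two scaling rules side by side, the sketch below tabulates $\sigma_n/\sigma_1$ for the sample mean, 1/√n, and for the sample maximum of the uniform distribution, 2√3/((n+1)(1+2/n)^(1/2)). The function names are illustrative:

```python
import math


def mean_ratio(n):
    """sigma_n / sigma_1 for the sample mean: 1/sqrt(n)."""
    return 1.0 / math.sqrt(n)


def max_ratio(n):
    """sigma_n / sigma_1 for the sample maximum of a uniform distribution."""
    return 2.0 * math.sqrt(3.0) / ((n + 1) * math.sqrt(1.0 + 2.0 / n))


print("   n   mean    maximum")
for n in (1, 2, 5, 10, 100, 1000):
    print(f"{n:4d}  {mean_ratio(n):.4f}  {max_ratio(n):.4f}")
```

Both ratios equal 1 at n = 1. For small n the maximum is actually the more dispersed of the two; its ratio first drops below that of the mean at n = 8 and stays below for all larger n, since it falls roughly like 1/n rather than 1/√n.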
 

So, in contrast to the case of the sample mean, in which the dispersion is inversely proportional to √n, the dispersion of the sample maximum for the simple case being considered is inversely proportional to $(n+1)(1+2/n)^{1/2}$, which for large n grows essentially like n itself.

Since z = x + σ/2 and E{z} = (n/(n+1))σ,

$$E\{x,n\} = \frac{n}{n+1}\,\sigma - \frac{\sigma}{2} = \frac{n-1}{n+1}\cdot\frac{\sigma}{2}$$
but the maximum for the distribution p(x) is $x_{max}$ = σ/2, so
$$E\{x,n\} = \frac{n-1}{n+1}\,x_{max}$$
 

Thus, since E{x,n} approaches $x_{max}$ as n grows, the sample maximum is an asymptotically unbiased estimate of the population maximum, although for any finite n it underestimates it on average.
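
A simulation sketch can confirm both E{x,n} = ((n−1)/(n+1))(σ/2), which follows from E{z} = (n/(n+1))σ, and the formula for $\sigma_n$ at a width other than σ = 1. The choice σ = 2, n = 4, the trial count and the seed are arbitrary, and `check_uniform_max` is a hypothetical helper name:

```python
import random
import statistics


def check_uniform_max(n, sigma, trials=20_000, seed=2):
    """Simulate the maximum of n draws from U(-sigma/2, +sigma/2).

    Returns (simulated mean, formula mean, simulated sd, formula sd).
    """
    rng = random.Random(seed)
    maxima = [max(rng.uniform(-sigma / 2, sigma / 2) for _ in range(n))
              for _ in range(trials)]
    mean_formula = (n - 1) / (n + 1) * (sigma / 2)        # E{x,n}
    sd_formula = sigma / ((n + 1) * (1 + 2 / n) ** 0.5)   # sigma_n
    return statistics.fmean(maxima), mean_formula, statistics.stdev(maxima), sd_formula


sim_mean, mean_formula, sim_sd, sd_formula = check_uniform_max(n=4, sigma=2.0)
print(f"mean: simulated {sim_mean:.4f}, formula {mean_formula:.4f}")
print(f"sd:   simulated {sim_sd:.4f}, formula {sd_formula:.4f}")
```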

Sample Minimum

For the sample minimum the same relations apply, with the inequalities reversed. The only modification is that the appropriate cumulative probability function is the upper-tail probability, defined as


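$$P(x) = \int_{x}^{+\infty} p(z)\,dz$$
so that $Q(x) = [P(x)]^n$ becomes the probability that the sample minimum is greater than or equal to x.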
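
For the uniform example the upper-tail probability is P(x) = (σ/2 − x)/σ on [−σ/2, +σ/2], so the probability that the minimum of n values is at least x should be [(σ/2 − x)/σ]^n. A short simulation sketch, with illustrative helper names and σ = 1 assumed:

```python
import random


def empirical_min_tail(n, x, sigma=1.0, trials=20_000, seed=3):
    """Fraction of simulated samples of size n whose minimum is >= x.

    Draws come from U(-sigma/2, +sigma/2).
    """
    rng = random.Random(seed)
    hits = sum(
        min(rng.uniform(-sigma / 2, sigma / 2) for _ in range(n)) >= x
        for _ in range(trials)
    )
    return hits / trials


def theoretical_min_tail(n, x, sigma=1.0):
    """[P(x)]^n, with P(x) = (sigma/2 - x)/sigma the upper-tail probability on [-sigma/2, +sigma/2]."""
    return ((sigma / 2 - x) / sigma) ** n


for x in (-0.4, -0.3, -0.2):
    print(f"x={x}: simulated {empirical_min_tail(5, x):.4f}, "
          f"formula {theoretical_min_tail(5, x):.4f}")
```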
 

