The Scaling of Sample Extremes as a Function of Sample Size

applet-magic.com Thayer Watkins Silicon Valley & Tornado Alley USA


For the sample mean the dispersion of the distribution is given by the rule

σ_n = σ₁/√n

where n is the sample size and σ_n is the standard deviation of the sample mean for samples of size n. Thus as the sample size increases the distribution of sample means becomes less dispersed. The material below analyzes the dispersion of the sample maximum and minimum as a function of sample size. The analysis for the two extremes is the same but for definiteness the maximum will be used.

Statistical Simulations

Consider the distribution of sample maxima for samples of a random variable uniformly distributed between -0.5 and +0.5. For n=1 the sample maximum is just the sample value.
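A simulation of this kind can be sketched in a few lines of Python. The function names, the number of trials and the random seed below are illustrative assumptions, not part of the original simulations; the sketch only draws repeated samples from the uniform distribution on [-0.5, +0.5] and records the maximum of each.

```python
import random
import statistics

def simulate_sample_maxima(n, trials=5000, seed=0):
    """Return `trials` sample maxima, each taken over n draws from Uniform(-0.5, 0.5)."""
    rng = random.Random(seed)
    return [max(rng.uniform(-0.5, 0.5) for _ in range(n)) for _ in range(trials)]

def summarize(sizes=(1, 10, 100)):
    """Map each sample size to (mean, standard deviation) of the simulated maxima."""
    summary = {}
    for n in sizes:
        maxima = simulate_sample_maxima(n)
        summary[n] = (statistics.fmean(maxima), statistics.pstdev(maxima))
    return summary

if __name__ == "__main__":
    for n, (mean, sd) in summarize().items():
        print(f"n={n:4d}  mean of maxima={mean:+.4f}  std of maxima={sd:.4f}")
```

The printed standard deviations should shrink as n grows, while the mean of the maxima moves toward +0.5.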

Analysis

If p(x) is the probability density function for a random variable x, let P(x) be the cumulative probability function; i.e.,

P(x) = ∫_{−∞}^{x} p(z)dz.

The probability density that the maximum of a sample of size n is x is given by

n[P(x)]ⁿ⁻¹p(x)

This is the probability density function q(x) for the sample maximum.
The quantity P(x) represents the probability that the random variable has a value less than or equal to x. The (n−1)-th power of P(x) is the probability that the other n−1 values of the sample are all less than or equal to x, and p(x) is the probability density for the one value that equals x. The factor of n represents the fact that the maximum could occur for any one of the n sample values.
Let Q(x) represent the cumulative probability function for the sample maximum. Since the derivative of the cumulative probability function is just the probability density function the above relation is

dQ(x)/dx = n[P(x)]ⁿ⁻¹dP(x)/dx

which upon integration from −∞ to x gives

Q(x) = [P(x)]ⁿ

If the probability density function is nonzero only over a finite range, say [x_min, x_max], then P(x_min)=0.0 and P(x_max)=1.0. The cumulative probability function for the sample maximum will have the same range; i.e., Q(x_min)=0.0 and Q(x_max)=1.0. The median point of the distribution of the sample maximum lies closer and closer to x_max the larger the sample size, and thus the probability density function is more and more concentrated close to x_max the larger the sample size. Let x_med(n) represent the median point of the probability distribution of the sample maximum for samples of size n. Then

x_med(n) is such that
Q(x_med(n)) = 0.5
and since Q(x) = [P(x)]ⁿ
P(x_med(n)) = (0.5)^(1/n)

For example, P(x_med(10)) = 0.933 and P(x_med(100)) = 0.9931. Thus the dispersion of q(x) must decrease with the sample size.
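The values quoted above follow directly from P(x_med(n)) = (0.5)^(1/n). A minimal sketch for reproducing them is given here; the function name median_quantile is an illustrative choice.

```python
def median_quantile(n):
    """Value of P at the median of the sample maximum, i.e. the solution of P**n = 0.5."""
    return 0.5 ** (1.0 / n)

if __name__ == "__main__":
    for n in (1, 2, 10, 100, 1000):
        q = median_quantile(n)
        # q**n recovers 0.5 up to floating-point rounding
        print(f"n={n:5d}  P(x_med(n))={q:.4f}  check P^n={q ** n:.4f}")
```

As n increases the value approaches 1, which is another way of saying that the median of the sample maximum moves toward x_max.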
To see the relation between dispersion as measured by standard deviation and sample size consider a simple probability density function p(x).

p(x) = 1/σ for −σ/2≤x≤+σ/2
p(x) = 0 for all other values of x

Then P(x) = (x−(−σ/2))/σ = (x+σ/2)/σ for −σ/2≤x≤+σ/2. The probability density function for the sample maximum is then given by:

q(x) = 0 for x<−σ/2
q(x) = n[(x+σ/2)/σ]ⁿ⁻¹(1/σ) for −σ/2≤x≤+σ/2
q(x) = 0 for x>+σ/2

Let z=(x+σ/2). Then the probability density function for z is

q(z) = 0 for z<0,
q(z) = nzⁿ⁻¹/σⁿ for 0≤z≤σ
q(z) = 0 for z>σ

The expected value of z, E{z}, is given by

∫₀^σzq(z)dz = ∫₀^σ(nzⁿ/σⁿ)dz
= (n/σⁿ)[zⁿ⁺¹/(n+1)]₀^σ
= (n/(n+1))σ.

Likewise the second moment, E{z²}, is given by

∫₀^σz²q(z)dz = ∫₀^σ(nzⁿ⁺¹/σⁿ)dz = (n/(n+2))σ²

The variance of z is given by

Var(z) = E{z²}−(E{z})²
= (n/(n+2))σ² − ((n/(n+1))σ)²
which reduces to
[n/((n+2)(n+1)²)]σ²
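The algebraic reduction of the variance can be checked with exact rational arithmetic. The sketch below works in units where σ = 1 (an assumption made only for convenience, since σ² factors out of every term) and the function names are illustrative.

```python
from fractions import Fraction

def moments_of_max(n):
    """Exact E{z}, E{z^2} and Var(z) for the sample maximum, in units of sigma and sigma^2."""
    n = Fraction(n)
    mean = n / (n + 1)        # E{z} = (n/(n+1)) sigma
    second = n / (n + 2)      # E{z^2} = (n/(n+2)) sigma^2
    return mean, second, second - mean ** 2

def closed_form_variance(n):
    """The reduced expression n / ((n+2)(n+1)^2), in units of sigma^2."""
    n = Fraction(n)
    return n / ((n + 2) * (n + 1) ** 2)

if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        _, _, var = moments_of_max(n)
        print(n, var, closed_form_variance(n), var == closed_form_variance(n))
```

Because Fraction arithmetic is exact, equality of the two expressions for a range of n is a direct confirmation of the reduction rather than a floating-point approximation.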

Finally the standard deviation of z for a sample of size n, σ_n reduces to

σ_n = σ/((n+1)(1+2/n)^½)

The standard deviation for the probability density function p(x) is not the parameter σ. Instead

σ₁ = σ/(2√3)

Thus

σ_n = σ₁[(2√3)/((n+1)(1+2/n)^½)]

So in contrast to the case of the sample mean, whose dispersion is inversely proportional to √n, the dispersion of the sample maximum for the simple case being considered is inversely proportional to (n+1)(1+2/n)^½, a quantity that grows approximately linearly in n for large n.
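The two scaling rules can be compared numerically. The sketch below tabulates the ratio σ_n/σ₁ for the sample mean and for the sample maximum of the uniform case; the function names are illustrative and not from the original.

```python
import math

def mean_factor(n):
    """sigma_n / sigma_1 for the sample mean: 1/sqrt(n)."""
    return 1.0 / math.sqrt(n)

def max_factor(n):
    """sigma_n / sigma_1 for the sample maximum of the uniform case."""
    return 2.0 * math.sqrt(3.0) / ((n + 1) * math.sqrt(1.0 + 2.0 / n))

if __name__ == "__main__":
    print("    n   mean factor   max factor")
    for n in (1, 2, 5, 10, 100, 1000):
        print(f"{n:5d}   {mean_factor(n):11.5f}   {max_factor(n):10.5f}")
```

Both factors equal 1 at n=1. For large n the factor for the maximum falls off roughly as 2√3/n, which is much faster than 1/√n.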
Since z=x+σ/2 and E{z}=(n/(n+1))σ,

E{x,n} = (n/(n+1))σ − σ/2 = ((n−1)/(n+1))(σ/2)
but the maximum for the distribution p(x) is x_max = σ/2 so
E{x,n} = ((n−1)/(n+1))x_max

Thus the sample maximum is an asymptotically unbiased estimator of the population maximum: its expected value falls short of x_max by σ/(n+1), which goes to zero as n increases.
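The expected value of the sample maximum can be compared with a simulated average. The sketch below evaluates E{z} − σ/2 with E{z} = (n/(n+1))σ, as derived above, and sets it beside the mean of simulated maxima; the function names, trial count and seed are assumptions made for illustration.

```python
import random
import statistics

def expected_max(n, sigma=1.0):
    """E{x,n} = E{z} - sigma/2, with E{z} = (n/(n+1)) * sigma."""
    return (n / (n + 1)) * sigma - sigma / 2

def simulated_mean_max(n, sigma=1.0, trials=5000, seed=1):
    """Average of simulated sample maxima for Uniform(-sigma/2, +sigma/2)."""
    rng = random.Random(seed)
    half = sigma / 2
    maxima = [max(rng.uniform(-half, half) for _ in range(n)) for _ in range(trials)]
    return statistics.fmean(maxima)

if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        shortfall = 0.5 - expected_max(n)  # distance below x_max = sigma/2 when sigma = 1
        print(f"n={n:4d}  expected={expected_max(n):+.4f}  "
              f"simulated={simulated_mean_max(n):+.4f}  shortfall={shortfall:.4f}")
```

The shortfall column shows how far the expected maximum lies below x_max and how that gap closes as the sample size grows.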
Sample Minimum

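For the sample minimum the same relations apply. The only modification is that the appropriate cumulative probability function is the probability that the random variable has a value greater than or equal to x, so that [P(x)]ⁿ is the probability that the sample minimum is at least x. It is defined as

P(x) = ∫_{x}^{+∞} p(z)dz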
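The corresponding relation for the minimum can be checked by simulation. The sketch below assumes the uniform distribution on [-0.5, +0.5] used earlier, for which the upper-tail function is 0.5 − x on that interval, and compares its n-th power with the observed fraction of samples whose minimum exceeds x. All names and parameters are illustrative.

```python
import random

def upper_tail_uniform(x):
    """Probability that a Uniform(-0.5, 0.5) value exceeds x."""
    return min(1.0, max(0.0, 0.5 - x))

def prob_min_exceeds(x, n):
    """Probability that all n sample values, and hence the minimum, exceed x."""
    return upper_tail_uniform(x) ** n

def simulated_prob_min_exceeds(x, n, trials=5000, seed=2):
    """Fraction of simulated samples of size n whose minimum exceeds x."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials)
               if min(rng.uniform(-0.5, 0.5) for _ in range(n)) > x)
    return hits / trials

if __name__ == "__main__":
    for x in (-0.4, -0.3, 0.0):
        print(f"x={x:+.1f}  formula={prob_min_exceeds(x, 5):.4f}  "
              f"simulated={simulated_prob_min_exceeds(x, 5):.4f}")
```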

