San José State University

applet-magic.com
Thayer Watkins
Silicon Valley
& Tornado Alley
USA

The Probability Distribution
of the Sample Percentiles
as a Function of Sample Size

The purpose of the material below is to illustrate what happens to the distribution of sample percentiles as the size of the samples increases. This purpose is accomplished by drawing 2000 samples, computing their α percentiles and constructing the histogram of those sample percentiles. (Each time the screen is refreshed a new batch of 2000 samples is created.)

The random variable is uniformly distributed from -0.5 to +0.5; i.e.,

p(x) = 1 for -0.5≤x≤+0.5
p(x) = 0 for all other values of x

For a sample of size n the α percentile is found by ranking the sample values. For n=100k the percentile is the value in the kα-th place in the ranking. Thus for n=100 the α-th value in the ranking is taken, and for n=200 the 2α-th value.
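The procedure described above can be sketched in Python as follows. This is a minimal illustration, not the code behind the original page: the function names `sample_percentile` and `simulate_percentiles` are hypothetical, α is assumed to be an integer between 1 and 100, and the ranking is assumed to be in ascending order.

```python
import random
import statistics

def sample_percentile(sample, alpha):
    """Return the alpha percentile of a sample whose size n is a multiple of 100.

    Follows the ranking rule in the text: for n = 100k, take the value in
    place k*alpha of the ranking (assumed ascending, counted from 1).
    """
    n = len(sample)
    if n % 100 != 0:
        raise ValueError("sample size must be a multiple of 100")
    k = n // 100
    ranked = sorted(sample)
    return ranked[k * alpha - 1]  # place k*alpha, converted to a 0-based index

def simulate_percentiles(n, alpha, num_samples=2000, seed=None):
    """Draw num_samples samples of size n from the uniform density on
    [-0.5, 0.5] and return the list of their alpha percentiles."""
    rng = random.Random(seed)
    return [
        sample_percentile([rng.uniform(-0.5, 0.5) for _ in range(n)], alpha)
        for _ in range(num_samples)
    ]

if __name__ == "__main__":
    alpha = 25  # illustrative choice
    for n in (100, 200, 400):
        values = simulate_percentiles(n, alpha, seed=0)
        # Mean and standard deviation of the 2000 sample percentiles
        print(n, statistics.mean(values), statistics.pstdev(values))
```

A histogram of the returned list for each n corresponds to one of the histograms described in the simulations section; the printed standard deviation gives a one-number summary of how the spread changes with n.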

Statistical Simulations

Below are shown the histograms for samples of various sizes.

Analysis

Let p(x) be the probability density function for a random variable X and let P(x) be the cumulative probability function; i.e.,

$$P(x) = \int_{-\infty}^{x} p(z)\,dz.$$

The α percentile of a distribution, denoted as $x_{\text{percent}}$, is defined as the value of x such that the probability of getting a value less than or equal to x is α/100 and the probability of getting a value greater than or equal to x is (100-α)/100. In other words, $P(x_{\text{percent}}) = \alpha/100$.
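As a worked instance of this definition, the α percentile of the uniform density on [-0.5, +0.5] used in the simulations can be written in closed form. The symbol $x_{\text{percent}}$ below stands for the α percentile of the distribution as defined above.

```latex
P(x) = \int_{-0.5}^{x} 1\,dz = x + \tfrac{1}{2}
\quad (-0.5 \le x \le 0.5),
\qquad
P(x_{\text{percent}}) = \frac{\alpha}{100}
\;\Longrightarrow\;
x_{\text{percent}} = \frac{\alpha}{100} - \frac{1}{2}.
```

For example, α=25 gives a percentile of -0.25 and α=50 gives the median, 0.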

The probability density q(x) that the sample percentile has a value of x for a sample size of n, n being a multiple of 100, is the probability density p(x) times the probability that α(n-1)/100 of the other n-1 sample values are at or below x and (100-α)(n-1)/100 are at or above x; i.e.,

$$q(x) = c_n\,P(x)^{\alpha(n-1)/100}\,(1-P(x))^{(100-\alpha)(n-1)/100}\,p(x)$$
which is the same as
$$q(x) = c_n\left[P(x)^{\alpha}\,(1-P(x))^{100-\alpha}\right]^{(n-1)/100}p(x)$$

where $c_n$ is a coefficient that represents the number of ways a sample of α(n-1)/100 values at or below x and (100-α)(n-1)/100 values at or above x can be arranged.

The term $\left[P(x)^{\alpha}(1-P(x))^{100-\alpha}\right]^{(n-1)/100}$ reaches its maximum at the same value of x for which $P(x)^{\alpha}(1-P(x))^{100-\alpha}$ reaches its maximum. This is for a value of P such that the derivative of $P^{\alpha}(1-P)^{100-\alpha}$ with respect to P is zero; i.e.,

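$$\alpha P^{\alpha-1}(1-P)^{100-\alpha} - (100-\alpha)P^{\alpha}(1-P)^{100-\alpha-1} = 0$$
dividing by $P^{\alpha}(1-P)^{100-\alpha}$ gives
$$\frac{\alpha}{P} - \frac{100-\alpha}{1-P} = 0$$
which reduces to
$$\alpha(1-P) = (100-\alpha)P$$
or
$$P = \frac{\alpha}{100}$$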

Thus the term $\left[P(x)^{\alpha}(1-P(x))^{100-\alpha}\right]^{(n-1)/100}$ reaches its maximum for x such that $P(x) = \alpha/100$; i.e., for x equal to the percentile, $x_{\text{percent}}$.
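The location of this maximum can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `log_kernel` and `argmax_on_grid` are hypothetical, and the search is a plain grid search over 0 < P < 1. It works with the logarithm of the term, which has its maximum at the same P, and the result should agree with the definition P(x) = α/100 at the percentile.

```python
import math

def log_kernel(P, alpha):
    """Logarithm of P**alpha * (1 - P)**(100 - alpha) for 0 < P < 1."""
    return alpha * math.log(P) + (100 - alpha) * math.log(1 - P)

def argmax_on_grid(alpha, steps=10000):
    """Return the grid point in (0, 1) at which the kernel is largest."""
    grid = [i / steps for i in range(1, steps)]  # excludes the endpoints 0 and 1
    return max(grid, key=lambda P: log_kernel(P, alpha))

if __name__ == "__main__":
    for alpha in (10, 25, 50, 90):
        # The maximizer should sit at (or next to) alpha / 100
        print(alpha, argmax_on_grid(alpha))
```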

For large enough n the value of p(x) away from $x_{\text{percent}}$ becomes irrelevant; the percentile of the sample has to be arbitrarily close to where the above term is a maximum, and that is at the percentile of the probability density function p(x), $x_{\text{percent}}$. As the sample size increases the probability density function for the sample percentile becomes more concentrated near $x_{\text{percent}}$; that is, its dispersion becomes smaller and smaller. Thus as the sample size increases the dispersion of the distribution of the sample percentile approaches a limit of zero.
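The concentration described above can be illustrated for the uniform density on [-0.5, +0.5], where p(x) = 1 and P(x) = x + 0.5. The following sketch is an assumption-laden illustration rather than part of the original analysis: the function name `percentile_density_spread` is hypothetical, the density q(x) is evaluated on a grid of midpoints, and the coefficient in front of q(x) is not computed because the weights are normalized numerically.

```python
import math

def percentile_density_spread(n, alpha, steps=20000):
    """Mean and standard deviation of q(x) for the uniform density on [-0.5, 0.5].

    q(x) is proportional to [P**alpha * (1 - P)**(100 - alpha)]**((n - 1) / 100)
    with P = x + 0.5; the constant factor cancels in the normalization.
    """
    exponent = (n - 1) / 100
    xs = [-0.5 + (i + 0.5) / steps for i in range(steps)]  # cell midpoints
    logs = [
        exponent * (alpha * math.log(x + 0.5) + (100 - alpha) * math.log(0.5 - x))
        for x in xs
    ]
    top = max(logs)  # subtract the maximum to avoid underflow in exp
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total
    return mean, math.sqrt(var)

if __name__ == "__main__":
    alpha = 25  # illustrative choice; the distribution percentile is -0.25
    for n in (100, 400, 1600):
        print(n, percentile_density_spread(n, alpha))
```

Under these assumptions the standard deviation should shrink as n grows while the mean moves toward α/100 - 0.5.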

Conclusions

The expected value of the percentile of the sample asymptotically approaches the percentile of the distribution p(x). In other words, the sample percentile is an asymptotically unbiased estimate of the population percentile. Furthermore, the limit of the dispersion of the distribution of sample percentiles is zero as the sample size increases without bound.

