The Probability Distribution of the Sample Median as a Function of Sample Size

San José State University Department of Economics

applet-magic.com Thayer Watkins Silicon Valley & Tornado Alley USA

The purpose of the material below is to illustrate what happens to the distribution of sample medians as the size of the samples increases. This purpose is accomplished by drawing 2000 samples, computing their medians and constructing the histogram of those sample medians. (Each time the screen is refreshed a new batch of 2000 samples is created.)
The random variable is uniformly distributed from -0.5 to +0.5; i.e.,

p(x) = 1 for -0.5≤x≤+0.5
p(x) = 0 for all other values of x

For a sample of size n the median is found by ranking the sample values. For n odd the median is the value in the (n+1)/2 place in the ranking. For n even the median is taken to be the average of the values in the n/2 and (n/2)+1 places in the ranking. Thus for n=3 the second value in the ranking is taken. For n=4 the average of the second and third in the ranking is taken.
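
The ranking rule can be sketched in a few lines of Python. This is only an illustration; the function name sample_median is an assumption, not something from the original page. It follows the n=3 and n=4 examples given above.

```python
def sample_median(values):
    """Median by ranking: the middle value for odd n,
    the average of the two middle values for even n."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # 1-based place (n+1)/2 corresponds to 0-based index n//2
        return ranked[mid]
    # 1-based places n/2 and n/2+1 correspond to 0-based indices mid-1 and mid
    return (ranked[mid - 1] + ranked[mid]) / 2
```

For a sample of three values the function returns the second value in the ranking; for four values it returns the average of the second and third.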
Statistical Simulations

Below are shown the histograms for samples of various sizes.
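
A rough way to regenerate such histograms is sketched below, assuming Python's standard random and statistics modules stand in for whatever the original page used. The function names, the choice of ten bins and the sample sizes in the demonstration loop are all assumptions made for illustration.

```python
import random
import statistics


def simulate_medians(n, num_samples=2000, seed=None):
    """Draw num_samples samples of size n from Uniform(-0.5, 0.5)
    and return the list of their medians."""
    rng = random.Random(seed)
    medians = []
    for _ in range(num_samples):
        sample = [rng.uniform(-0.5, 0.5) for _ in range(n)]
        medians.append(statistics.median(sample))
    return medians


def histogram_counts(values, bins=10, lo=-0.5, hi=0.5):
    """Count how many values fall in each of `bins` equal-width bins on [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = int((v - lo) / width)
        index = max(0, min(index, bins - 1))  # clamp edge values into the end bins
        counts[index] += 1
    return counts


if __name__ == "__main__":
    # Sample sizes chosen for illustration only.
    for n in (1, 3, 11, 51):
        medians = simulate_medians(n)
        spread = statistics.pstdev(medians)
        print(f"n={n:3d}  std={spread:.4f}  counts={histogram_counts(medians)}")
```

Each run without a seed produces a new batch of 2000 samples, so the counts vary from run to run while the narrowing of the histogram with larger n should remain visible.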

Analysis

Let p(x) be the probability density function for a random variable X and let P(x) be the cumulative probability function; i.e.,

P(x) = ∫_{−∞}^{x} p(z) dz.

The median of a distribution, denoted as x_med, is defined as the value of x such that there are equal probabilities of getting a larger value and getting a smaller value than x_med. In other words, P(x_med)=0.5.
The probability density q(x) that the sample median has a value of x for a sample size of n, n being odd, is the probability density p(x) times the probability that (n-1)/2 of the sample are above x and (n-1)/2 are below x; i.e.,

q(x) = c_n P(x)^{(n-1)/2} (1−P(x))^{(n-1)/2} p(x)
which is the same as
q(x) = c_n [P(x)(1−P(x))]^{(n-1)/2} p(x)

where c_n is a coefficient that represents the number of ways a sample of (n-1)/2 values above x and (n-1)/2 below x can be arranged.
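
As a hedged numerical check, the sketch below evaluates q(x) for the uniform distribution on [-0.5, +0.5], where P(x) = x + 0.5 and p(x) = 1. The explicit form c_n = n!/[((n-1)/2)!]^2 is the standard order-statistics coefficient; it is not stated on the original page and is used here as an assumption. The function names are likewise illustrative.

```python
import math


def c_n(n):
    """Coefficient n! / (((n-1)/2)!)^2 for odd n (standard order-statistics result)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    m = (n - 1) // 2
    return math.factorial(n) // (math.factorial(m) ** 2)


def median_density_uniform(x, n):
    """q(x) for the median of n draws from Uniform(-0.5, 0.5), n odd."""
    if x < -0.5 or x > 0.5:
        return 0.0
    P = x + 0.5          # cumulative probability for the uniform case
    m = (n - 1) // 2
    return c_n(n) * (P * (1.0 - P)) ** m * 1.0   # final factor is p(x) = 1


def integrate_density(n, steps=10000):
    """Midpoint-rule integral of q(x) over [-0.5, 0.5]; should be close to 1."""
    h = 1.0 / steps
    return sum(median_density_uniform(-0.5 + (i + 0.5) * h, n) for i in range(steps)) * h
```

If the coefficient is right, integrate_density(n) should come out very close to 1 for any odd n, which is what makes q(x) a proper probability density.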
The term [P(x)(1−P(x))]^{(n-1)/2} reaches its maximum at the value of x for which P(x)=0.5; i.e., at the median of the probability distribution p(x), x_med. Because q(x) is the product of [P(x)(1−P(x))]^{(n-1)/2} and p(x), q(x) might reach its maximum at some value of x other than x_med. But as n increases the term [P(x)(1−P(x))]^{(n-1)/2} becomes more and more concentrated around x_med. All of the values of P(x)(1−P(x)) are less than or equal to 0.25, the value attained at x_med. For values of x not equal to x_med the values of P(x)(1−P(x)) are smaller than 0.25, so when raised to higher and higher powers they shrink faster than the value at x_med does.
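
The concentration can be made concrete with a small sketch. The product P(1−P) peaks at P = 0.5, where it equals 0.25, so dividing by 0.25 gives the weight of a point relative to the peak. The function name relative_weight and the test point P = 0.6 are assumptions chosen for illustration.

```python
def relative_weight(P, n):
    """[P(1-P)]^((n-1)/2) divided by its peak value 0.25^((n-1)/2), for odd n."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    exponent = (n - 1) // 2
    return (P * (1.0 - P) / 0.25) ** exponent


if __name__ == "__main__":
    # At P = 0.6 the ratio per unit of exponent is 0.24 / 0.25 = 0.96.
    for n in (3, 11, 51, 201):
        print(n, relative_weight(0.6, n))
```

The weight at P = 0.5 stays at 1 for every n, while the weight at P = 0.6 is 0.96 raised to the power (n-1)/2 and so falls toward zero as n grows.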
For large enough n the value of p(x) away from x_med becomes irrelevant; the median of the sample has to be arbitrarily close to x_med. Likewise the dispersion of the distribution of the sample median has to become smaller and smaller as the sample size increases and approaches zero as a limiting value.
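
For the uniform distribution used in the simulations the shrinking dispersion can be worked out in closed form. The following is a sketch that assumes the standard order-statistics coefficient c_n = n!/(m!)^2 with m = (n-1)/2 and the standard variance formula for the Beta distribution; the symbols M (the sample median) and U are introduced here only for illustration.

```latex
% Uniform case: p(x) = 1 and P(x) = x + 1/2 on [-1/2, 1/2]; n odd, m = (n-1)/2.
\begin{align*}
q(x) &= c_n \left(x + \tfrac12\right)^{m} \left(\tfrac12 - x\right)^{m},
  \qquad c_n = \frac{n!}{(m!)^2} \\
U &= M + \tfrac12 \;\sim\; \mathrm{Beta}(m+1,\, m+1) \\
\operatorname{Var}(M) &= \operatorname{Var}(U)
  = \frac{(m+1)^2}{(2m+2)^2\,(2m+3)}
  = \frac{1}{4(n+2)} \;\longrightarrow\; 0
  \quad \text{as } n \to \infty
\end{align*}
```

Under these assumptions the standard deviation of the sample median is about 0.224 for n=3 and about 0.069 for n=51.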
Restricting the analysis to samples of odd size is not a significant limitation.
Conclusions

For a symmetric distribution such as the uniform distribution used here, the expected value of the median of the sample is equal to the median of the distribution p(x). Furthermore the limit of the dispersion of the distribution of sample medians as sample size increases without bound is zero.
For the distribution of other sample statistics see Sample Statistics, Sample Quartile and Sample Percentile.
