San José State University
Department of Economics
Thayer Watkins
Silicon Valley
& Tornado Alley

The Effect of Averaging Variables Which
Are the Cumulative Sum
of Random Disturbances

Consider variables which are of the form

T(t) = T(t-1) + U(t)
and thus
T(t) = U(0) + U(1) + U(2) + … + U(t-1) + U(t)

where the U(s)'s are independent variables, random or otherwise.

Now consider averaging over intervals. First take two-period intervals.

T(t) = U(0) + U(1) + U(2) + … + U(t-1) + U(t)
T(t+1) = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + U(t+1)


½[T(t)+T(t+1)] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + ½U(t+1)

The weight of U(t) in the average is twice that of U(t+1).

The formulas for three-period and four-period averages are

(1/3)[T(t)+T(t+1)+T(t+2)] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + (2/3)U(t+1) + (1/3)U(t+2)
(1/4)[T(t)+T(t+1)+T(t+2)+T(t+3)] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + (3/4)U(t+1) + (1/2)U(t+2) + (1/4)U(t+3)

The weight of the first disturbance, U(t), is three and four times, respectively, that of the last disturbance in the average.

The general formula is clear:

(1/n)[T(t)+T(t+1)+T(t+2)+ … +T(t+(n-1))] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + [(n-1)/n]U(t+1) + [(n-2)/n]U(t+2) + … + (1/n)U(t+(n-1))

For annual averages of monthly values, the disturbances during Januaries have twelve times the weight of disturbances during Decembers, and for annual averages of daily values, disturbances on January firsts have 365 times the weight of disturbances occurring on December thirty-firsts. Likewise, for daily averages of hourly values, the disturbances occurring between midnight and 1 A.M. have 24 times the weight of disturbances occurring between 11 P.M. and midnight. This suggests that for statistical analysis it is not a good idea to work with interval averages. Instead the values at a specified point in the interval, say the ends or the midpoints of the intervals, should be used.
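The weights in the general formula can be checked by direct summation. The following is a minimal sketch; the function name average_weights and the choice t=5, n=12 are illustrative assumptions, not part of the original derivation.

```python
from fractions import Fraction

def average_weights(t, n):
    """Weights of U(0), ..., U(t+n-1) in (1/n)[T(t) + ... + T(t+n-1)]."""
    weights = [Fraction(0)] * (t + n)
    for s in range(t, t + n):        # each T(s) entering the average
        for j in range(s + 1):       # T(s) = U(0) + ... + U(s)
            weights[j] += Fraction(1, n)
    return weights

# Hypothetical example: twelve-period (e.g. monthly-to-annual) averaging.
w = average_weights(t=5, n=12)
print([str(x) for x in w[5:]])   # weights of U(5), ..., U(16)
print(w[5] / w[-1])              # first-to-last weight ratio in the interval: 12
```

Exact fractions are used so that the weights (n-1)/n, (n-2)/n, … appear without rounding.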

The First Differences of Interval Averages

Consider the two-period averages T̄(t) = ½[T(t)+T(t+1)]. Since

T̄(t) = ½[T(t)+T(t+1)] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + ½U(t+1)
T̄(t+1) = ½[T(t+1)+T(t+2)] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + U(t+1) + ½U(t+2)

it follows that

T̄(t+1)−T̄(t) = U(t+1)+½U(t+2)−½U(t+1)
or, equivalently
T̄(t+1)−T̄(t) = ½U(t+1)+½U(t+2)

Likewise

T̄(t+2)−T̄(t+1) = ½U(t+2)+½U(t+3)

Because (T̄(t+1)−T̄(t)) and (T̄(t+2)−T̄(t+1)) both depend upon ½U(t+2) there is a positive serial correlation for the first differences of the averages even if there is no serial correlation for the U(t)'s.
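The size of this serial correlation can be worked out exactly under an added assumption that the text does not state: that the disturbances are uncorrelated and share a common variance (taken as 1 here). In that case the covariance of two weighted sums of disturbances is the dot product of their weight vectors. The sketch below uses hypothetical helper names and arbitrary values t=4, m=10.

```python
from fractions import Fraction

def avg2_weights(t, m):
    """Weights of U(0..m-1) in the two-period average (1/2)[T(t) + T(t+1)]."""
    w = [Fraction(0)] * m
    for s in (t, t + 1):
        for j in range(s + 1):       # T(s) = U(0) + ... + U(s)
            w[j] += Fraction(1, 2)
    return w

def cov(a, b):
    """Covariance of two weighted sums of uncorrelated, unit-variance disturbances."""
    return sum(x * y for x, y in zip(a, b))

t, m = 4, 10                         # m must be at least t + 4
a0, a1, a2 = (avg2_weights(t + k, m) for k in range(3))
d1 = [y - x for x, y in zip(a0, a1)]   # change in the average from t to t+1
d2 = [y - x for x, y in zip(a1, a2)]   # change in the average from t+1 to t+2

lag1_cov = cov(d1, d2)
var_d = cov(d1, d1)
print(lag1_cov, var_d, lag1_cov / var_d)   # 1/4 1/2 1/2
```

Under these assumptions the lag-one correlation of the first differences is ½, arising entirely from the shared term ½U(t+2).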

Also since (T̄(t+1)−T̄(t)) and T̄(t) both depend upon U(t+1) there will be a positive correlation between the change in T̄(t) and its value. There would be no such correlation between the unaveraged T(t) and (T(t+1)−T(t)). Thus averaging introduces spurious correlations into the statistical series.
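The contrast between the averaged and unaveraged series can be checked the same way. This sketch again assumes uncorrelated disturbances with unit variance, which is an assumption added for illustration; the helper names and the values t=4, m=8 are hypothetical.

```python
from fractions import Fraction

def level_weights(s, m):
    """Weights of U(0..m-1) in T(s) = U(0) + ... + U(s)."""
    return [Fraction(1) if j <= s else Fraction(0) for j in range(m)]

def cov(a, b):
    """Covariance of two weighted sums of uncorrelated, unit-variance disturbances."""
    return sum(x * y for x, y in zip(a, b))

def diff(a, b):
    """Weights of (b - a)."""
    return [y - x for x, y in zip(a, b)]

t, m = 4, 8
T = [level_weights(s, m) for s in range(t + 3)]   # T(0), ..., T(t+2)

def avg(s):
    """Weights of the two-period average (1/2)[T(s) + T(s+1)]."""
    return [(x + y) / 2 for x, y in zip(T[s], T[s + 1])]

cov_avg = cov(diff(avg(t), avg(t + 1)), avg(t))   # averaged change vs. averaged level
cov_raw = cov(diff(T[t], T[t + 1]), T[t])         # unaveraged change vs. unaveraged level
print(cov_avg, cov_raw)   # 1/4 0
```

The covariance of ¼ for the averaged series comes from U(t+1) carrying weight ½ in both the change and the level, while the unaveraged change U(t+1) shares no disturbance with T(t).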

The serial correlation can extend beyond a one-period lag. Consider now averaging over a three-period interval centered on t. Then

T̄(t) = (1/3)[T(t-1)+T(t)+T(t+1)]

This means

T̄(t) = U(0) + U(1) + U(2) + … + U(t-2) + U(t-1) + (2/3)U(t) + (1/3)U(t+1)
where this can be more clearly represented as
T̄(t) = T(t-2) + (3/3)U(t-1) + (2/3)U(t) + (1/3)U(t+1)

Likewise

T̄(t+1) = T(t-2) + U(t-1) + (3/3)U(t) + (2/3)U(t+1) + (1/3)U(t+2)
and therefore
T̄(t+1)−T̄(t) = (0)U(t-1) + (1/3)U(t) + (1/3)U(t+1) + (1/3)U(t+2)
or, equivalently
T̄(t+1)−T̄(t) = (1/3)[U(t)+U(t+1)+U(t+2)]

Similarly

T̄(t+2)−T̄(t+1) = (1/3)[U(t+1)+U(t+2)+U(t+3)]
T̄(t+3)−T̄(t+2) = (1/3)[U(t+2)+U(t+3)+U(t+4)]

This means there will be a positive correlation between [T̄(t+1)−T̄(t)] and [T̄(t+2)−T̄(t+1)] and also between [T̄(t+1)−T̄(t)] and [T̄(t+3)−T̄(t+2)] because of their common dependencies.
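A small simulation can illustrate the two-lag dependence. It is only a sketch under assumptions not made in the text: the disturbances are drawn as independent standard normal values, the seed and sample size are arbitrary, and autocorr is a hypothetical helper. With uncorrelated, equal-variance disturbances the shared terms imply correlations of 2/3 at lag one, 1/3 at lag two and zero at lag three, and the sample values should fall near those figures.

```python
import random

random.seed(0)                       # arbitrary seed, for repeatability only
N = 100_000                          # arbitrary sample size

# Cumulative sum T(t) = T(t-1) + U(t) with independent standard normal U(t)
T = []
level = 0.0
for _ in range(N):
    level += random.gauss(0.0, 1.0)
    T.append(level)

# Three-period averages centered on each t, then their first differences
avg = [(T[i - 1] + T[i] + T[i + 1]) / 3 for i in range(1, N - 1)]
diffs = [b - a for a, b in zip(avg, avg[1:])]

def autocorr(x, lag):
    """Sample autocorrelation of the list x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

for lag in (1, 2, 3):
    print(lag, round(autocorr(diffs, lag), 3))
```

The first differences of the unaveraged series are just the U(t)'s, so applying autocorr to them instead should give values near zero at every lag.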
