San José State University

applet-magic.com
Thayer Watkins
Silicon Valley,
Tornado Alley
& the Gateway
to the Rockies
USA

A Real Derivation
of Regression Analysis
in Matrix Form

This is a real derivation of regression analysis in matrix form. An internet search for such a derivation brings up purported derivations which are not really derivations. They set up the material for a derivation and then simply give the final result.

Let Y be an n-dimensional row vector, a 1 by n matrix, containing the data for the dependent variable. Let X be an m by n matrix containing the data for the m explanatory variables. Let B be an m-dimensional row vector, a 1 by m matrix, of regression coefficients.

The deviations D between the dependent variables and the estimates based upon the coefficients B are given by

$$D = Y - BX$$

The sum S of the squared deviations can be expressed as

$$S(B) = DD^T$$
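
As a concrete check of these conventions, here is a minimal NumPy sketch; the sizes and data are made up purely for illustration. It builds Y, X, and B with the stated shapes and computes the deviations D and the sum of squared deviations S:

```python
import numpy as np

# Illustrative sizes: m = 3 explanatory variables, n = 5 observations.
m, n = 3, 5
rng = np.random.default_rng(0)

Y = rng.standard_normal((1, n))   # dependent variable, a 1 x n row vector
X = rng.standard_normal((m, n))   # explanatory variables, an m x n matrix
B = rng.standard_normal((1, m))   # coefficients, a 1 x m row vector

D = Y - B @ X                     # deviations, 1 x n
S = (D @ D.T).item()              # sum of squared deviations, a scalar
print(np.isclose(S, np.sum(D**2)))  # True: D D^T is the sum of squares
```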

The First Order Conditions

The best B is the one that minimizes S; its components are called the Least Squares estimates. Here is how they are derived.

$$S(B) = (Y - BX)(Y - BX)^T = (Y - BX)(Y^T - X^TB^T)$$
$$= YY^T - YX^TB^T - BXY^T + BXX^TB^T$$

Now consider S(B+ΔB) which is

$$YY^T - YX^T(B+\Delta B)^T - (B+\Delta B)XY^T + (B+\Delta B)XX^T(B+\Delta B)^T$$

In the expression for $\Delta S = S(B+\Delta B) - S(B)$ all of the terms not involving $\Delta B$ cancel. That leaves

$$\Delta S = -YX^T\Delta B^T - \Delta B\,XY^T + \Delta B\,XX^TB^T + BXX^T\Delta B^T + \Delta B\,XX^T\Delta B^T$$

Now let $\Delta B$ go to an infinitesimal $dB$. This eliminates the term $\Delta B\,XX^T\Delta B^T$, which would involve products of infinitesimals. Thus

$$dS = -YX^TdB^T - dB\,XY^T + dB\,XX^TB^T + BXX^TdB^T$$

which can be rearranged to

$$dS = (BXX^T - YX^T)dB^T + dB(XX^TB^T - XY^T)$$

The term $dB(XX^TB^T - XY^T)$ is just the transpose of $(BXX^T - YX^T)dB^T$, and both are 1×1 matrices, i.e., scalars. Since a scalar equals its own transpose, the two terms are equal.

Thus

$$dS = 2\,dB(XX^TB^T - XY^T) = 2\,(BXX^T - YX^T)dB^T$$

This means that, written as a row vector,

$$\frac{\partial S}{\partial B} = 2\,(BXX^T - YX^T)$$

whose transpose is the column vector $2\,(XX^TB^T - XY^T)$.
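
Since this gradient formula carries the weight of the derivation, here is a small numerical sanity check, again with arbitrary random data. It compares the analytic row-vector gradient $2(BXX^T - YX^T)$ with a central-difference approximation of S:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
Y = rng.standard_normal((1, n))
X = rng.standard_normal((m, n))
B = rng.standard_normal((1, m))

def S(B):
    """Sum of squared deviations S(B) = (Y - BX)(Y - BX)^T."""
    D = Y - B @ X
    return (D @ D.T).item()

# Analytic gradient from the text, as a 1 x m row vector.
grad = 2 * (B @ X @ X.T - Y @ X.T)

# Central-difference approximation, one coefficient at a time.
eps = 1e-6
num = np.zeros((1, m))
for j in range(m):
    dB = np.zeros((1, m))
    dB[0, j] = eps
    num[0, j] = (S(B + dB) - S(B - dB)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-5))  # True
```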

For an extremum $\partial S/\partial B$ must be the zero row vector and hence

$$BXX^T = YX^T$$

and thus, provided $XX^T$ is nonsingular,

$$B = YX^T(XX^T)^{-1}$$
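
A sketch of this closed form in NumPy follows; the data are synthetic and illustrative. Rather than forming $(XX^T)^{-1}$ explicitly, it solves the normal equations $XX^TB^T = XY^T$, which is algebraically the same but numerically safer, and it cross-checks against np.linalg.lstsq (which uses the column-vector convention, hence the transposes):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 50
X = rng.standard_normal((m, n))
B_true = np.array([[1.0, -2.0, 0.5]])               # illustrative coefficients
Y = B_true @ X + 0.01 * rng.standard_normal((1, n))  # noisy dependent variable

# Closed form B = Y X^T (X X^T)^{-1}, computed by solving
# (X X^T) B^T = X Y^T for B^T instead of inverting X X^T.
B_hat = np.linalg.solve(X @ X.T, X @ Y.T).T

# Cross-check with NumPy's least-squares routine.
B_lstsq, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
print(np.allclose(B_hat, B_lstsq.T))  # True, and both are close to B_true
```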

Second Order Conditions

The second order conditions for an unconstrained minimum are that the matrix of second derivatives of S with respect to the parameters is positive definite. Let this matrix of second derivatives with respect to the regression coefficients B be denoted as

$$\frac{\partial^2 S}{\partial B^2}$$

Since

$$\frac{\partial S}{\partial B} = 2\,(BXX^T - YX^T)$$

this means that

$$d\left(\frac{\partial S}{\partial B}\right) = 2\,dB\,(XX^T)$$

and hence

$$\frac{\partial^2 S}{\partial B^2} = 2\,(XX^T)$$

The matrix $XX^T$ is positive semidefinite by the nature of its construction, and it is positive definite when the rows of X are linearly independent, as they must be for $(XX^T)^{-1}$ to exist. Therefore the regression estimates

$$B = YX^T(XX^T)^{-1}$$

minimize the sum of the squared deviations between the actual dependent variables and their regression estimates.
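
As a quick illustration of the second order condition, again with arbitrary random data, the eigenvalues of $XX^T$ can be checked directly; when the rows of X are linearly independent they are all strictly positive, so $2XX^T$ is positive definite:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 50
X = rng.standard_normal((m, n))

# X X^T is symmetric, so eigvalsh applies; its eigenvalues are nonnegative
# by construction and strictly positive when X has full row rank.
eigvals = np.linalg.eigvalsh(X @ X.T)
print(eigvals, bool(np.all(eigvals > 0)))
```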

Conclusions

The Least Squares estimates of the regression coefficients for the dependent variable Y in terms of the independent explanatory variables X are given by

$$B = YX^T(XX^T)^{-1}$$

