Thayer Watkins

Principal Component Analysis
in Remote Sensing

Principal components analysis is a method in which original data is transformed into a new set of data which may better capture the essential information. Often some variables are highly correlated such that the information contained in one variable is largely a duplication of the information contained in another variable. Instead of throwing away the redundant data principal components analysis condenses the information in intercorrelated variables into a few variables, called principal components.

Principal components analysis is a special case of transforming the original data into a new coordinate system. If the original data involves n different variables then each observation may be considered a point in an n-dimensional vector space. The change of coordinate system for a two dimensional space is shown below.

Change of Coordinate System

In the above diagram the original data are given terms of X1 and X2 values which are plotted with respect to the X1,X2 axes. The axes for the new coordinate system are shown as red lines. The origin of the new coordinate system at the intersection of the red lines is different from the origin of the old coordinate system. The coordinates of a point P, in the new coordinate system are measured along the X1' and X2' axes.

The determination of the coordinates of point P with respect tot he new axes can determined by geometry. Let (X1c,X2c) be the coordinates of the origin of the new system as measured in the old system. The diamgram show below shows the derivation of the new coordinates in terms of the origin and the angle of the new axes with respect to the old axes.

The net result of the derivation is that the new coordinates can be computed by equations that take a simple form. There is simpler way to express the relationship between the new coordinates and the old that makes use of what is called the dot product of vectors.

The Data Ellipsoid

When the data is plotted as points in an n-dimensional vector space they often fall roughly within an ellipsoid. For the two dimensional case the situation might be as shown below.

For this case the new coordinate system that would be appropriate might be the one in which the origin is at the mean values for all the observations. The axes for the new coordinate system would be the major and minor axes of the data ellipse.

In the new coordinate system there would be negative values for coordinates which is inappropriate for remote sensing work. This is handled by shifting the origin to a point given by the minimum values of the new coordinates.

The General Case

While the plot of the data may be roughly an ellipsoid it also might not be. The work of Kauth and Thomas argues that the data plot resembles a fuzzy tasseled cap rather than an ellipsoid. Furthermore there is the problem of determinining the ellipsoid if one does exist. There is needed a more general method for determining the axes of the new coordinate system. The general method involves the covariances between the variables. This amounts to the intercorrelations among the original variables. The process for computing the covariances is described below.

Once the covariance matrix has been determined the next question is how are the new coordinate axes to determined from it. The answer is that there are certain vectors, called characteristic vectors or eigen vectors, that give the direction of the new axes.

The Eigenvectors and Eigenvalues of a Square Matrix

Let M be an n-by-n matrix. If X is an n-dimensional vector then one can multiply the n-by-n matrix M times the n-dimensional vector X and get another n-dimensional vector, say Y. This is expressed by the equation:

Y = MX

The vector X is said to be transformed by M into the vector Y. A vector X that is transformed into a multiple of itself is special; i.e., if the transformed vector Y is equal to X with every component multiplied the same constant, say λ. Such a vector X is said to be an eigenvector of the matrix M and the constant λ is said to be an eigenvalue of the matrix. (Eigen is a German word meaning self or characteristic.) The equation defining eigenvectors and eigenvalues is:

MX = λX

The computational method for finding the eigenvalues and then the eigenvectors are not of great concern here. It suffices to say that the eigenvalues are solutions to an n-th degree polynomial equation. the details are given in the appendix.

For an n-by-n matrix there will be n eigenvalues and n orthogonal (perpendicular) eigenvectors. The eigenvectors correspond to the axes of the n-dimensional ellipsoid that best fits the data. The eigenvectors can be normalized to unit length.

The eigenvalues given the amount of the variation (sum of squared deviations from their means) in the original data that is explained by the variation along the corresponding axis (as represented by the corresponding eigenvector). Thus the sum of the eigenvalues is equal to the total variation in the original variables. The components in principal component analysis are labeled according to the size of the corresponding eigenvalue. In practice the first component in remote sensing explains the lion's share of the variation. The second and third components explain much smaller shares and those beyond the third generally account for trivially small shares of the variation. The higher components may be consider to be random variation or noise. This means that with small or no loss of information the n variables of the original data can be reduced to three or so principal components.

Appendix A: Derivation of Principal Components Analysis

Assume the original data is given in the form of zij where the index i stands for the variable number and j for the observation number. For simplicity it is assumed that the mean value of each variable is zero; i.e., the variables are reduced to deviations from their means. For each observation j there is a value pj, called the principal component, which is used with a set of coefficients xi to approximate the j-th observation of the i-th variable zij as xipj. The values of the xi's and the pj are to be chosen so as to minimize the sum of the squared deviations between the actual data values and the approximations based upon the principal component; i.e.,

S = ΣiΣj(zij - xipj)2.

In order for the minimization problem to have a solution it is necessary to impose a constraint on the values of the xi's. The most convenient form of the constraint is that

Σixi2 = 1.

The minimization problem is to find values for the xi's and pj's such that

S = ΣiΣj(zij - xipj)2 is minimized subject to the constraint that Σixi2 = 1.

The solution to this constrained minimization problem is found by The Lagrangian multiplier method; i.e.,

minimize S = ΣiΣj(zij - xipj)2 - μ(Σixi2 - 1),

where μ is a parameter chosen at a value such that the constraint is satisfied.

The first order conditions for a minimum with respect to the pj)'s are:

∂S/∂pj) = Σi2(zij - xipj)xi = 0
for all j.

These conditions reduce to:

pj = Σizijxi
since Σixi2 = 1.

This says that the proper values for the principal component pj is the dot product of the vedctor of the coefficients xi's with the vector of the variables for observation j.

The problem now is find a way of determining the coefficients xi's. The first order condition for a minimim of the Lagrangian with respect to these variables are:

∂S/∂xi = Σj2(zij - xipj)pj -2μxi = 0

for all i.

These conditions reduce to:

xi = (Σjzijpj)/(Σjpj2 + μ)

If (Σjpj2 + μ) is replaced by λ and each pj is replaced by its value of Σkzkjxk the first order conditions for the xi's are:

λxi = Σjzijkzkjxk)
      = Σjkzijzkjxk)
      = ΣjΣkzijzkjxk
      = ΣkΣjzijzkjxk

Thus the first order conditions for the xi's can be put in the form

λxi = Σkjzijzkj)xk
= Σkmikxk
where mik = Σjzijzkj
is proportional to the covariance between variables i and j.

The first order conditions for the coefficients xi's indicate that the minimizing values for the xi's are components of an eigenvector of the matrix M. If all elements of the matrix M are divided by the number of observations t the result is the covariance matrix which has the same eigenvectors as the matrix M. The eigenvalues are affected by the division by the number of observations t but in the simple relation that the eigenvalues of the covariance matrix are equal to the eigenvalues of the moment matrix M divided by t.

Appendix B: Matrix Algebra Derivation of Principal Components Analysis

The mathematics of the principal component analysis is expressed more succinctly in terms of matrix operations. The basics of the matrix approach are given below.

Let the data be represented as a matrix Z with n rows and t columns where n is the number of variables and t is the number of observations.

The matrix of approximations of the data based upon the vector of coefficeints X and the vector of the principal component P is obtained by multiplying the nx1 vector X times the 1xt vector which is the transpose of the vector P; i.e.,


The matrix of deviations of the actual data from the values based upon X and P is


The sum of the squared deviations can be obtained in terms of matrix operations by multiplying deviations matrix times the transpose of the deviations matrix itself; i.e.,

(Z - XPT)(Z - XPT)T

The result is an nxn matrix. The sum of the elements on the principal diagonal is the sum of the squared deviations for all of the variables.

The constraint on the choice of X is that

XTX = 1

The first order condition for a minimimum with respect to the elements of P are, in matrix form,

or, equivalently,

The first order condition for a minimimum with respect to the elements of X are, in matrix form,

λXT = ZP
or, equivalently, since P = ZTX
letting M = ZZT
λX = MX
Thus X is an eigenvector
of the matrix M

The equation defining eigenvalues and eigenvectors can be expressed as:

(M-λI)X = 0

where I is an identity matrix and the zero on the rightside represents a vector of zeroes. For this equation to have a nontrivial solution the determinant of the coefficient matrix (M-λI) must be zero. This is the condition determining the eigen values. It reduces to an n-th degree polynomial equation. Once an eigenvalue is known the system of equations can be solved for the components of X by setting one component equal to an arbitrary nonzero value and solving for the rest. The solution can be normalized to put it into any desired form.

In general there are n eigenvalues and n orthonormal eigenvectors. The eigenvalues correspond to the amount of variation in the variables that is explained by a component so the first component should be the one utilized in computing the first principal component. The second largest eigenvalue identifies the eigenvector to be used in computing the second principal component and so forth. The proportion of the variation explained by a component is just its eigenvalue divided by the sum of the eigenvalues.