Today I have thought about how one can formulate the Principal Component Analysis (PCA) method. In particular, I want to reformulate PCA as the solution of a regression problem. The idea of reformulating PCA as the solution of some regression problem is useful in Sparse PCA, in which a regularization term is inserted into a ridge regression formula to enforce sparseness of the coefficients (i.e. the elastic net). There are at least two equivalent ways to motivate PCA. In this post I will first give a formulation of PCA based on orthogonal projection, and then discuss a regression-type reformulation of PCA.

1. Orthogonal projection onto optimal hyperplane:

Given $n$ $p$-dimensional column vectors $x_1, x_2, \ldots, x_n$ (assumed pre-processing is done, i.e. the vectors are centered so that $\sum_{i=1}^n x_i = 0$).

Suppose that we want to project these vectors orthogonally onto a unit vector $u$, so that the projected vectors will be:

$$\hat{x}_i = (u^T x_i)\, u, \quad i = 1, \ldots, n.$$

If we think of the projected vectors $\hat{x}_i$ as approximations of the original $x_i$, then a measure of how good the overall approximation is is the total length of the discrepancies $x_i - \hat{x}_i$. So it is natural to adopt the following sum of squares as the objective function, and judge $u$ based on how small it can make the objective function:

$$J(u) = \sum_{i=1}^n \left\| x_i - (u^T x_i)\, u \right\|^2.$$

Expanding, and using $\|u\| = 1$, yields

$$J(u) = \sum_{i=1}^n \|x_i\|^2 - \sum_{i=1}^n (u^T x_i)^2.$$

Let $S = \sum_{i=1}^n x_i x_i^T$, so that $\sum_{i=1}^n (u^T x_i)^2 = u^T S u$. Choosing $u$ to minimize $J(u)$ is equivalent to choosing $u$ to maximize $u^T S u$ with the condition $\|u\| = 1$. With a change of variable, one can prove that the optimal $u$ is the (normalized) eigenvector $u_1$ of $S$ corresponding to the largest eigenvalue of $S$.
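This claim is easy to check numerically. Below is a minimal sketch with NumPy; the data, dimensions, and random seed are made up for illustration. It builds the scatter matrix $S$, takes the top eigenvector, and verifies that no random unit vector achieves a larger value of $u^T S u$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n = 200 centered samples in p = 3 dimensions,
# stretched so the coordinate variances differ.
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])
X = X - X.mean(axis=0)

S = X.T @ X                      # scatter matrix S = sum_i x_i x_i^T

# S is symmetric, so eigh applies; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)
u1 = eigvecs[:, -1]              # eigenvector of the largest eigenvalue

# u1 attains the maximum of u^T S u over unit vectors:
for _ in range(1000):
    u = rng.normal(size=3)
    u = u / np.linalg.norm(u)
    assert u @ S @ u <= u1 @ S @ u1 + 1e-9
```

Note that $u_1^T S u_1$ equals the largest eigenvalue itself, which is the maximal value of the Rayleigh quotient.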

Suppose that after we have fixed $u$ at $u_1$, another unit vector $v$ with the property $v^T u_1 = 0$ is given, and we are asked how to choose the best $v$ if the same orthogonal projection is carried out onto $v$. By the same reasoning, the best $v$ is the one that maximizes $v^T S v$ subject to $\|v\| = 1$ and $v^T u_1 = 0$. Again with a change of variable, it can be proved that the best $v$ is the (normalized) eigenvector $u_2$ of $S$ corresponding to the second largest eigenvalue of $S$.
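The constrained maximization can also be sanity-checked numerically. The sketch below (again with made-up synthetic data) projects random unit vectors onto the subspace orthogonal to $u_1$ and confirms that none of them beats the second eigenvector $u_2$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])
X = X - X.mean(axis=0)
S = X.T @ X                      # scatter matrix

eigvals, eigvecs = np.linalg.eigh(S)   # ascending eigenvalues
u1 = eigvecs[:, -1]              # first principal direction
u2 = eigvecs[:, -2]              # orthogonal to u1, next-largest eigenvalue

assert abs(u1 @ u2) < 1e-9       # the constraint v^T u_1 = 0 holds

# Among unit vectors orthogonal to u1, u2 maximizes v^T S v:
for _ in range(1000):
    v = rng.normal(size=3)
    v = v - (v @ u1) * u1        # remove the u1 component
    v = v / np.linalg.norm(v)
    assert v @ S @ v <= u2 @ S @ u2 + 1e-9
```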

A natural question to ask is whether $u_1$ and $u_2$ are still the best if we consider orthogonal projection onto the plane spanned by two orthonormal unit vectors $u$, $v$.

In other words, we consider the following objective function

$$J(u, v) = \sum_{i=1}^n \left\| x_i - (u^T x_i)\, u - (v^T x_i)\, v \right\|^2$$

and ask whether the following statement holds or not:

$$\min_{u, v} J(u, v) = J(u_1, u_2),$$

subject to $\|u\| = \|v\| = 1$ and $u^T v = 0$.

By the Pythagorean theorem (since $u \perp v$, the squared norm splits and $J(u, v) = \sum_{i=1}^n \|x_i\|^2 - u^T S u - v^T S v$), this is indeed true. So in fact the plane spanned by $u_1, u_2$ is the optimal plane among all two-dimensional planes.

This property can be generalized to the case of an $m$-dimensional hyperplane ($m \le p$). The optimal (in the orthogonal projection sense) hyperplane in this case is spanned by the first $m$ eigenvectors $u_1, \ldots, u_m$ of $S$ (in the order of descending eigenvalues). If we define $U = [u_1 \; u_2 \; \cdots \; u_m]$ then the projection matrix onto the optimal hyperplane is $P = U U^T$. (One can prove this generalized property by considering an arbitrary orthonormal basis $w_1, \ldots, w_m$ and the corresponding projection matrix $W W^T$, with $W$ defined as $[w_1 \; w_2 \; \cdots \; w_m]$, then expanding the objective function with the condition $W^T W = I_m$.)
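As a numerical sketch of the generalized property (on made-up data, with $p = 5$ and $m = 2$ chosen arbitrarily): when the hyperplane is spanned by the top $m$ eigenvectors, the minimized objective equals the sum of the discarded eigenvalues of $S$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)
S = X.T @ X                                   # scatter matrix

eigvals, eigvecs = np.linalg.eigh(S)          # ascending eigenvalues
m = 2
U = eigvecs[:, -m:]                           # first m eigenvectors (largest)
P = U @ U.T                                   # projection onto the optimal hyperplane

# Rows of X are x_i^T, so the projected rows are (P x_i)^T = x_i^T P.
residual = np.sum((X - X @ P) ** 2)

# Minimized objective = sum of the p - m smallest eigenvalues of S.
assert np.isclose(residual, eigvals[:-m].sum())
```

This identity follows from $\sum_i \|x_i - P x_i\|^2 = \operatorname{tr}(S) - \operatorname{tr}(U^T S U)$.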

The projections $\hat{x}_i = U U^T x_i$ are $p$-dimensional vectors, but are completely confined to the $m$-dimensional subspace spanned by $u_1, \ldots, u_m$. In other words, the “effective” dimension of $\hat{x}_i$ in this case is only $m$. For this reason, in a typical Principal Component Analysis, one is not interested in the $\hat{x}_i$ themselves, but in the coordinates of $\hat{x}_i$ in the new coordinate system defined by the orthonormal basis $u_1, \ldots, u_m$. The coordinates of $\hat{x}_i$ in this system are $u_1^T x_i, \ldots, u_m^T x_i$. Therefore, further analyses after PCA often work directly with the $m$-dimensional column vector $z_i$ defined as

$$z_i = U^T x_i.$$

On a side note, the $z_i$ have a diagonal sample covariance matrix (i.e. the dimensions of $z_i$ are uncorrelated). Since $x$ has (sample) covariance matrix $\frac{1}{n} S$ and $z$ is a linear transformation of $x$ with $z = U^T x$, the covariance matrix of $z$ is $\frac{1}{n} U^T S U$, which is a diagonal matrix because the columns of $U$ are eigenvectors of $S$.
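This decorrelation is also easy to verify numerically. A minimal sketch, again on synthetic data invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
X = X - X.mean(axis=0)                  # centering, as assumed throughout
n = X.shape[0]
S = X.T @ X                             # scatter matrix

eigvals, eigvecs = np.linalg.eigh(S)
m = 2
U = eigvecs[:, -m:]                     # top-m eigenvectors as columns
Z = X @ U                               # rows are the coordinate vectors z_i = U^T x_i

# Sample covariance of z: (1/n) Z^T Z = (1/n) U^T S U, which is diagonal.
C = Z.T @ Z / n
off_diag = C - np.diag(np.diag(C))
assert np.allclose(off_diag, 0.0)
```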

2. PCA as a solution of a ridge regression problem:

[the need for sparse PCA here]

The above formulation of PCA based on orthogonal projection onto optimal hyperplanes can be recast as a ridge regression problem, and the final reformulation used in Sparse PCA will be very close to this.

[sparse PCA ]