Formulating the PCA

Today I have thought about how one can formulate the Principal Component Analysis (PCA) method. In particular I want to reformulate PCA as a solution for a regression problem. The idea of reformulation PCA as a solution for some regression problem is useful in Sparse PCA , in which a L_1 regularization term is inserted into a ridge regression formula to enforce spareness of the coefficients (i.e. elastic net). There are at least two equivalent ways to motivate PCA. In this post I will first give a formulation of PCA based on orthogonal projection, and then discuss a regression-type reformulation of PCA.

1. Orthogonal projection onto optimal hyperplane:

Given n p-dimensional column vectors \mathbf{y}_1,\cdots, \mathbf{y}_n (assumed pre-processing is done \sum_{i=1}^{n}\mathbf{y}_i = \mathbf{0})

Suppose that we want to project these vectors orthogonally onto a unit vector \mathbf{a}_1, so that the projected vectors will be:

\mathbf{x}_i = \mathbf{a}_1\mathbf{a}_1^T\mathbf{y}_i

If we think of the projected vectors \mathbf{x} _i as an approximation of the original \mathbf{y}_i, then a measure of how good the overall approximation is is the sum of lengths of the discrepancies \mathbf{y}_i - \mathbf{x}_i. So it is natural to adopt the following sum of square as the objective function, and judge \mathbf{a}_1 base on how small it can make the objective function be.

L_1=\dfrac{1}{2}\sum_{i=1}^{n}(\mathbf{y}_i - \mathbf{x}_i)^T (\mathbf{y}_i - \mathbf{x}_i)

Expanding L_1 yields

L_1 = \sum_{i=1}^n\mathbf{y}_i^T\mathbf{y} - \mathbf{a}_1^T\sum_{i=1}^n\mathbf{y}_i\mathbf{y}_i^T\mathbf{a}_1

Let \mathbf{\Omega} = \dfrac{1}{n}\sum_{i=1}^n\mathbf{y}_i\mathbf{y}_i^T. Choosing \mathbf{a}_1 to minimize L_1 is equivalent to choosing \mathbf{a}_1 to maximize \mathbf{a}_1^T\mathbf{\Omega} \mathbf{a}_1 with the condition \mathbf{a}^T\mathbf{a} = 1. With a change of variable, one can prove that the optimal \mathbf{a}_1 is the (normalized) eigenvector \mathbf{v}_1 of \mathbf{\Omega}corresponding to the largest eigenvalue of \mathbf{\Omega}.

Suppose after we have fixed \mathbf{a}_1 at \mathbf{v}_1, another unit vector \mathbf{a}_2 with the property a_2 \perp a_1 is given, and we are asked how to choose the best \mathbf{a}_2 if the same orthogonal projection is carried out onto \mathbf{a}_2. With the same reasoning, the best \mathbf{a}_2 is the one that maximizing \mathbf{a}_2^T\mathbf{\Omega}\mathbf{a}_2 with subject to \mathbf{a}_2^T\mathbf{a}_2 = 1 and \mathbf{a}_2^T\mathbf{a}_1 = 0. Again with a change of variable, it can be proved that the best \mathbf{a}_2 is the (normalized) eigenvector \mathbf{v}_2 of \mathbf{\Omega}corresponding to the second largest eigenvalue of \mathbf{\Omega}.

A natural question to ask is whether \mathbf{v}_1 and \mathbf{v}_2 are still the best if we consider orthogonal projection onto the plane spanned by two orthogonal unit vectors \mathbf{a}_1,\mathbf{a}_2.

In other words, we consider the following objective function

L_{plane} = \dfrac{1}{2}\sum_{i=1}^n(\mathbf{y}_i - \mathbf{a}_1\mathbf{a}_1^T\mathbf{y}_i - \mathbf{a}_2\mathbf{a}_2^T\mathbf{y}_i)^T(\mathbf{y}_i - \mathbf{a}_1\mathbf{a}_1^T\mathbf{y}_i - \mathbf{a}_2\mathbf{a}_2^T\mathbf{y}_i)

and ask whether the following statement holds or not

\mathbf{v}_1,\mathbf{v}_2 = \text{argmax}_{\mathbf{a}_1,\mathbf{a}_2} L_{\text{plane}}with subject to \mathbf{a}_1^T\mathbf{a}_1 = \mathbf{a}_{2}^T\mathbf{a}_2 = 1 and \mathbf{a}_1^T\mathbf{a}_2 = 0

By Pythagoras theorem, this is indeed true. So in fact the plane spanned by \mathbf{v}_1,\mathbf{v}_2 is the optimal plane consider all 2-dimensional plane.

This property can be generalized to the case of q-dimensional hyperplane (q \le p). The optimal (in the orthogonal projection sense) hyperplane in this case is spanned by the first q eigenvectors \mathbf{v}_1,\cdots,\mathbf{v}_q of \mathbf{\Omega}(in the order of descending eigenvalues). If we define \mathbf{V}_q = [\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_q] then the projection matrix onto the optimal hyperplane is \mathbf{V}_{q} \mathbf{V}_q^T . (One can prove this generalized property by considering an arbitrary orthonormal basis \mathbf{a}_1,\cdots,\mathbf{a}_q and the corresponding projection matrix \mathbf{A}\mathbf{A}^T with \mathbf{A} defined as \mathbf{A} = [\mathbf{a}_1\ \mathbf{a}_2 \ \cdots \ \mathbf{a}_q], then expanding the objective function L_{\text{plane}} with the condition \mathbf{A}^T\mathbf{A} = \mathbf{I}.)

The \mathbf{x}_i = \mathbf{V}_q\mathbf{V}_q^T\mathbf{y}_i are p-dimensional vectors, but are completely confined to the q-dimensional subspace spanned by \mathbf{v}_1,\cdots,\mathbf{v}_q. In other words, the “effective” dimension of \mathbf{x} in this case is only q. For this reason, in a typical Principal Component Analysis, one is not interested in the \mathbf{x}_i themselves, but in the coordinates of \mathbf{x}_i in the new coordinate system defined by the orthonormal basis \mathbf{v}_1,\cdots, \mathbf{v}_q. The coordinates of \mathbf{x}_i in this system are \mathbf{v}_1^T\mathbf{y}_i, \cdots, \mathbf{v}_q^T\mathbf{y}_i. Therefore, further analyses after PCA often work directly with the q-dimensional column vector \mathbf{z}_i defined as

\mathbf{z}_i = (\mathbf{v}_1^T\mathbf{y}_i, \cdots,\mathbf{v}_q^T\mathbf{y}_i)^T

On a side note, the \mathbf{z}_i have a diagonal sample covariance matrix (i.e. the dimensions of \mathbf{z} are uncorrelated). Since \mathbf{y} has (sample) covariance matrix \mathbf{\Omega} and \mathbf{z} is a transformation of \mathbf{y} with \mathbf{z} = \mathbf{V}_q^T\mathbf{y}, the covariance matrix of \mathbf{z} is \mathbf{V}_q^T\mathbf{\Omega}\mathbf{V}_q, which is a diagonal matrix.

2. PCA as a solution of a ridge regression problem:

[the need for sparse PCA here]

The above formulation of PCA based on orthogonal projection onto optimal hyperplanes is actually a ridge regression formulation, and the final reformulation for SparsePCA will be very close to this.

[sparse PCA ]


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: