Principal Component Analysis (PCA) is one of the most widely applied statistical techniques for reducing the dimensionality of data while losing as little information as possible. Nowadays, PCA is also popular in unsupervised machine learning. PCA leverages the spectral decomposition of a matrix, a well-known concept in linear algebra.

Source: Understanding Principal Component Analysis | by Trist'n Joseph | Towards Data Science

In this article, we discuss how to implement PCA manually, though many statistical software packages, such as EViews and STATA, will conduct PCA within a few clicks. Open-source languages, such as R and Python, also have packages that produce principal components with ease.

Principal Component Analysis in Python Manually - Jovian

We use matrices to derive principal components and denote a matrix with a capital letter.

Suppose we have a data matrix `X_{m \times n}`, with `m` observations (rows) and `n` variables (columns), and we need to reduce its dimension.

We need to carry out some preprocessing steps.

First, we standardize `X` and then mean-center the standardized `X`. (Standardization already centers each column, so the second step changes nothing numerically, but we write it out for completeness.)

` X_\text{std}=\frac{X-\bar{X}}{\sigma_X}`

`Z = X_\text{std} - \bar{X}_\text{std}`

We store the value of mean centered standardized `X` into `Z`.
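As a sketch of the preprocessing in NumPy (the data matrix here is made up purely for illustration):

```python
import numpy as np

# Hypothetical data: m = 5 observations (rows), n = 3 variables (columns)
X = np.array([[2.5, 2.4, 1.2],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.0],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.4]])

# Standardize: subtract each column's mean and divide by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Mean-center the standardized data; standardization already centers each
# column, so this is a no-op numerically, but it mirrors the steps in the text
Z = X_std - X_std.mean(axis=0)
```

Each column of `Z` now has mean 0 and standard deviation 1.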

We finish the preprocessing steps after completing these two steps.

Now, we proceed to estimate the variance-covariance matrix of `Z`.

`C = \frac{1}{m-1}Z^TZ`

`C` contains the variances and covariances of the columns of `Z`. Because `Z` has `m` rows (observations), dividing by `m-1` gives the sample covariance. `C` is a square matrix of dimension `n \times n` in our example.
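A minimal NumPy sketch of this step, using synthetic standardized data; dividing by `m - 1` gives the sample covariance, which is also what `np.cov` computes by default:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic standardized data: m = 100 observations, n = 4 variables
X = rng.normal(size=(100, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

m, n = Z.shape
C = Z.T @ Z / (m - 1)  # n x n sample variance-covariance matrix
```

As a sanity check, `C` agrees with `np.cov(Z, rowvar=False)` and is symmetric.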

Next, we estimate the eigenvalues and eigenvectors of `C` using spectral decomposition, also called eigendecomposition, a standard technique in linear algebra that factors a matrix into its eigenvalues and eigenvectors.

` C = \Phi \Lambda \Phi^{-1}`

The spectral decomposition of `C` (where `C` must be a non-defective square matrix) decomposes `C` into `\Phi`, the matrix of eigenvectors, and `\Lambda`, a diagonal matrix containing the eigenvalues. Since a covariance matrix is symmetric, `\Phi` is orthogonal and `\Phi^{-1} = \Phi^T`.

Now, we sort the eigenvalues in descending order and reorder the eigenvectors (the columns of `\Phi`) to match.
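This step can be sketched as follows (again on synthetic data), using `np.linalg.eigh`, which is appropriate for symmetric matrices and returns eigenvalues in ascending order, so we reverse them:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
C = Z.T @ Z / (Z.shape[0] - 1)

# eigh is designed for symmetric matrices; eigenvalues come back ascending
eigvals, eigvecs = np.linalg.eigh(C)

# Sort eigenvalues in descending order; reorder eigenvector columns to match
order = np.argsort(eigvals)[::-1]
eigvals_sorted = eigvals[order]
Phi_star = eigvecs[:, order]
```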

Finally, we obtain the principal components.

` PC = Z\Phi^*`

Here, `\Phi^*` is the matrix of eigenvectors whose columns are arranged according to the eigenvalues sorted in descending order.
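With `Z` and the sorted eigenvector matrix from the previous steps (reconstructed here on synthetic data), the projection is a single matrix product:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
C = Z.T @ Z / (Z.shape[0] - 1)

eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]            # eigenvalues, descending
Phi_star = eigvecs[:, order]    # matching eigenvector columns

# Project the preprocessed data onto the sorted eigenvectors
PC = Z @ Phi_star
```

A useful sanity check: the sample variance of each principal component equals its eigenvalue.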

Finally, the factor loadings are obtained by multiplying each eigenvector in `\Phi^*` by the square root of its corresponding eigenvalue.

`\text{Loadings} = \Phi^* \sqrt{\Lambda^*}`

where `\Lambda^*` is the diagonal matrix of eigenvalues sorted in descending order.
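A sketch of the loading computation (synthetic data again); NumPy broadcasting multiplies each eigenvector column by the square root of its eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
C = Z.T @ Z / (Z.shape[0] - 1)

eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
Phi_star = eigvecs[:, order]

# Scale each eigenvector column by the square root of its eigenvalue
loadings = Phi_star * np.sqrt(lam)
```

Since each eigenvector has unit length, the squared loadings in column `j` sum to the eigenvalue `\lambda_j`.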