What is Principal Component Analysis (With Steps)
Imagine you are an investment analyst with access to a large number of investment options such as stocks, bonds, commodities, ETFs, etc. Your goal is to create a diversified portfolio with maximum returns and minimum risk.
While analyzing these investments, one problem you’ll face is that many of these assets are correlated with each other, and their risk and return are affected by common factors such as market conditions, economic indicators, and industry trends. To create the most diversified portfolio, you need to account for the correlation between these assets.
If you’re considering asset classes such as stocks, bonds, commodities, and ETFs, then some of the common factors affecting them could be the underlying market returns, interest rate sensitivity, liquidity risk, currency fluctuations, and underlying volatility. The actual factors will depend on your dataset. Such a dataset has high dimensionality, with each factor being one dimension. As humans, we can’t easily visualize more than three dimensions, so analyzing and visualizing data with this many dimensions is not easy. What we want to do is work with fewer dimensions, maybe 2 or 3. This is where Principal Component Analysis comes in.
What is Principal Component Analysis
Principal Component Analysis (PCA) is a powerful unsupervised statistical technique that helps us reduce dimensionality and visualize multivariate data. PCA transforms our dataset into a new set of orthogonal* variables, each a linear combination of the original variables, which we call principal components.
* Orthogonality here means that the variables are uncorrelated: their covariance is zero, so there is no linear relationship between them. In simple terms, knowing the value of one variable does not help you linearly predict the value of another orthogonal variable. Note that this is weaker than full statistical independence. Uncorrelated variables are guaranteed to be independent only in special cases, for example when they are jointly normally distributed.
These principal components are uncorrelated with each other and are ranked by the amount of variance they explain in the dataset.
The first principal component accounts for the largest possible variance in the data, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
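To make the ranking and the orthogonality concrete, here is a minimal sketch of PCA using NumPy's eigendecomposition of the covariance matrix. The data is synthetic: the four "asset return" columns, the single shared market factor, and the `loadings` values are all made-up assumptions for illustration, not real market data, and this is only one common way to compute principal components.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical data: 500 observations of 4 correlated "asset returns",
# all driven by one shared market factor plus asset-specific noise.
n_obs = 500
market = rng.normal(size=(n_obs, 1))
loadings = np.array([[1.0, 0.8, 0.5, -0.3]])  # made-up factor sensitivities
returns = market @ loadings + 0.3 * rng.normal(size=(n_obs, 4))

# 1. Center each variable so variance is measured around its mean.
centered = returns - returns.mean(axis=0)

# 2. Covariance matrix of the variables (one variable per column).
cov = np.cov(centered, rowvar=False)

# 3. Eigendecomposition. eigh is meant for symmetric matrices and returns
#    eigenvalues in ascending order, so reorder from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 4. Project the centered data onto the eigenvectors.
#    Each column of `components` is one principal component.
components = centered @ eigenvectors

# The covariance of the components should be diagonal: the variances are
# the eigenvalues, and every off-diagonal covariance is (numerically) zero.
component_cov = np.cov(components, rowvar=False)
print("variance per component:", np.round(eigenvalues, 3))
print("components uncorrelated:",
      np.allclose(component_cov, np.diag(eigenvalues)))
```

Each eigenvalue is the variance captured by the matching component, which is why sorting the eigenvalues from largest to smallest gives the ranking described above.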
This technique is beneficial for a variety of financial analyses where many variables could be interdependent. Using PCA, we reduce the dimensions without losing much information. This makes the dataset easier to work with.
After performing PCA, it's common to keep only the first few components, the ones that explain the most variance in the dataset, for further analysis. With just two or three components retained, this complex data also becomes much easier to visualize.
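One common way to decide how many components to keep is to look at the cumulative share of variance they explain and stop once it passes a chosen threshold. The sketch below assumes you already have the variance of each component, sorted from largest to smallest. The helper name `n_components_for`, the threshold, and the example variances are hypothetical and chosen only for illustration.

```python
import numpy as np

def n_components_for(variances, threshold=0.90):
    """Return the smallest number of leading components whose cumulative
    share of the total variance reaches `threshold` (between 0 and 1)."""
    variances = np.asarray(variances, dtype=float)
    ratios = variances / variances.sum()   # share of variance per component
    cumulative = np.cumsum(ratios)         # running total of those shares
    # Index of the first cumulative value >= threshold, plus 1 for a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against floating-point round-off pushing k past the end.
    return min(k, len(variances))

# Made-up component variances, already sorted from largest to smallest.
# Their shares are 40%, 30%, 20% and 10% of the total.
example_variances = [4.0, 3.0, 2.0, 1.0]
k = n_components_for(example_variances, threshold=0.85)
print(k)  # 3: the first two components explain only 70%, the first three about 90%
```

The right threshold is a judgment call that depends on the analysis. A lower threshold keeps fewer components and discards more information.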
