Calculate Percentage by Group in R
In this article, we will learn how to calculate percentages by group in a dataset in R, using the dplyr package. We will first create a dataset and then calculate the percentage by group. We will also look at how to format these percentages.
Dataset
Let’s say an investor maintains two portfolios, A and B, with investment in certain stocks in each portfolio. In R, we can have this dataset in the form of a data frame.
Our Dataframe
portfolios <- data.frame(
  portfolio = c('Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio A',
                'Portfolio B', 'Portfolio B', 'Portfolio B', 'Portfolio B', 'Portfolio B'),
  stock = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
  amount = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31)
)
View Data
portfolios
     portfolio stock amount
1  Portfolio A     P     21
2  Portfolio A     Q     62
3  Portfolio A     R     43
4  Portfolio A     S     15
5  Portfolio A     T     20
6  Portfolio B     U     32
7  Portfolio B     V     54
8  Portfolio B     W     43
9  Portfolio B     X     25
10 Portfolio B     Y     31
As you can see, there are two portfolios, ‘Portfolio A’ and ‘Portfolio B’. In each portfolio, the investor has invested a certain amount in different stocks. What we want to do is calculate the percentage of investment in each stock compared to the total investment in that portfolio. So, we are calculating % of investment by group (portfolio).
Install and Load the dplyr Package
In R, we can do this with the dplyr package. Let’s start by installing and loading it.
# install dplyr (only needed once), then load it
install.packages('dplyr')
library(dplyr)
Calculate Percentage by Group
We can now calculate the percentage by group, that is, each stock's share of the total investment in its portfolio, using the following code:
portfolios %>%
  group_by(portfolio) %>%
  mutate(percent = amount / sum(amount))
The results are shown below:
# A tibble: 10 × 4
# Groups:   portfolio [2]
   portfolio   stock amount percent
   <chr>       <chr>  <dbl>   <dbl>
 1 Portfolio A P         21  0.130
 2 Portfolio A Q         62  0.385
 3 Portfolio A R         43  0.267
 4 Portfolio A S         15  0.0932
 5 Portfolio A T         20  0.124
 6 Portfolio B U         32  0.173
 7 Portfolio B V         54  0.292
 8 Portfolio B W         43  0.232
 9 Portfolio B X         25  0.135
10 Portfolio B Y         31  0.168
The mutate() function adds a new column (in this case percent) to the data frame. Its values come from the expression amount/sum(amount). Because the data is grouped by portfolio, sum(amount) is the total for each portfolio rather than for the whole dataset.
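To see why the grouping matters, the following sketch computes the share twice: once without group_by(), where sum(amount) spans all ten rows, and once with it. It rebuilds the portfolios data frame so that it runs on its own. The column names share_overall and share_in_group are illustrative and not part of the original example.

```r
library(dplyr)

# Rebuild the example data frame used in this article
portfolios <- data.frame(
  portfolio = rep(c('Portfolio A', 'Portfolio B'), each = 5),
  stock = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
  amount = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31)
)

comparison <- portfolios %>%
  # Ungrouped: the denominator is the total across both portfolios (346)
  mutate(share_overall = amount / sum(amount)) %>%
  group_by(portfolio) %>%
  # Grouped: the denominator is the portfolio's own total (161 or 185)
  mutate(share_in_group = amount / sum(amount)) %>%
  ungroup()

comparison
```

For stock P, the ungrouped share is 21/346, while the grouped share is 21/161.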
The percentages are in the percent column. Let’s verify one value. In Portfolio A, the total investment across the 5 stocks is 161 (21 + 62 + 43 + 15 + 20). The investment in stock P is 21, which is 21/161 ≈ 0.130, or about 13%, the same as shown in the table.
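The same check can be run for every group at once. The sketch below rebuilds the portfolios data frame so that it runs on its own, then sums the percent column within each portfolio; each sum should come out as 1. The names check, total and percent_sum are introduced here only for illustration.

```r
library(dplyr)

# Rebuild the example data frame used in this article
portfolios <- data.frame(
  portfolio = rep(c('Portfolio A', 'Portfolio B'), each = 5),
  stock = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
  amount = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31)
)

# One row per portfolio: the group total and the sum of the group's shares
check <- portfolios %>%
  group_by(portfolio) %>%
  mutate(percent = amount / sum(amount)) %>%
  summarise(total = sum(amount), percent_sum = sum(percent))

check
```

If a percent_sum differs noticeably from 1, the grouping step was most likely skipped or applied to the wrong column.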
As you can see, the results are in decimal numbers. We can format these with percentage symbols using the formattable package.
install.packages('formattable')
library(formattable)
result <- portfolios %>%
  group_by(portfolio) %>%
  mutate(percent = formattable::percent(amount / sum(amount)))
result
# A tibble: 10 × 4
# Groups:   portfolio [2]
   portfolio   stock amount percent
   <chr>       <chr>  <dbl> <formttbl>
 1 Portfolio A P         21 13.04%
 2 Portfolio A Q         62 38.51%
 3 Portfolio A R         43 26.71%
 4 Portfolio A S         15 9.32%
 5 Portfolio A T         20 12.42%
 6 Portfolio B U         32 17.30%
 7 Portfolio B V         54 29.19%
 8 Portfolio B W         43 23.24%
 9 Portfolio B X         25 13.51%
10 Portfolio B Y         31 16.76%
We now have much more presentable results. The formattable::percent() function displays the values in the percent column as percentages with a % symbol.
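If installing an extra package is not desirable, a similar display can be produced with base R's sprintf(). This is a minimal sketch of an alternative, not the method used above: it keeps the numeric percent column for further calculations and adds a separate character column for display. The column name percent_label is an assumption made for illustration.

```r
library(dplyr)

# Rebuild the example data frame used in this article
portfolios <- data.frame(
  portfolio = rep(c('Portfolio A', 'Portfolio B'), each = 5),
  stock = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
  amount = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31)
)

result <- portfolios %>%
  group_by(portfolio) %>%
  mutate(
    percent = amount / sum(amount),                  # numeric share within the portfolio
    percent_label = sprintf('%.2f%%', 100 * percent) # text label with two decimals, for display only
  ) %>%
  ungroup()                                          # drop the grouping once the shares are computed

result
```

Because percent_label is a character column, it cannot be summed or sorted numerically; use the percent column for any further arithmetic.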