In this article, we will learn how to calculate percentages by group for a dataset in R using the dplyr package. We will first create a dataset, then calculate each row's share of its group total, and finally format these shares as percentages.
Let’s say an investor maintains two portfolios, A and B, with investment in certain stocks in each portfolio. In R, we can have this dataset in the form of a data frame.
portfolios <- data.frame(portfolio=c('Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio B', 'Portfolio B', 'Portfolio B', 'Portfolio B', 'Portfolio B'),
                         stock=c('P','Q','R','S','T','U','V','W','X','Y'),
                         amount=c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31))

# View data
portfolios

     portfolio stock amount
1  Portfolio A     P     21
2  Portfolio A     Q     62
3  Portfolio A     R     43
4  Portfolio A     S     15
5  Portfolio A     T     20
6  Portfolio B     U     32
7  Portfolio B     V     54
8  Portfolio B     W     43
9  Portfolio B     X     25
10 Portfolio B     Y     31
As you can see, there are two portfolios, ‘Portfolio A’ and ‘Portfolio B’. In each portfolio, the investor has invested a certain amount in different stocks. What we want to do is calculate the investment in each stock as a percentage of the total investment in that portfolio. In other words, we are calculating the percentage of investment by group, where the group is the portfolio.
In R, we can achieve this using the dplyr package. Let’s start by installing and loading it.
# install and load dplyr package
install.packages('dplyr')
library(dplyr)
We can now calculate the percentage by group, that is, the percentage of investment in each stock within its portfolio, using the following code:
portfolios %>%
  group_by(portfolio) %>%
  mutate(percent = amount / sum(amount))
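
portfolio    stock  amount  percent
Portfolio A  P          21   0.1304
Portfolio A  Q          62   0.3851
Portfolio A  R          43   0.2671
Portfolio A  S          15   0.0932
Portfolio A  T          20   0.1242
Portfolio B  U          32   0.1730
Portfolio B  V          54   0.2919
Portfolio B  W          43   0.2324
Portfolio B  X          25   0.1351
Portfolio B  Y          31   0.1676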
The mutate() function creates a new column (in this case percent) in the dataset. Because the data is first grouped with group_by(portfolio), sum(amount) is evaluated separately for each portfolio, so amount/sum(amount) gives each stock's share of its own portfolio's total rather than of the overall total.
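To make the grouped division more concrete, the same calculation can be sketched in base R with ave(), which returns the group total for every row. This is only an illustrative cross-check, not part of the dplyr approach; the variable name group_total is an assumption introduced for the example.

```r
# Same data as in the article
portfolios <- data.frame(
  portfolio = rep(c('Portfolio A', 'Portfolio B'), each = 5),
  stock     = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
  amount    = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31)
)

# ave() returns, for every row, the sum of amount within that row's portfolio
# (group_total is a hypothetical helper name used only in this sketch)
group_total <- ave(portfolios$amount, portfolios$portfolio, FUN = sum)

# Dividing each amount by its group total gives the within-portfolio share
portfolios$percent <- portfolios$amount / group_total

head(portfolios, 3)
```

Each row is divided by the total of its own portfolio, which is what group_by() followed by mutate() does in the dplyr version.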
The percentages are in the last column. Let’s verify one of them. In Portfolio A, the total investment across the 5 stocks is 161 (21 + 62 + 43 + 15 + 20). The investment in stock P is 21, which is 21/161 ≈ 0.13, or about 13%, the same as shown in the table.
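The manual check above can also be done for every group at once. The following is a minimal sketch, assuming dplyr is installed; the names shares, check, total, and share_sum are introduced here for illustration only. If the grouped calculation is right, the shares in each portfolio should add up to 1.

```r
library(dplyr)

# Same data as in the article
portfolios <- data.frame(
  portfolio = rep(c('Portfolio A', 'Portfolio B'), each = 5),
  stock     = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
  amount    = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31)
)

# Share of each stock within its portfolio
shares <- portfolios %>%
  group_by(portfolio) %>%
  mutate(percent = amount / sum(amount)) %>%
  ungroup()

# Per-portfolio total and sum of shares; share_sum should equal 1 in each group
check <- shares %>%
  group_by(portfolio) %>%
  summarise(total = sum(amount), share_sum = sum(percent))

check
```

The total column should show 161 for Portfolio A and 185 for Portfolio B, matching the sums of the amounts in the original data.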
As you can see, the results are decimal fractions. We can display them with a percent sign using the formattable package.
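install.packages('formattable')
library(formattable)

# compute the share per portfolio, then display it as a percentage
result <- portfolios %>%
  group_by(portfolio) %>%
  mutate(percent = amount / sum(amount)) %>%
  ungroup() %>%
  mutate(percent = formattable::percent(percent))

result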
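
portfolio    stock  amount  percent
Portfolio A  P          21   13.04%
Portfolio A  Q          62   38.51%
Portfolio A  R          43   26.71%
Portfolio A  S          15    9.32%
Portfolio A  T          20   12.42%
Portfolio B  U          32   17.30%
Portfolio B  V          54   29.19%
Portfolio B  W          43   23.24%
Portfolio B  X          25   13.51%
Portfolio B  Y          31   16.76%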
The results are now much more presentable. The formattable::percent() function displays the values in the percent column as percentages with a % symbol.
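If installing an extra package is not an option, a similar display can be approximated with base R's sprintf(). This is an alternative sketch rather than the approach used in this article, and it has a trade-off: the result is a character vector, so it is suitable for display but not for further arithmetic. The names shares and labels are assumptions for the example.

```r
# Shares of the five Portfolio A stocks (amount / portfolio total of 161)
shares <- c(21, 62, 43, 15, 20) / 161

# Format as percentages with two decimals; '%%' prints a literal percent sign
labels <- sprintf('%.2f%%', 100 * shares)

labels
# "13.04%" "38.51%" "26.71%" "9.32%" "12.42%"
```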