Calculate Percentage by Group in R
In this article, we will learn how to calculate percentage by group in a dataset in R programming. It is done using the dplyr
library. We will first create a dataset and then calculate the percentage by group. We will also look at how to format these percentages.
Dataset
Let’s say an investor maintains two portfolios, A and B, with investment in certain stocks in each portfolio. In R, we can have this dataset in the form of a data frame.
Our Dataframe
1 portfolios <- data.frame(portfolio=c('Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio B', 'Portfolio B', 'Portfolio B', 'Portfolio B', 'Portfolio B'),
2 stock=c('P','Q','R','S','T','U','V','W','X','Y'),
3 amount=c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31))
4
5### View Data
6
7 portfolios
8
9 portfolio stock amount
10 1 Portfolio A P 21
11 2 Portfolio A Q 62
12 3 Portfolio A R 43
13 4 Portfolio A S 15
14 5 Portfolio A T 20
15 6 Portfolio B U 32
16 7 Portfolio B V 54
17 8 Portfolio B W 43
18 9 Portfolio B X 25
19 10 Portfolio B Y 31
20
As you can see, there are two portfolios, ‘Portfolio A’ and ‘Portfolio B’. In each portfolio, the investor has invested a certain amount in different stocks. What we want to do is calculate the percentage of investment in each stock compared to the total investment in that portfolio. So, we are calculating % of investment by group (portfolio).
Load and Install dplyr Package
In R, we can achieve this using dplyr
library. Let’s start with installing and loading the dplyr
library.
1 # install and load dplyr package
2
3 install.packages('dplyr')
4 library(dplyr)
5
Calculate Percentage by Group
We can now calculate percentage by group, percentage of investment in each stock grouped by portfolio, using the following formula:
1 portfolios %>%
2 group_by(portfolio) %>%
3 mutate(percent = amount/sum(amount))
4
5The results are shown below:
6
7 # A tibble: 10 × 4
8 # Groups: portfolio [2]
9 portfolio stock amount percent
10 <chr> <chr> <dbl> <dbl>
11 1 Portfolio A P 21 0.130
12 2 Portfolio A Q 62 0.385
13 3 Portfolio A R 43 0.267
14 4 Portfolio A S 15 0.0932
15 5 Portfolio A T 20 0.124
16 6 Portfolio B U 32 0.173
17 7 Portfolio B V 54 0.292
18 8 Portfolio B W 43 0.232
19 9 Portfolio B X 25 0.135
20 10 Portfolio B Y 31 0.168
21
The mutate()
function is used to create a new variable (in this case percent) from the dataset. This new column contains values using the formula amount/sum(amount)
and the calculation is grouped by portfolio
.
The percentages are in the last column. Let’s verify this. In Portfolio A, the total investment in 5 stocks is 161 (21 + 62 + 43 + 15 + 20). The investment in stock P is 21, which is 21/161 = 0.13 or 13%, same as shown in the table.
As you can see, the results are in decimal numbers. We can format these with percentage symbols using the formattable package.
1 install.packages('formattable')
2 library(formattable)
3
4 result <- portfolios %>%
5 group_by(portfolio) %>%
6 mutate(percent = formattable::percent(amount / sum(amount)))
7
8 result
9
10 # A tibble: 10 × 4
11 # Groups: portfolio [2]
12 portfolio stock amount percent
13 <chr> <chr> <dbl> <formttbl>
14 1 Portfolio A P 21 13.04%
15 2 Portfolio A Q 62 38.51%
16 3 Portfolio A R 43 26.71%
17 4 Portfolio A S 15 9.32%
18 5 Portfolio A T 20 12.42%
19 6 Portfolio B U 32 17.30%
20 7 Portfolio B V 54 29.19%
21 8 Portfolio B W 43 23.24%
22 9 Portfolio B X 25 13.51%
23 10 Portfolio B Y 31 16.76%
24
We now have much more presentable results. The formattable::percent()
converts the values in percent column to percentages with symbol.