Calculate Percentage by Group in R

In this article, we will learn how to calculate percentages by group in an R data frame using the dplyr package. We will first create a dataset, then calculate each row’s share of its group total, and finally format the results as percentages.

Dataset

Let’s say an investor maintains two portfolios, A and B, with investment in certain stocks in each portfolio. In R, we can have this dataset in the form of a data frame.

Our Dataframe

portfolios <- data.frame(portfolio=c('Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio B', 'Portfolio B', 'Portfolio B', 'Portfolio B', 'Portfolio B'),
    stock=c('P','Q','R','S','T','U','V','W','X','Y'),
    amount=c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31))

View Data

portfolios

     portfolio stock amount
1  Portfolio A     P     21
2  Portfolio A     Q     62
3  Portfolio A     R     43
4  Portfolio A     S     15
5  Portfolio A     T     20
6  Portfolio B     U     32
7  Portfolio B     V     54
8  Portfolio B     W     43
9  Portfolio B     X     25
10 Portfolio B     Y     31

As you can see, there are two portfolios, ‘Portfolio A’ and ‘Portfolio B’. In each portfolio, the investor has invested a certain amount in different stocks. We want to calculate each stock’s share of the total investment in its own portfolio. In other words, we are calculating the percentage of investment by group, where the group is the portfolio.

Install and Load the dplyr Package

In R, we can achieve this using the dplyr package. Let’s start by installing and loading it.

# install dplyr (only needed once per R installation), then load it
install.packages('dplyr')
library(dplyr)

Calculate Percentage by Group

We can now calculate the percentage by group, that is, each stock’s share of the investment in its portfolio, using the following code:

portfolios %>%
    group_by(portfolio) %>%
    mutate(percent = amount/sum(amount))

The results are shown below:

# A tibble: 10 × 4
# Groups:   portfolio [2]
   portfolio   stock amount percent
   <chr>       <chr>  <dbl>   <dbl>
 1 Portfolio A P         21  0.130
 2 Portfolio A Q         62  0.385
 3 Portfolio A R         43  0.267
 4 Portfolio A S         15  0.0932
 5 Portfolio A T         20  0.124
 6 Portfolio B U         32  0.173
 7 Portfolio B V         54  0.292
 8 Portfolio B W         43  0.232
 9 Portfolio B X         25  0.135
10 Portfolio B Y         31  0.168

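The mutate() function adds a new column (in this case percent) to the data frame. Because the data is first grouped with group_by(portfolio), sum(amount) is evaluated separately within each portfolio, so amount/sum(amount) divides each stock’s amount by the total of its own portfolio rather than by the total of the whole dataset.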
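
To make the effect of group_by() concrete, the following minimal sketch computes the same ratio with and without grouping. It assumes dplyr is installed; the names overall and by_group are illustrative and not part of the original example, and the data frame is rebuilt with rep() only to keep the snippet self-contained.

```r
library(dplyr)

# Same data as above; rep() just shortens the portfolio column
portfolios <- data.frame(
    portfolio = rep(c('Portfolio A', 'Portfolio B'), each = 5),
    stock = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
    amount = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31))

# Without group_by(): every amount is divided by the grand total (346)
overall <- portfolios %>%
    mutate(percent = amount / sum(amount))

# With group_by(): every amount is divided by its own portfolio's total.
# ungroup() removes the grouping so later steps are not grouped by accident.
by_group <- portfolios %>%
    group_by(portfolio) %>%
    mutate(percent = amount / sum(amount)) %>%
    ungroup()

overall$percent[1]   # stock P as a share of everything: 21 / 346
by_group$percent[1]  # stock P as a share of Portfolio A: 21 / 161
```

The only difference between the two pipelines is the group_by() step, which changes what sum(amount) refers to.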

The percentages are in the last column. Let’s verify this. In Portfolio A, the total investment in the 5 stocks is 161 (21 + 62 + 43 + 15 + 20). The investment in stock P is 21, which gives 21/161 ≈ 0.130, or about 13%, matching the first row of the table.
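
The same check can be run for every group at once. The sketch below, which assumes dplyr is installed and uses the illustrative name check, sums the amounts and the computed shares within each portfolio; if the calculation is grouped correctly, the shares in each portfolio should add up to 1.

```r
library(dplyr)

# Same data as above, rebuilt so the snippet runs on its own
portfolios <- data.frame(
    portfolio = rep(c('Portfolio A', 'Portfolio B'), each = 5),
    stock = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
    amount = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31))

check <- portfolios %>%
    group_by(portfolio) %>%
    mutate(percent = amount / sum(amount)) %>%
    # collapse each portfolio to one row: its total amount and its summed shares
    summarise(total_amount = sum(amount),
              total_percent = sum(percent))

# total_amount should be 161 and 185; total_percent should be 1 for both
# portfolios, up to floating-point rounding
check
```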

The percent column holds decimal fractions rather than percentages. We can display these values with a percent sign using the formattable package.

install.packages('formattable')
library(formattable)

result <- portfolios %>%
    group_by(portfolio) %>%
    mutate(percent = formattable::percent(amount / sum(amount)))

result

# A tibble: 10 × 4
# Groups:   portfolio [2]
   portfolio   stock amount percent
   <chr>       <chr>  <dbl> <formttbl>
 1 Portfolio A P         21 13.04%
 2 Portfolio A Q         62 38.51%
 3 Portfolio A R         43 26.71%
 4 Portfolio A S         15 9.32%
 5 Portfolio A T         20 12.42%
 6 Portfolio B U         32 17.30%
 7 Portfolio B V         54 29.19%
 8 Portfolio B W         43 23.24%
 9 Portfolio B X         25 13.51%
10 Portfolio B Y         31 16.76%

We now have much more presentable results. The formattable::percent() function displays the values in the percent column as percentages with a percent sign.
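
If installing an extra package is not an option, a similar display can be sketched with base R’s sprintf(). This is an alternative to the formattable approach above, not a replacement for it: the column name label and the result name labelled are illustrative, and the label column is a character vector, so it is meant for display only. Keep the numeric percent column for any further calculation.

```r
library(dplyr)

# Same data as above, rebuilt so the snippet runs on its own
portfolios <- data.frame(
    portfolio = rep(c('Portfolio A', 'Portfolio B'), each = 5),
    stock = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
    amount = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31))

labelled <- portfolios %>%
    group_by(portfolio) %>%
    mutate(percent = amount / sum(amount),
           # '%.2f' keeps two decimals; '%%' prints a literal percent sign
           label = sprintf('%.2f%%', 100 * percent)) %>%
    ungroup()

labelled$label[1]  # "13.04%" for stock P in Portfolio A
```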
