# Calculate Percentage by Group in R

In this article, we will learn how to calculate percentages by group in a dataset in R using the `dplyr` library. We will first create a dataset and then calculate the percentage by group. We will also look at how to format these percentages.

### Dataset

Let’s say an investor maintains two portfolios, A and B, with investment in certain stocks in each portfolio. In R, we can have this dataset in the form of a data frame.

### Our Dataframe

```
# one row per stock: which portfolio it belongs to and the amount invested
portfolios <- data.frame(
  portfolio = c('Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio A', 'Portfolio A',
                'Portfolio B', 'Portfolio B', 'Portfolio B', 'Portfolio B', 'Portfolio B'),
  stock     = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
  amount    = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31)
)
```

### View Data

```
portfolios
     portfolio stock amount
1  Portfolio A     P     21
2  Portfolio A     Q     62
3  Portfolio A     R     43
4  Portfolio A     S     15
5  Portfolio A     T     20
6  Portfolio B     U     32
7  Portfolio B     V     54
8  Portfolio B     W     43
9  Portfolio B     X     25
10 Portfolio B     Y     31
```

As you can see, there are two portfolios, ‘Portfolio A’ and ‘Portfolio B’. In each portfolio, the investor has invested a certain amount in different stocks. What we want to do is calculate the investment in each stock as a percentage of the total investment in that portfolio. In other words, we are calculating the percentage of investment by group, where the group is the portfolio.

### Load and Install dplyr Package

In R, we can achieve this using the `dplyr` library. Let’s start by installing and loading it.

```
# install and load dplyr package
install.packages('dplyr')
library(dplyr)
```

### Calculate Percentage by Group

We can now calculate the percentage by group, that is, the percentage of investment in each stock within its portfolio, using the following code:

```
portfolios %>%
  group_by(portfolio) %>%
  mutate(percent = amount / sum(amount))
```

The results are shown below:

```
# A tibble: 10 × 4
# Groups:   portfolio [2]
   portfolio   stock amount percent
   <chr>       <chr>  <dbl>   <dbl>
 1 Portfolio A P         21  0.130
 2 Portfolio A Q         62  0.385
 3 Portfolio A R         43  0.267
 4 Portfolio A S         15  0.0932
 5 Portfolio A T         20  0.124
 6 Portfolio B U         32  0.173
 7 Portfolio B V         54  0.292
 8 Portfolio B W         43  0.232
 9 Portfolio B X         25  0.135
10 Portfolio B Y         31  0.168
```

The `mutate()` function is used to add a new variable (in this case `percent`) to the dataset. The new column is computed with the expression `amount/sum(amount)`. Because the data is grouped by `portfolio`, `sum(amount)` is the total for each row's own portfolio rather than the total of the whole column.
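To make the grouped calculation more concrete, here is a minimal sketch of the same computation in base R, without `dplyr`. It is not part of the approach used in this article; it only illustrates what the grouped `sum(amount)` evaluates to for each row. The helper variable `group_total` is a name chosen here for illustration.

```r
# Same data as in the article
portfolios <- data.frame(
  portfolio = rep(c('Portfolio A', 'Portfolio B'), each = 5),
  stock     = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
  amount    = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31)
)

# ave() applies FUN within each group and returns one value per row,
# so every row receives the total of its own portfolio
group_total <- ave(portfolios$amount, portfolios$portfolio, FUN = sum)

# Dividing row by row gives the same shares as the grouped mutate()
portfolios$percent <- portfolios$amount / group_total
portfolios
```

Every row of Portfolio A is divided by 161 and every row of Portfolio B by 185, which is the per-row view of what `group_by()` followed by `mutate()` computes.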

The percentages are in the last column. Let’s verify this. In Portfolio A, the total investment in 5 stocks is 161 (21 + 62 + 43 + 15 + 20). The investment in stock P is 21, which is 21/161 = 0.13 or 13%, same as shown in the table.
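The manual check above can also be done for every group at once. The following is a small sketch, assuming `dplyr` is installed, that sums the amounts and the computed shares per portfolio; the column names `total` and `share_sum` are chosen here for illustration. If the calculation is correct, the shares within each portfolio should add up to 1.

```r
library(dplyr)

# Same data as in the article
portfolios <- data.frame(
  portfolio = rep(c('Portfolio A', 'Portfolio B'), each = 5),
  stock     = c('P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'),
  amount    = c(21, 62, 43, 15, 20, 32, 54, 43, 25, 31)
)

# Compute the shares, then collapse each portfolio to one summary row
check <- portfolios %>%
  group_by(portfolio) %>%
  mutate(percent = amount / sum(amount)) %>%
  summarise(total = sum(amount), share_sum = sum(percent))

check
```

The `total` column should show 161 for Portfolio A and 185 for Portfolio B, and `share_sum` should be 1 for both.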

As you can see, the results are decimal fractions. We can display them with percentage symbols using the `formattable` package.

```
# install and load formattable package
install.packages('formattable')
library(formattable)

result <- portfolios %>%
  group_by(portfolio) %>%
  mutate(percent = formattable::percent(amount / sum(amount)))
result
# A tibble: 10 × 4
# Groups:   portfolio [2]
   portfolio   stock amount percent
   <chr>       <chr>  <dbl> <formttbl>
 1 Portfolio A P         21 13.04%
 2 Portfolio A Q         62 38.51%
 3 Portfolio A R         43 26.71%
 4 Portfolio A S         15 9.32%
 5 Portfolio A T         20 12.42%
 6 Portfolio B U         32 17.30%
 7 Portfolio B V         54 29.19%
 8 Portfolio B W         43 23.24%
 9 Portfolio B X         25 13.51%
10 Portfolio B Y         31 16.76%
```

We now have much more presentable results. The `formattable::percent()` function displays the values in the `percent` column as percentages, with the `%` symbol appended.
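If you would rather not install an extra package, a similar display can be sketched with base R's `sprintf()`. This is an alternative to the `formattable` approach shown above, not a replacement for it, and the variable names `shares` and `labels` are chosen here for illustration. Note that `sprintf()` returns character strings, so the result is suitable for display but not for further arithmetic.

```r
# Shares of the five Portfolio A stocks (total investment is 161)
shares <- c(21, 62, 43, 15, 20) / 161

# Scale to 0-100, keep two decimals, and append a literal percent sign;
# "%%" is how sprintf() writes a single "%"
labels <- sprintf("%.2f%%", 100 * shares)
labels
```

The resulting strings match the Portfolio A rows in the formatted table above, starting with 13.04% for stock P.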