Finance Train

Apply Functions in R

In the earlier lessons, we learned how to use the for loop to iterate over various R objects. R also provides functions for implicit looping, such as apply(), lapply() and sapply(), which are often easier to use. Together they are commonly referred to as the ‘apply’ family of functions.

The apply functions are valued mainly for compact, readable code. They are sometimes also recommended for speed, but they still loop over the elements internally, so the gain over a well-written for loop (one that preallocates its result) is usually modest. When speed is an issue, such as when working with large data sets or long-running simulations, the larger gains come from vectorized functions such as rowSums() and colMeans(), which should be preferred wherever they exist.
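As a minimal sketch of what implicit looping means, the following runs the same task three ways: an explicit for loop, sapply(), and plain vectorized arithmetic. The names `prices` and `square` are hypothetical and used only for illustration.

```r
# Hypothetical input data, used only for illustration
prices <- c(10, 9, 8, 9, 11)
square <- function(x) x^2

# Explicit loop: preallocate the result, then fill it element by element
loop_result <- numeric(length(prices))
for (i in seq_along(prices)) {
  loop_result[i] <- square(prices[i])
}

# Implicit loop: sapply() calls square() on each element and collects the results
apply_result <- sapply(prices, square)

# Vectorized arithmetic: no loop is written at all
vector_result <- prices^2

identical(loop_result, apply_result)    # TRUE
identical(apply_result, vector_result)  # TRUE
```

All three approaches give the same numbers; they differ in how much bookkeeping the code has to spell out.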

The apply Function

The apply() function applies a function over the margins (the rows or the columns) of a data structure that has dimensions. The apply() function has the following structure:

> args(apply)
function (X, MARGIN, FUN, ...)
  • X is an array, including a matrix. A data frame is also accepted, but it is first coerced to a matrix. Plain lists and vectors have no dimensions and are not accepted.
  • MARGIN is the dimension to loop over (1 = rows, 2 = columns)
  • FUN is the function to apply (example: mean)
  • … represents additional arguments that are passed on to FUN
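To illustrate the … argument, here is a small sketch in which na.rm = TRUE is passed through apply() to mean(). The matrix `m` is hypothetical data chosen so that one row contains a missing value.

```r
# Small fixed matrix with one missing value (hypothetical data)
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2)
# m is:
#      [,1] [,2] [,3]
# [1,]    1   NA    5
# [2,]    2    4    6

# Without na.rm, a row that contains NA gives NA
apply(m, 1, mean)                # NA 4

# na.rm = TRUE is passed through ... to mean()
apply(m, 1, mean, na.rm = TRUE)  # 3 4
```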

Example

Let’s define a simple matrix myMatrix which we will use to understand the apply() function. The built-in rpois() function simulates independent Poisson random variables; for example, rpois(30, 3) generates 30 Poisson random numbers with parameter λ = 3. Below, matrix() arranges these 30 numbers into 5 rows and 6 columns. Because the numbers are random, your values will differ from the ones shown.

> myMatrix <- matrix(rpois(30,3),5)
> myMatrix
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    3    1    7    4    4    3
[2,]    2    3    5    5    2    3
[3,]    2    5    1    2    2    1
[4,]    3    5    1    4    5    3
[5,]    0    0    4    2    1    2
>

Now let’s take a few examples to understand how we can use the apply() function.

Calculate the sum of each row.

> apply(myMatrix,1,sum)
[1] 22 20 13 21  9

The apply function applied the function sum to each row (specified with MARGIN=1) of the matrix myMatrix.

We can do the same column-wise by changing MARGIN=2, as shown below:

> apply(myMatrix,2,sum)
[1] 10 14 18 17 14 12

The following two examples show row minima and column maxima:

> apply(myMatrix,1,min) #row minima
[1] 1 2 1 1 0
> apply(myMatrix,2,max) #col maxima
[1] 3 5 7 5 5 3
>
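The following sketch rebuilds the matrix printed above from its displayed values, so the results are reproducible, and shows a few variations: the base R functions rowSums() and colSums() as vectorized equivalents of apply() with sum, an anonymous function, and a FUN that returns more than one value.

```r
# Rebuild the matrix printed above (values are filled column by column)
myMatrix <- matrix(c(3, 2, 2, 3, 0,
                     1, 3, 5, 5, 0,
                     7, 5, 1, 1, 4,
                     4, 5, 2, 4, 2,
                     4, 2, 2, 5, 1,
                     3, 3, 1, 3, 2), nrow = 5)

# Vectorized equivalents of apply(myMatrix, 1, sum) and apply(myMatrix, 2, sum)
rowSums(myMatrix)   # 22 20 13 21  9
colSums(myMatrix)   # 10 14 18 17 14 12

# An anonymous function: the spread (max - min) of each row
apply(myMatrix, 1, function(r) max(r) - min(r))   # 6 3 4 4 4

# When FUN returns two values per row, apply() returns a matrix
# with one column per row of the input (here 2 x 5)
row_ranges <- apply(myMatrix, 1, range)
dim(row_ranges)     # 2 5
```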

The apply() function can be used with matrices, arrays, and data frames. There are other functions which can be applied to lists and vectors.

The lapply() Function

The analog of apply() for lists is lapply(). It applies the given function to all elements of the specified list.

Example

> lapply(list(1:5,20:26),median)
[[1]]
[1] 3
[[2]]
[1] 23
>

As you can see, lapply() always returns a list. You can use unlist() to flatten the list into a simple vector.

> unlist(lapply(list(1:5,20:26),median))
[1]  3 23
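As a small sketch of how names carry through, the example below uses a named list and a data frame. Both `returns` and `df` are hypothetical objects introduced only for illustration; the data frame works because a data frame is a list of columns.

```r
# Hypothetical named list of numeric vectors
returns <- list(a = 1:5, b = 20:26)

# lapply() keeps the names of the input list
medians <- lapply(returns, median)   # list with elements $a and $b
names(medians)                       # "a" "b"

# unlist() turns the list into a named vector
unlist(medians)                      # a = 3, b = 23

# A data frame is a list of columns, so lapply() visits each column
df <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))
col_means <- lapply(df, mean)        # $x is 2, $y is 3.5
```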

We can also use the apply family of functions with our own custom functions, as shown below:

#The custom oilCost function - oil cost for two gallons
oilCost <- function(x = 0) {
  x * 2
}
#sample vector containing oil prices for the past 5 days (per gallon)
oil_price <- c(10,9,8,9,11)
#Use apply function to calculate two gallon oil cost each day
lapply(oil_price,oilCost)

The result is a list with the cost of two gallons for each day.

> lapply(oil_price,oilCost)
[[1]]
[1] 20
[[2]]
[1] 18
[[3]]
[1] 16
[[4]]
[1] 18
[[5]]
[1] 22
>

Let’s say we want our custom function to accept the quantity as an argument (instead of using the fixed quantity of 2). We can modify our function as follows:

#The custom oilCost function - oil cost for a specified quantity of gallons
oilCost <- function(x, qty) {
  x * qty
}
#sample vector containing oil prices for the past 5 days (per gallon)
oil_price <- c(10,9,8,9,11)
#Use apply function to calculate three gallon oil cost each day
unlist(lapply(oil_price,oilCost,qty=3))
#Use apply function to calculate four gallon oil cost each day
unlist(lapply(oil_price,oilCost,qty=4))

The function oilCost is applied to each element of the vector oil_price, and qty is supplied as the third argument of lapply(), which passes it on to oilCost.

Result:

> #Use apply function to calculate three gallon oil cost each day
> unlist(lapply(oil_price,oilCost,qty=3))
[1] 30 27 24 27 33
> 
> #Use apply function to calculate four gallon oil cost each day
> unlist(lapply(oil_price,oilCost,qty=4))
[1] 40 36 32 36 44
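For comparison, here is a sketch of two other ways to get the three-gallon result: wrapping the call in an anonymous function, and using plain vectorized multiplication. The variable names `via_dots`, `via_wrapper` and `via_vector` are hypothetical.

```r
oilCost <- function(x, qty) {
  x * qty
}
oil_price <- c(10, 9, 8, 9, 11)

# qty is passed by name through lapply()'s ... argument
via_dots <- unlist(lapply(oil_price, oilCost, qty = 3))

# The same call written with an anonymous wrapper function
via_wrapper <- unlist(lapply(oil_price, function(p) oilCost(p, qty = 3)))

# Multiplication is already vectorized, so no apply call is needed here
via_vector <- oil_price * 3

via_dots   # 30 27 24 27 33
```

For simple arithmetic like this the vectorized form is the shortest; the lapply() forms matter when the function cannot take a whole vector at once.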

The sapply() Function

As we saw, lapply() takes a list or a vector but always returns a list. This is because the results for different elements can be of different classes or lengths, and a list is the only structure that can accommodate all of them.

However, there are many scenarios where all the elements of the output are of the same class (e.g. integer) and length, and we would rather get the output as a vector. Earlier we achieved this using unlist(). The sapply() function is often a more convenient choice as it does this for us automatically: it takes a list or a vector as input and returns a vector or a matrix when possible. Internally, it calls lapply() and simplifies the output. The following example shows the difference between lapply() and sapply().

> myList <- list(Pois = rpois(10,3), Norm = rnorm(5), Unif = runif(5,1,10))
> myList
$Pois
 [1] 7 5 3 3 2 2 0 1 4 4
$Norm
[1]  1.56840563 -0.08843082  0.19965929  0.20645959  0.85374249
$Unif
[1] 8.729636 8.109225 7.884665 9.558795 1.016197
> lapply(myList,mean)
$Pois
[1] 3.1
$Norm
[1] 0.5479672
$Unif
[1] 7.059704
> sapply(myList,mean)
     Pois      Norm      Unif 
3.1000000 0.5479672 7.0597036 
>
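The phrase "when possible" is worth a short sketch. The list `scores` below is hypothetical fixed data; the example shows sapply() returning a vector, a matrix, or a list depending on what FUN returns, and vapply(), another member of the family, which requires the expected result type to be declared.

```r
# Hypothetical fixed data so the results are reproducible
scores <- list(a = c(1, 5, 3), b = c(2, 8), c = c(4, 9, 6.5))

# One number per element: sapply() simplifies to a named vector
sapply(scores, mean)               # a = 3, b = 5, c = 6.5

# Two numbers per element: sapply() simplifies to a 2 x 3 matrix
sapply(scores, range)

# Results of different lengths cannot be simplified, so a list comes back
sapply(scores, rev)

# vapply() declares the expected result (one double per element)
# and stops with an error if FUN returns anything else
vapply(scores, mean, numeric(1))   # a = 3, b = 5, c = 6.5
```

Because the return type of sapply() depends on the data, vapply() is often the safer choice inside functions and scripts.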

In this lesson, we saw three functions in the apply family, namely, apply(), lapply(), and sapply(). The entire family is made up of the apply(), lapply(), sapply(), vapply(), mapply(), rapply(), and tapply() functions.
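As a brief taste of two of the functions not covered in this lesson, the sketch below uses mapply() to loop over two vectors in parallel and tapply() to apply a function within groups. The vectors `qty` and `day_type` are hypothetical data added for illustration.

```r
# Hypothetical data: daily oil prices, quantities bought, and a label per day
oil_price <- c(10, 9, 8, 9, 11)
qty       <- c(1, 2, 3, 2, 1)
day_type  <- c("weekday", "weekday", "weekend", "weekend", "weekday")

# mapply(): walk over several vectors in parallel, one pair of values per call
daily_cost <- mapply(function(p, q) p * q, oil_price, qty)
daily_cost      # 10 18 24 18 11

# tapply(): apply a function to the values within each group
mean_by_type <- tapply(oil_price, day_type, mean)
mean_by_type    # weekday = 10, weekend = 8.5
```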

CFA Institute does not endorse, promote or warrant the accuracy or quality of Finance Train. CFA® and Chartered Financial Analyst® are registered trademarks owned by CFA Institute.

Copyright © 2021 Finance Train. All rights reserved.
