Creating Functions in R

In the previous lesson, we learned how to use functions in R. This is useful because R has many built-in functions that make coding easier. The beauty of functions is that we can use them in our programs without knowing anything about their inner workings.

Remember the sd() function? We used it in our program to calculate the standard deviation. Internally, it is a detailed program that performs that calculation. However, as users of the function, we simply call it and don't need to know what happens inside it. Another important benefit of functions is that we can reuse them many times in our code, and because they abstract away functionality, they help keep our programs short, clean and less error-prone.

The good thing is that in R we can write our own functions. This can be useful when we know that we will be using a custom piece of functionality multiple times in our programs.

Functions are created using the function() directive and are stored as R objects just like anything else. In particular, they are R objects of class "function".

f <- function(<arguments>) {
  ## Do something interesting
}
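
As a quick check of the claim that a function is an ordinary object of class "function", the sketch below defines a small function and inspects it. The names add_one and increment are made up for illustration.

```r
# add_one is a hypothetical example name, not part of base R
add_one <- function(x) {
  x + 1
}

class(add_one)        # "function"
is.function(add_one)  # TRUE

# Because a function is an object, it can be assigned to another name
increment <- add_one
increment(41)         # 42
```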

Functions in R are “first class objects”, which means that they can be treated much like any other R object. Importantly,

  • Functions can be passed as arguments to other functions
  • Functions can be nested, so that you can define a function inside of another function
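
Both points can be illustrated with a minimal sketch. The helper names apply_twice and make_multiplier are hypothetical and chosen only for this example.

```r
# apply_twice takes another function f as an argument and calls it twice
apply_twice <- function(f, x) {
  f(f(x))
}
apply_twice(sqrt, 16)   # sqrt(sqrt(16)) = 2

# make_multiplier defines a function inside another function and returns it
make_multiplier <- function(k) {
  multiply <- function(x) {
    k * x
  }
  multiply
}
triple <- make_multiplier(3)
triple(5)   # 15
```

The inner function multiply can see k because it was defined inside make_multiplier.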

The return value of a function is the last expression in the function body to be evaluated.
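
To make the return rule concrete, here is a small sketch with two hypothetical functions: one relies on the implicit return of its last expression, the other uses return() to exit early.

```r
# Implicit return: the value of the last evaluated expression
square_implicit <- function(x) {
  x^2
}

# Explicit return(): exits the function immediately with the given value
sign_label <- function(x) {
  if (x < 0) {
    return("negative")
  }
  "non-negative"
}

square_implicit(4)   # 16
sign_label(-2)       # "negative"
sign_label(3)        # "non-negative"
```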

Example: doubleMe Function

Let's create a simple function that takes a numeric value as an argument and returns double that value as its output.

doubleMe <- function(x = 0) {
  output <- 2 * x
  output  # last expression evaluated, so this is the return value
}

A few points:

  • Our function is named doubleMe
  • When the function is executed, it returns the value of the last expression evaluated in its body. An explicit return() call is not required, though we can use one.
  • The function takes one argument (x), which has a default value of 0. So, even if you don't specify an input argument, the function will run using the default value 0.

Sample Results:

> #print double of 7
> print(doubleMe(7))
[1] 14
> 
> #print double of 8
> print(doubleMe(8))
[1] 16
> #print double of vector. vector in ⇒ vector out
> print(doubleMe(c(2,5,7)))
[1]  4 10 14
>
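
One detail about implicit return values is worth a short sketch. When the body of a function ends with an assignment, R returns the value invisibly: it is available to print() or to another assignment, but it is not printed automatically at the console. The two function names below are hypothetical variants used only to show the difference, along with the default argument.

```r
# double_quiet ends with an assignment, so its value is returned invisibly
double_quiet <- function(x = 0) {
  output <- 2 * x
}

# double_visible ends with the variable itself, so the value auto-prints
double_visible <- function(x = 0) {
  output <- 2 * x
  output
}

double_quiet(7)           # nothing is printed at the console
print(double_quiet(7))    # 14
double_visible(7)         # 14

# Calling with no argument falls back to the default x = 0
print(double_quiet())     # 0
```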

Example: Duration of a Bond

Let's now take a more concrete example. Given the yield per coupon period y, the number of periods to maturity n, and the coupon rate per period c, the modified duration of the bond (the Macaulay duration divided by 1 + y) can be calculated as follows:

Modified Bond Duration: BD = (1/(1+y)) * { (1+y)/y - [1 + y + n(c - y)] / [c((1+y)^n - 1) + y] }

Suppose you have many bonds with different y, n and c. We can write a new function to calculate the duration in R.

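# Function to calculate modified duration in R
# y: yield per period, n: number of periods, c: coupon rate per period
MDur <- function(y, n, c) {
  mduration <- (1 / (1 + y)) * ((1 + y) / y - (1 + y + n * (c - y)) / (c * ((1 + y)^n - 1) + y))
  mduration
}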

The inputs are y, n and c. The equation defines how the modified duration is computed from the input values, and the last statement gives the return value (the duration).

Results:

> #Calculate Modified duration (y=4%, n=20, and c=6%)
> print(MDur(0.04,20,.06))
[1] 12.57829
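
Since the motivation was having many bonds, the sketch below passes vectors to the same closed-form expression. The three bonds are hypothetical inputs, not data from this lesson. It also cross-checks the closed form against a direct discounting of the cash flows of a bond with face value 1; the helper name mdur_cashflow is an assumption made for this example.

```r
# Closed-form modified duration, same expression as MDur above
MDur <- function(y, n, c) {
  (1 / (1 + y)) * ((1 + y) / y - (1 + y + n * (c - y)) / (c * ((1 + y)^n - 1) + y))
}

# Hypothetical bonds: per-period yield, number of periods, per-period coupon rate
y <- c(0.04, 0.03, 0.05)
n <- c(20, 10, 30)
c_rate <- c(0.06, 0.03, 0.04)

# Arithmetic in R is vectorized, so one call returns one duration per bond
MDur(y, n, c_rate)

# Cross-check for a single bond: discount each cash flow (face value 1)
mdur_cashflow <- function(y, n, c) {
  t <- 1:n
  cash <- rep(c, n)
  cash[n] <- cash[n] + 1                 # final coupon plus face value
  pv <- cash / (1 + y)^t                 # present value of each cash flow
  macaulay <- sum(t * pv) / sum(pv)      # PV-weighted average time
  macaulay / (1 + y)                     # modified duration
}
all.equal(MDur(0.04, 20, 0.06), mdur_cashflow(0.04, 20, 0.06))
```

Note that the closed form divides by y, so it cannot be used with a yield of exactly zero.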

Example: Coefficient of Variation

Here's a function for calculating the coefficient of variation (the ratio of the standard deviation to the mean) for a vector:

coef.of.var <- function(x){
  meanval <- mean(x,na.rm=TRUE) # recall this means "ignore NAs"
  sdval <- sd(x,na.rm=TRUE)
  return(sdval/meanval)
}

This function says "if you give me an object, that I will call x, I will store its mean() as meanval, then its sd() as sdval, and then return their ratio sdval/meanval."

R ships with several built-in datasets that can be loaded directly into an R session. One of them is airquality, which can be loaded using data(airquality). Once it is loaded, you can use the data and check its summary, structure, etc. The dataset contains New York air quality measurements in a data frame with 153 observations on 6 variables.

Column  Name     Type     Description
[,1]    Ozone    numeric  Ozone (ppb)
[,2]    Solar.R  numeric  Solar R (lang)
[,3]    Wind     numeric  Wind (mph)
[,4]    Temp     numeric  Temperature (degrees F)
[,5]    Month    numeric  Month (1-12)
[,6]    Day      numeric  Day of month (1-31)
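
A minimal sketch of inspecting the dataset with standard base R functions follows. Counting missing values per column shows why the na.rm = TRUE argument in coef.of.var matters; the variable name na_counts is chosen only for this example.

```r
# airquality ships with R in the datasets package
data(airquality)

str(airquality)       # structure: column names and types
summary(airquality)   # per-column summaries, including NA counts
head(airquality)      # first six rows

# Count missing values per column
na_counts <- colSums(is.na(airquality))
na_counts
```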

In the following example, we load this dataset and then calculate the coefficient of variation of the Ozone column.

> data(airquality)
> coef.of.var(airquality$Ozone)
[1] 0.7830151
>
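
Because functions can be passed as arguments, the same function can be applied to several columns at once with sapply(). This is a sketch rather than part of the original lesson; the choice of columns and the name measure_cols are assumptions made for illustration.

```r
# Same function as above, repeated so this example runs on its own
coef.of.var <- function(x) {
  meanval <- mean(x, na.rm = TRUE)
  sdval <- sd(x, na.rm = TRUE)
  sdval / meanval
}

data(airquality)

# Apply coef.of.var to each measurement column; sapply passes one column at a time
measure_cols <- c("Ozone", "Solar.R", "Wind", "Temp")
cv <- sapply(airquality[, measure_cols], coef.of.var)
cv
```

The result is a named numeric vector with one coefficient of variation per column.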