Finance Train

Factors in R Programming

In R programming, factors are variables that take on a limited number of different values. Factors are used to represent categorical data.

Some examples of factors:

  • A common example of a factor is gender, which can take category values such as Male and Female.
  • A data field such as marital status may contain only the values single, married, separated, divorced, or widowed.
  • Stocks can be categorized as Large-cap, Mid-cap, and Small-cap.

In R, the function factor() is used to encode a vector as a factor. In the following example, we first create a vector that categorizes five stocks as large-cap, mid-cap, or small-cap, and then use the factor() function to encode this vector as a factor.

# The following vector classifies 5 stocks
stock_vector <- c("large-cap","small-cap","large-cap","mid-cap","small-cap")
# Convert the stock vector to a factor
stock_factor <- factor(stock_vector)
# Print the stock_factor
stock_factor

When you print this factor, the results will look as follows:

> stock_factor
[1] large-cap small-cap large-cap mid-cap   small-cap
Levels: large-cap mid-cap small-cap
>
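
Internally, R stores a factor as integer codes that point into its levels. The following is a minimal sketch, using only base R functions, of how you might inspect this for the example above. The variable names stock_levels, n_levels, and stock_codes are illustrative only.

```r
# Recreate the example factor so this snippet runs on its own
stock_vector <- c("large-cap", "small-cap", "large-cap", "mid-cap", "small-cap")
stock_factor <- factor(stock_vector)

# The distinct categories, in the default sorted order
stock_levels <- levels(stock_factor)

# The number of categories
n_levels <- nlevels(stock_factor)

# The integer code stored for each element (its position in the levels)
stock_codes <- as.integer(stock_factor)

print(stock_levels)
print(n_levels)
print(stock_codes)
```

Here "large-cap" is the first level, so every large-cap stock is stored as code 1.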

Levels and Order

When you print the factor, you can see that it also prints the Levels. By default, the levels are the unique values of the vector, sorted as character strings. To change the order in which the levels are displayed, pass the levels= argument a vector of all the possible values of the variable in the order you desire.
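
As a minimal sketch of the levels= argument on its own (without ordering), the snippet below compares the default level order with a custom one. The names default_factor and custom_factor are illustrative only.

```r
stock_vector <- c("large-cap", "small-cap", "large-cap", "mid-cap", "small-cap")

# Default: levels are sorted as character strings
default_factor <- factor(stock_vector)

# Custom: levels listed explicitly in the order we want displayed
custom_factor <- factor(stock_vector,
                        levels = c("small-cap", "mid-cap", "large-cap"))

print(levels(default_factor))
print(levels(custom_factor))

# The factor is still unordered; only the order of the levels changed
print(is.ordered(custom_factor))
```

Note that any value in the vector that is missing from levels= becomes NA in the resulting factor, so the vector you pass should list every category.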

Factors can be unordered or ordered. For example, we can consider the gender factor (Male and Female) to be an unordered factor, as it is not important which one comes first. However, some other categories may have an order associated with them. For example, in our stock factor we may want to have them ordered by market capitalization (small-cap being the smallest and large-cap being the largest). If the ordering should also be used when performing comparisons, use the optional ordered=TRUE argument. In this case, the factor is known as an ordered factor.

We can now update our factor to have pre-defined levels and set ordered=TRUE.

# Convert the stock vector to a factor
stock_factor <- factor(stock_vector, ordered=TRUE, levels=c("small-cap", "mid-cap", "large-cap"))
# Print the stock_factor
stock_factor

Results:

> #Print the stock_factor
> stock_factor
[1] large-cap small-cap large-cap mid-cap   small-cap
Levels: small-cap < mid-cap < large-cap
>

Changing Levels

Sometimes, you may want to rename the levels of an existing factor, either for clarity or to relate them to something else in your model. In R, you can do so using the levels() function. Let’s say that our original factor contained the letters L, M and S to represent the three types of stocks. We can change the levels to large-cap, mid-cap and small-cap using the levels() function. The new names are assigned in the order of the existing levels.

# The following vector classifies 5 stocks
stock_vector <- c("L","S","L","M","S")
# Convert the stock vector to a factor
stock_factor <- factor(stock_vector, ordered=TRUE, levels=c("S","M","L"))
levels(stock_factor) <- c("small-cap", "mid-cap", "large-cap")
# Print the stock_factor
stock_factor

In the results, you will have new levels applied to the factor.

> #Print the stock_factor
> stock_factor
[1] large-cap small-cap large-cap mid-cap   small-cap
Levels: small-cap < mid-cap < large-cap
>
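
Because levels() renames by position, it helps to check the current level order before assigning new names. The sketch below does that, and also shows an alternative that sets the names at creation time using the labels= argument of factor(). The name relabelled is illustrative only.

```r
stock_vector <- c("L", "S", "L", "M", "S")
stock_factor <- factor(stock_vector, ordered = TRUE, levels = c("S", "M", "L"))

# Check the current level order before renaming
print(levels(stock_factor))

# New names must follow the same order as the existing levels
levels(stock_factor) <- c("small-cap", "mid-cap", "large-cap")
print(as.character(stock_factor))

# Alternative: supply the display names when the factor is created
relabelled <- factor(stock_vector, ordered = TRUE,
                     levels = c("S", "M", "L"),
                     labels = c("small-cap", "mid-cap", "large-cap"))
print(as.character(relabelled))
```

Both approaches give the same values and the same level order. Listing the new names in a different order from the existing levels would silently relabel the stocks incorrectly.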

Summarize a Factor

We can use the summary() function to summarize the contents of the factor variable. As you can see, it prints a quick snapshot of how many stocks of each type you have in your portfolio.

> #Summarize stock_factor
> summary(stock_factor)
small-cap   mid-cap large-cap 
        2         1         2 
>
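
The same counts can also be obtained with table(), and prop.table() turns them into proportions. The following is a minimal sketch using base R; the names stock_counts and stock_shares are illustrative only.

```r
stock_vector <- c("large-cap", "small-cap", "large-cap", "mid-cap", "small-cap")
stock_factor <- factor(stock_vector, ordered = TRUE,
                       levels = c("small-cap", "mid-cap", "large-cap"))

# Counts per level, the same numbers that summary(stock_factor) reports
stock_counts <- table(stock_factor)

# Share of the portfolio in each category
stock_shares <- prop.table(stock_counts)

print(stock_counts)
print(stock_shares)
```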

Use of Ordered Factor

In R, the most apparent effect of using an ordered rather than an unordered factor is in how the output is printed. Apart from this, the order of the levels can be important in linear modelling: with R's default settings, the first level of an unordered factor is used as the baseline level. We will learn about these use cases when we learn linear modelling; here we will take a simple example to understand the use of ordering.

Let’s say you have five stock traders in your team, and you have their performance evaluated as "Poor", "Average", and "Good". The following R script shows how we can compare the performances of these traders.

> # The following vector rates the performance of 5 traders
> performance_vector <- c("Good","Average","Poor","Poor","Good")
> 
> # Convert the performance vector to an ordered factor
> performance_factor <- factor(performance_vector, ordered=TRUE, levels=c("Poor","Average","Good"))
> 
> #Print the performance_factor
> performance_factor
[1] Good    Average Poor    Poor    Good   
Levels: Poor < Average < Good
> 
> #Summarize performance_factor
> summary(performance_factor)
   Poor Average    Good 
      2       1       2 
> 
> #Performance value of 2nd and 4th trader
> pv2 <- performance_factor[2]
> pv4 <- performance_factor[4]
> 
> #Is trader 2 better than trader 4?
> pv2 > pv4
[1] TRUE
>

Note that if the factor were not ordered, this comparison would not work: R returns NA along with a warning that the comparison operator '>' is not meaningful for factors. Once you set ordered=TRUE, R recognizes the comparison operator.
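
To see the difference side by side, the sketch below builds the same ratings as an unordered and an ordered factor and compares traders 2 and 4 in each. The variable names are illustrative only, and the warning is suppressed just to keep the output clean.

```r
performance_vector <- c("Good", "Average", "Poor", "Poor", "Good")
levels_low_to_high <- c("Poor", "Average", "Good")

unordered_perf <- factor(performance_vector, levels = levels_low_to_high)
ordered_perf <- factor(performance_vector, levels = levels_low_to_high,
                       ordered = TRUE)

# Unordered: the comparison yields NA and R emits a warning
unordered_result <- suppressWarnings(unordered_perf[2] > unordered_perf[4])

# Ordered: the comparison uses the level order Poor < Average < Good
ordered_result <- ordered_perf[2] > ordered_perf[4]

# Ordered factors also support max() and comparison against a level name
best_rating <- max(ordered_perf)
at_least_average <- ordered_perf >= "Average"

print(unordered_result)
print(ordered_result)
print(best_rating)
print(at_least_average)
```

The last two lines show why ordering is useful in practice: you can pick out the best rating, or filter for traders rated Average or better.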

Statistical Modelling

One of the most important uses of factors is in statistical modeling. Since categorical variables enter into statistical models differently than continuous variables, storing data as factors ensures that the modeling functions will treat such data correctly. (Note: Categorical variables are different from continuous variables in that a categorical variable can take on only a limited number of categories, while a continuous variable can take an infinite number of values.)
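
As a rough illustration of how a factor enters a model, the sketch below uses model.matrix() to show the indicator columns that a modelling function such as lm() would build from an unordered factor. It assumes R's default contrast settings, under which the first level acts as the baseline. The names design and releveled are illustrative only.

```r
stock_vector <- c("large-cap", "small-cap", "large-cap", "mid-cap", "small-cap")
stock_factor <- factor(stock_vector,
                       levels = c("small-cap", "mid-cap", "large-cap"))

# One intercept column plus one indicator column per non-baseline level
design <- model.matrix(~ stock_factor)
print(design)

# relevel() moves a chosen level to the front, making it the baseline
releveled <- relevel(stock_factor, ref = "large-cap")
print(levels(releveled))
```

With small-cap as the first level, the design matrix has indicator columns only for mid-cap and large-cap. After relevel(), large-cap becomes the first level instead.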

Copyright © 2021 Finance Train. All rights reserved.