Factors in R Programming

In R programming, factors are variables that take on a limited number of different values. Factors are used to represent categorical data.

Some examples of factors:

  • A common example of a factor is gender, which can take category values such as Male and Female.
  • A data field such as marital status may contain only values from single, married, separated, divorced, or widowed.
  • Stocks can be categorized as Large-cap, Mid-cap, and Small-cap.

In R, the function factor() is used to encode a vector as a factor. In the following example, we first create a character vector that categorizes five stocks as large-cap, mid-cap, or small-cap. We then use the factor() function to encode this vector as a factor.

    # The following vector classifies 5 stocks
    stock_vector <- c("large-cap", "small-cap", "large-cap", "mid-cap", "small-cap")
    # Convert the stock vector to a factor
    stock_factor <- factor(stock_vector)
    # Print stock_factor
    stock_factor

When you print this factor, the output looks as follows:

    > stock_factor
    [1] large-cap small-cap large-cap mid-cap   small-cap
    Levels: large-cap mid-cap small-cap
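
To see what "encoding" means here, it can help to look inside the factor. The following is a minimal sketch that rebuilds the same stock_factor and uses the base R functions levels(), as.integer(), and table() to show the distinct categories, the integer code stored for each element, and the count per category.

```r
# Rebuild the factor from the example above
stock_vector <- c("large-cap", "small-cap", "large-cap", "mid-cap", "small-cap")
stock_factor <- factor(stock_vector)

# The distinct categories, in their default sorted order
stock_levels <- levels(stock_factor)
print(stock_levels)

# Each element is stored as an integer code: its position in levels()
stock_codes <- as.integer(stock_factor)
print(stock_codes)

# Number of stocks in each category
print(table(stock_factor))
```

Here "large-cap" is the first level, so every large-cap stock is stored as code 1, mid-cap as 2, and small-cap as 3.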

Levels and Order

When you print the factor, R also prints its levels. By default, the levels are sorted alphabetically by their character values. To display the levels in a different order, pass the levels= argument a vector of all the possible values of the variable in the order you want.
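
As a short sketch of the levels= argument, the example below builds the same stock vector twice: once with the default sorted levels and once with a custom order. The names size_order and sized_factor, and the choice of ordering from smallest to largest, are assumptions made for illustration only.

```r
stock_vector <- c("large-cap", "small-cap", "large-cap", "mid-cap", "small-cap")

# Default behaviour: levels are sorted by character value
default_factor <- factor(stock_vector)
print(levels(default_factor))

# Custom order (illustrative choice: smallest to largest company size)
size_order <- c("small-cap", "mid-cap", "large-cap")
sized_factor <- factor(stock_vector, levels = size_order)
print(levels(sized_factor))

# The data values are unchanged; only the level order differs
print(sized_factor)
print(table(sized_factor))
```

The vector given to levels= should list every value that occurs in the data, because any value missing from it is converted to NA in the resulting factor.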
