R Programming

Exploring Crude Oil (CL) Future Data from Quandl in R

It would be useful to download a dataset of historical future contract prices and explore it using R programming language. For this purpose we will load the futures historical data of Crude Oil (CME_CL1) from the CHRIS database of Quandl. 

The nomenclature to download futures data from Quandl is the following: 


CHRIS refers to the database of the future continuous series.  EXCHANGE refers to the name of the Exchange that offers the asset, CODE is the future contract code and NUMBER refers to the depth of the chained contract.

Download Historical Futures Data

We will download historical future data from CME exchange, of WTI Crude Oil which have the CL code and with a depth of 1 as the rollover is performed with the front contract. This future series uses the last trading day of the front contract as the roll date rule. 

# Ensure that you have loaded the Quandl package in your R environment and hasv setup the Quandl API key.

#  Get Historical Futures Prices: Crude Oil Futures from Quandl. Contract Code is CL

CME_CL_Data <- Quandl('CHRIS/CME_CL1')

# Show the first 10 rows of the CME_CL_Data

head(CME_CL_Data, n=10)
       Date  Open  High   Low  Last Change Settle Volume Previous Day Open Interest
1  2019-07-31 58.34 58.82 57.81 57.89   0.53  58.58 604858                     406009
2  2019-07-30 57.07 58.47 56.96 58.34   1.18  58.05 543567                     411795
3  2019-07-29 56.20 57.10 55.81 57.05   0.67  56.87 541398                     410680
4  2019-07-26 55.93 56.57 55.68 56.16   0.18  56.20 474196                     417324
5  2019-07-25 55.90 56.99 55.81 55.91   0.14  56.02 557841                     421203
6  2019-07-24 57.23 57.64 55.33 55.90   0.89  55.88 742383                     431404
7  2019-07-23 56.17 57.47 55.74 57.17   0.55  56.77 520910                     438301
8  2019-07-22 56.22 56.84 55.72 56.09   0.59  56.22  17547                      19557
9  2019-07-19 55.72 56.36 54.99 55.74   0.33  55.63 101400                      55908
10 2019-07-18 56.60 57.32 54.72 55.63   1.48  55.30 204452                      83809

Explore Datatypes of Columns

The str() command is very useful to know the different data types of the column’s dataset , as well as the object type of the dataset. The dataset is called CME_CL_Data which in terms of R,  is a data frame with numeric columns except the Date column that has date format.

# Explore the datatypes of the column variables


'data.frame':   9129 obs. of  9 variables:
 $ Date                      : Date, format: "2019-07-29" "2019-07-26" "2019-07-25" "2019-07-24" ...
 $ Open                      : num  56.2 55.9 55.9 57.2 56.2 ...
 $ High                      : num  57.1 56.6 57 57.6 57.5 ...
 $ Low                       : num  55.8 55.7 55.8 55.3 55.7 ...
 $ Last                      : num  57 56.2 55.9 55.9 57.2 ...
 $ Change                    : num  0.67 0.18 0.14 0.89 0.55 0.59 0.33 1.48 0.84 1.96 ...
 $ Settle                    : num  56.9 56.2 56 55.9 56.8 ...
 $ Volume                    : num  527689 474196 557841 742383 520910 ...
 $ Previous Day Open Interest: num  410680 417324 421203 431404 438301 ...
 - attr(*, "freq")= chr "daily"

Plot the Futures Prices

We can plot the futures prices using the ggplot library in R.

# Plot the last price field of the Crude Oil Futures Historical Prices

ggplot(CME_CL_Data, aes(Date, Last)) + geom_line(color = "blue") + xlab("") + ylab("Daily Prices")+   ggtitle("Future Daily Prices WTI Crude Oil") 

The graph shows that future prices of WTI Crude Oil had a significant volatility in the last 12 years with an extreme volatility period on 2008-2009. Later we will explore some properties of future prices as well as analyze the volatility of the CL Crude Oil contract by year.

Future Prices CL 1983-2019

Understanding the Variables of the Dataset

Let us try to understand the variables of the dataset of CL futures that we have downloaded from Quandl. The first three columns are the Open, High and Low price of the contract for each trading day. The Last column is the price of the last trade on the day.

The Settle column shows the settlement price for each day. The settlement price is the price that is set by each settlement or pit committee in order to reflect with more accuracy the contract price at the end of the trading session. In liquid markets, when trading is active there is a slight difference between the last trade price and the settlement price as trades are continuous with little fluctuations in prices.

On the other hand, in illiquid markets, the last trade could occur two or three hours before the market closes, and important events that affect a contract’s future price might arise between the last trade time and the market close time. In this situation the settlement committee may make adjustments to the price so that it reflects the current market condition for that asset. It is common that the committee observe prices for adjacent month’s contracts which have accounted for the available information.

The Settle price is key in the future market, as it is used to mark trader’s positions to market and changes in the settlement price imply changes in trader’s equity that could finish in the worst scenario in a margin call if the trader’s equity is lower than the maintenance margin

The Change column shows the difference between the settlement prices on each trading day. The next two columns are Volume and Previous Day Open Interest. These variables are used to provide information about liquidity and trading activity in the market and it is important to point out the special differences between them.

Volume is the amount of contracts that were exchanged between purchasers and sellers while Open Interest is the total number of outstanding contracts that are in the market. The Volume variable always increases along the trading session as more transactions boost up the daily volume. On the other hand Open Interest can increase or decrease. Open interest increases when new contracts are traded between buyers and sellers and decreases when traders or holders reverse their positions.

At the beginning of the contract life, Open Interest is low and tends to increase when the contract is approaching maturity. When the contract is nearer to expiration, the Open Interest usually falls as traders close their positions to avoid the delivery of the asset.

Calculate Basic Statistics 

We will create some R code to get  basic statistics and other valuable information of the dataset. We will perform some manipulation task of the dataset and then select numeric columns of the dataset such as Open,High,Last,Previous Open Interest  to get an overview of  these columns:

# Inverse the order of CME_CL_Data dataframe to have the oldest data at first and the latest data at the bottom. We would use the arrange command from dplyr to achieve this. The data frame CME_CL_Data_ starts from the oldest data of the CL future continuous series from Quandl
CME_CL_Data_ <- CME_CL_Data %>% arrange(rev(rownames(.)))
# It seems that dates are not consecutive. This issue was observed when we reverse 
# the series and is present most at the beginning of the series. So, sometimes
# we have nonconsecutive dates in subsequent rows. To avoid this, we will sort
# the CME_CL_DATA_ by consecutive dates. The command arrange from dplyr would # do this task.

CME_CL_Data_ <- CME_CL_Data_ %>%
  mutate(date = as.Date(Date, "%d-%m-%Y")) %>%
# Calculate the returns 
CME_CL_Data_$returns <- as.numeric(c('NA',diff(log(CME_CL_Data_$Last))))
# Remove outliers from the CME_CL_DATA_ , that are values with daily returns greater than 0.3

CME_CL_Data_ <- CME_CL_Data_[!abs(CME_CL_Data_$returns) > 0.3,]

# Select only numeric columns from the CME_CL_Data_ dataframe in order to summarize   the numeric columns. This can be done with the select_if command from dplyr package

CL_DATA <- select_if(CME_CL_Data_, is.numeric)

# Get the number of NAN values by column
na_cols_count <-sapply(CL_DATA, function(y) sum(length(which(is.na(y)))))
# The Change column has a lot of null values. So we would 
# separate this column from the others. Using the select command from dplyr 
# package we can do this task. The c() vector contain the columns position that we want to keep.

CL_DATA <- select(CL_DATA,c(1,2,3,4,6,7,8,9))

# Remove rows with Nan values using complete.cases command
CL_DATA <- CL_DATA[complete.cases(CL_DATA),]

# The statistic function will calculate some metrics such as mean, standard deviation, median, max and min for each of the columns. 

statistics <- do.call(data.frame, 
                      list(mean = round(apply(CL_DATA, 2, mean),4),
                           sd = round(apply(CL_DATA, 2, sd),4),
                           median = round(apply(CL_DATA, 2, median),4),
                           min = round(apply(CL_DATA, 2, min),4),
                         max = round(apply(CL_DATA, 2, max),4)))

              mean          sd     median     min          max
Open          43.0119     28.5358    29.9500 10.0000     145.1900
High          43.5897     28.9012    30.2500 11.0200     147.2700
Low           42.4054     28.1224    29.6500  9.7500     143.2200
Last          43.0150     28.5298    29.9700 10.4200     145.2900
Settle        43.0150     28.5298    29.9700 10.4200     145.2900
Volume        155852      192429     66893    0.0000     1282869
Previous Day 
Open Interest 147870      136119     95258   0.0000       642793
returns       0.0001      0.0235     0.0004 -0.1745       0.1641
Finance Train Subscription

Unlock full access to Finance Train and see the entire library of member-only content and resources.