Exploring Crude Oil (CL) Future Data from Quandl in R
It would be useful to download a dataset of historical future contract prices and explore it using R programming language. For this purpose we will load the futures historical data of Crude Oil (CME_CL1) from the CHRIS database of Quandl.
The nomenclature to download futures data from Quandl is the following:
CHRIS/{EXCHANGE}_{CODE}{NUMBER}
CHRIS refers to the database of the future continuous series. EXCHANGE refers to the name of the Exchange that offers the asset, CODE is the future contract code and NUMBER refers to the depth of the chained contract.
Download Historical Futures Data
We will download historical future data from CME exchange, of WTI Crude Oil which have the CL code and with a depth of 1 as the rollover is performed with the front contract. This future series uses the last trading day of the front contract as the roll date rule.
1# Ensure that you have loaded the Quandl package in your R environment and hasv setup the Quandl API key.
2
3# Get Historical Futures Prices: Crude Oil Futures from Quandl. Contract Code is CL
4
5CME_CL_Data <- Quandl('CHRIS/CME_CL1')
6
7# Show the first 10 rows of the CME_CL_Data
8
9head(CME_CL_Data, n=10)
10 Date Open High Low Last Change Settle Volume Previous Day Open Interest
111 2019-07-31 58.34 58.82 57.81 57.89 0.53 58.58 604858 406009
122 2019-07-30 57.07 58.47 56.96 58.34 1.18 58.05 543567 411795
133 2019-07-29 56.20 57.10 55.81 57.05 0.67 56.87 541398 410680
144 2019-07-26 55.93 56.57 55.68 56.16 0.18 56.20 474196 417324
155 2019-07-25 55.90 56.99 55.81 55.91 0.14 56.02 557841 421203
166 2019-07-24 57.23 57.64 55.33 55.90 0.89 55.88 742383 431404
177 2019-07-23 56.17 57.47 55.74 57.17 0.55 56.77 520910 438301
188 2019-07-22 56.22 56.84 55.72 56.09 0.59 56.22 17547 19557
199 2019-07-19 55.72 56.36 54.99 55.74 0.33 55.63 101400 55908
2010 2019-07-18 56.60 57.32 54.72 55.63 1.48 55.30 204452 83809
21
Explore Datatypes of Columns
The str() command is very useful to know the different data types of the column’s dataset , as well as the object type of the dataset. The dataset is called CME_CL_Data which in terms of R, is a data frame with numeric columns except the Date column that has date format.
1# Explore the datatypes of the column variables
2
3str(CME_CL_Data)
4
5'data.frame': 9129 obs. of 9 variables:
6 $ Date : Date, format: "2019-07-29" "2019-07-26" "2019-07-25" "2019-07-24" ...
7 $ Open : num 56.2 55.9 55.9 57.2 56.2 ...
8 $ High : num 57.1 56.6 57 57.6 57.5 ...
9 $ Low : num 55.8 55.7 55.8 55.3 55.7 ...
10 $ Last : num 57 56.2 55.9 55.9 57.2 ...
11 $ Change : num 0.67 0.18 0.14 0.89 0.55 0.59 0.33 1.48 0.84 1.96 ...
12 $ Settle : num 56.9 56.2 56 55.9 56.8 ...
13 $ Volume : num 527689 474196 557841 742383 520910 ...
14 $ Previous Day Open Interest: num 410680 417324 421203 431404 438301 ...
15 - attr(*, "freq")= chr "daily"
16
17
Plot the Futures Prices
We can plot the futures prices using the ggplot
library in R.
1# Plot the last price field of the Crude Oil Futures Historical Prices
2
3ggplot(CME_CL_Data, aes(Date, Last)) + geom_line(color = "blue") + xlab("") + ylab("Daily Prices")+ ggtitle("Future Daily Prices WTI Crude Oil")
4
The graph shows that future prices of WTI Crude Oil had a significant volatility in the last 12 years with an extreme volatility period on 2008-2009. Later we will explore some properties of future prices as well as analyze the volatility of the CL Crude Oil contract by year.
Future Prices CL 1983-2019
Understanding the Variables of the Dataset
Let us try to understand the variables of the dataset of CL futures that we have downloaded from Quandl. The first three columns are the Open, High and Low price of the contract for each trading day. The Last column is the price of the last trade on the day.
The Settle column shows the settlement price for each day. The settlement price is the price that is set by each settlement or pit committee in order to reflect with more accuracy the contract price at the end of the trading session. In liquid markets, when trading is active there is a slight difference between the last trade price and the settlement price as trades are continuous with little fluctuations in prices.
On the other hand, in illiquid markets, the last trade could occur two or three hours before the market closes, and important events that affect a contract’s future price might arise between the last trade time and the market close time. In this situation the settlement committee may make adjustments to the price so that it reflects the current market condition for that asset. It is common that the committee observe prices for adjacent month’s contracts which have accounted for the available information.
The Settle price is key in the future market, as it is used to mark trader’s positions to market and changes in the settlement price imply changes in trader’s equity that could finish in the worst scenario in a margin call if the trader’s equity is lower than the maintenance margin.
The Change column shows the difference between the settlement prices on each trading day. The next two columns are Volume and Previous Day Open Interest. These variables are used to provide information about liquidity and trading activity in the market and it is important to point out the special differences between them.
Volume is the amount of contracts that were exchanged between purchasers and sellers while Open Interest is the total number of outstanding contracts that are in the market. The Volume variable always increases along the trading session as more transactions boost up the daily volume. On the other hand Open Interest can increase or decrease. Open interest increases when new contracts are traded between buyers and sellers and decreases when traders or holders reverse their positions.
At the beginning of the contract life, Open Interest is low and tends to increase when the contract is approaching maturity. When the contract is nearer to expiration, the Open Interest usually falls as traders close their positions to avoid the delivery of the asset.
Calculate Basic Statistics
We will create some R code to get basic statistics and other valuable information of the dataset. We will perform some manipulation task of the dataset and then select numeric columns of the dataset such as Open,High,Last,Previous Open Interest to get an overview of these columns:
1# Inverse the order of CME_CL_Data dataframe to have the oldest data at first and the latest data at the bottom. We would use the arrange command from dplyr to achieve this. The data frame CME_CL_Data_ starts from the oldest data of the CL future continuous series from Quandl
2
3CME_CL_Data_ <- CME_CL_Data %>% arrange(rev(rownames(.)))
4
5# It seems that dates are not consecutive. This issue was observed when we reverse
6# the series and is present most at the beginning of the series. So, sometimes
7# we have nonconsecutive dates in subsequent rows. To avoid this, we will sort
8# the CME_CL_DATA_ by consecutive dates. The command arrange from dplyr would # do this task.
9
10CME_CL_Data_ <- CME_CL_Data_ %>%
11 mutate(date = as.Date(Date, "%d-%m-%Y")) %>%
12 arrange(date)
13
14# Calculate the returns
15
16CME_CL_Data_$returns <- as.numeric(c('NA',diff(log(CME_CL_Data_$Last))))
17
18# Remove outliers from the CME_CL_DATA_ , that are values with daily returns greater than 0.3
19
20CME_CL_Data_ <- CME_CL_Data_[!abs(CME_CL_Data_$returns) > 0.3,]
21
22# Select only numeric columns from the CME_CL_Data_ dataframe in order to summarize the numeric columns. This can be done with the select_if command from dplyr package
23
24CL_DATA <- select_if(CME_CL_Data_, is.numeric)
25
26# Get the number of NAN values by column
27na_cols_count <-sapply(CL_DATA, function(y) sum(length(which(is.na(y)))))
28
29# The Change column has a lot of null values. So we would
30# separate this column from the others. Using the select command from dplyr
31# package we can do this task. The c() vector contain the columns position that we want to keep.
32
33CL_DATA <- select(CL_DATA,c(1,2,3,4,6,7,8,9))
34
35# Remove rows with Nan values using complete.cases command
36CL_DATA <- CL_DATA[complete.cases(CL_DATA),]
37
38# The statistic function will calculate some metrics such as mean, standard deviation, median, max and min for each of the columns.
39
40statistics <- do.call(data.frame,
41 list(mean = round(apply(CL_DATA, 2, mean),4),
42 sd = round(apply(CL_DATA, 2, sd),4),
43 median = round(apply(CL_DATA, 2, median),4),
44 min = round(apply(CL_DATA, 2, min),4),
45 max = round(apply(CL_DATA, 2, max),4)))
46
47 mean sd median min max
48Open 43.0119 28.5358 29.9500 10.0000 145.1900
49High 43.5897 28.9012 30.2500 11.0200 147.2700
50Low 42.4054 28.1224 29.6500 9.7500 143.2200
51Last 43.0150 28.5298 29.9700 10.4200 145.2900
52Settle 43.0150 28.5298 29.9700 10.4200 145.2900
53Volume 155852 192429 66893 0.0000 1282869
54Previous Day
55Open Interest 147870 136119 95258 0.0000 642793
56returns 0.0001 0.0235 0.0004 -0.1745 0.1641
57
58
Unlock Premium Content
Upgrade your account to access the full article, downloads, and exercises.
You'll get access to:
- Access complete tutorials and examples
- Download source code and resources
- Follow along with practical exercises
- Get in-depth explanations