Handling Missing Values in Time Series
In the examples we saw earlier, the data was of good quality, with a value available at every time index. Real-world data, however, often contains missing values, and these can distort our analysis. Depending on the nature of the data, we may choose to ignore the missing values, but in some cases it is more suitable to estimate and fill them. Data scientists use various techniques to do this. One common technique is to replace each NA with the mean of the time series, which may or may not be suitable depending on the data. For example, if the data describes loan borrowers and some loan interest rates are missing, the data scientist may decide to fill them with the average interest rate. If there is a clear pattern, such as interest rates being higher for self-employed individuals than for salaried individuals, the data scientist may instead fill each missing value with the mean for the borrower's employment category.
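The difference between the two approaches in the loan example can be sketched in a few lines of base R. The `loans` data frame, its column names, and its values are all hypothetical and exist only for illustration; this is one possible way to do it, not a prescribed method.

```r
# Hypothetical loan data: names and values are made up for illustration
loans <- data.frame(
  employment = c("salaried", "salaried", "salaried",
                 "self-employed", "self-employed", "self-employed"),
  rate = c(7.0, NA, 8.0, 11.0, 12.0, NA)
)

# Option 1: fill every NA with the overall mean rate
overall_fill <- ifelse(is.na(loans$rate),
                       mean(loans$rate, na.rm = TRUE),
                       loans$rate)

# Option 2: fill each NA with the mean rate of its employment category
group_mean <- ave(loans$rate, loans$employment,
                  FUN = function(x) mean(x, na.rm = TRUE))
group_fill <- ifelse(is.na(loans$rate), group_mean, loans$rate)

print(data.frame(loans, overall_fill, group_fill))
```

With the overall mean, both missing rates receive the same value. With the category means, the salaried borrower and the self-employed borrower each receive a value typical of their own group.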
Handling Missing Values in R
We will use our GDP data example to see how to estimate and fill missing values in R. Since we don't have access to a real dataset with missing values, we will make one: we create a copy of our GDP_data dataset and then deliberately set one value to NA. This is done by the following code:
> GDP_mod <- GDP_data
> GDP_mod[7] <- NA
> GDP_mod
Qtr1 Qtr2 Qtr3 Qtr4
2014 17025.2 17285.6 17569.4 17692.2
2015 17783.6 17998.3 NA 18222.8
2016 18281.6 18450.1 18675.3 18869.4
>
As you can see, the GDP value for 2015 Q3 is now NA.
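In a longer series it may not be obvious where the missing values are. As a minimal sketch, the series below is rebuilt from the values printed above so that the block runs on its own, and base R functions are used to locate the missing observation. The variable name `na_pos` is only illustrative.

```r
# Rebuild the modified series from the values printed above
GDP_mod <- ts(c(17025.2, 17285.6, 17569.4, 17692.2,
                17783.6, 17998.3, NA,      18222.8,
                18281.6, 18450.1, 18675.3, 18869.4),
              start = c(2014, 1), frequency = 4)

# Position(s) of the missing values within the series
na_pos <- which(is.na(GDP_mod))
na_pos

# Map each position back to its time index and its quarter
time(GDP_mod)[na_pos]    # 2015.5 corresponds to the third quarter of 2015
cycle(GDP_mod)[na_pos]   # quarter number within the year
```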
Plotting the Time Series
We can plot the new time series using the plot.ts() function.
> plot.ts(GDP_mod)
>
As you can see, the plotted line has a gap at 2015 Q3, where the value is missing.
Calculate GDP Mean
We will fill the missing value with the average GDP over the three years. In R, we can calculate the mean using the mean() function. In this case, however, a plain call to mean() returns NA because the series contains a missing value. Passing the argument na.rm=TRUE tells mean() to drop the missing values before calculating the mean.
> #Calling mean() directly returns NA because of the missing value
> mean(GDP_mod)
[1] NA
> #Calculate mean by removing all missing values
> mean(GDP_mod,na.rm=TRUE)
[1] 17986.68
>
Replace NA's with Mean
Now that we have the mean of the series, we can replace the missing values with the mean value as shown below:
> #Replace missing values with mean
> GDP_mod[7] <- mean(GDP_mod, na.rm = TRUE)
> #Print the series and notice that the missing value is now filled
> print(GDP_mod)
Qtr1 Qtr2 Qtr3 Qtr4
2014 17025.20 17285.60 17569.40 17692.20
2015 17783.60 17998.30 17986.68 18222.80
2016 18281.60 18450.10 18675.30 18869.40
>
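The replacement above targets position 7 directly because we know where the missing value is. When the positions are not known in advance, or there are several of them, a logical index built with is.na() can fill them all at once. The helper name `fill_with_mean` is hypothetical, and the series is rebuilt from the printed values so that the sketch runs on its own.

```r
# Series with the 2015 Q3 value missing, rebuilt from the printed values
GDP_mod <- ts(c(17025.2, 17285.6, 17569.4, 17692.2,
                17783.6, 17998.3, NA,      18222.8,
                18281.6, 18450.1, 18675.3, 18869.4),
              start = c(2014, 1), frequency = 4)

# Hypothetical helper: replace every NA with the mean of the observed values
fill_with_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

GDP_filled <- fill_with_mean(GDP_mod)
print(GDP_filled)
```

Because the mean is computed before any value is replaced, every missing position receives the same fill value.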
Plot Both Original and the Modified Series
We will now plot both the original GDP_data and the modified GDP_mod time series to see how well the mean estimates the original value.
> plot(GDP_data)
> points(GDP_mod, type = "l", col = 2, lty = 3)
>
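In the resulting chart, the black line represents the original data and the red dotted line represents the modified data. As we can see, the mean is not a good estimate of the actual GDP in that quarter: the series trends upward, yet the filled value (17986.68) is lower than even the previous quarter's value (17998.30).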
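For a trending series like this one, an estimate based on the neighbouring observations may track the data more closely than the overall mean. The following is a minimal sketch of one such alternative, linear interpolation with base R's approx(). It is not part of the original example, and the names `GDP_lin`, `idx`, `y` and `ok` are only illustrative.

```r
# Series with the 2015 Q3 value missing, rebuilt from the printed values
GDP_mod <- ts(c(17025.2, 17285.6, 17569.4, 17692.2,
                17783.6, 17998.3, NA,      18222.8,
                18281.6, 18450.1, 18675.3, 18869.4),
              start = c(2014, 1), frequency = 4)

idx <- seq_along(GDP_mod)     # positions 1..12
y   <- as.numeric(GDP_mod)    # plain numeric copy of the values
ok  <- !is.na(y)              # TRUE where a value is observed

# Interpolate linearly between the observed points at every position
interp <- approx(x = idx[ok], y = y[ok], xout = idx)$y

# Fill only the missing positions, keeping the ts attributes
GDP_lin <- GDP_mod
GDP_lin[!ok] <- interp[!ok]

GDP_lin[7]   # midpoint of the 2015 Q2 and 2015 Q4 values
```

Here the filled value is the midpoint of the two neighbouring quarters, (17998.3 + 18222.8) / 2 = 18110.55, which lies between them instead of below both.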