Importing Data Using data.table – fread in R

R has a package called data.table that is used extensively for data manipulation. It is particularly useful as a data cleaning tool for large datasets.

The data.table package comes with a function called fread, which reads data from files quickly and efficiently. It is similar to read.table but faster and more convenient. It detects column types (colClasses) and separators (sep) automatically, although you can always specify them manually. It also detects whether the first row holds column names and applies them to the columns. If no header is found, it names the columns automatically (V1, V2, and so on).
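As a minimal sketch of automatic versus manual specification, the example below writes a small hypothetical semicolon-separated file to a temporary location (the file contents and the variable names tmp, auto, manual and noheader are assumptions made for illustration) and reads it three ways.

```r
library(data.table)

# Hypothetical sample data, written to a temporary file so the sketch is self-contained
tmp <- tempfile(fileext = ".txt")
writeLines(c("Time;Open;Last",
             "1/24/2017;231.86;233.68",
             "1/23/2017;231.86;232.67"), tmp)

# 1. Let fread detect the separator, header and column types on its own
auto <- fread(tmp)

# 2. Specify the separator, header and column types manually
manual <- fread(tmp, sep = ";", header = TRUE,
                colClasses = c("character", "numeric", "numeric"))

# 3. Skip the header row and tell fread there is no header;
#    the columns are then named V1, V2, V3
noheader <- fread(tmp, skip = 1, header = FALSE)

sapply(manual, class)   # column types after manual specification
names(noheader)         # automatically generated column names
```

Specifying the arguments by hand is mostly useful when the automatic detection picks a type you do not want, for example when an ID column should stay as text.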

Installing and Loading data.table Package

Before we can use the functions in the data.table package, we need to install the package and load it in R. We can do so using the install.packages() and library() commands.

> install.packages("data.table")
> library(data.table)

Importing Data Using fread

Once the package is loaded, we can use the fread function to read the data as shown below:

> mydata <- fread("GS-Stock-Prices.txt")
> mydata
          Time   Open   High    Low   Last  Volume
 1:  1/24/2017 231.86 236.06 230.84 233.68 4448100
 2:  1/23/2017 231.86 233.75 230.75 232.67 3136100
 3:  1/20/2017 231.62 233.23 230.54 232.20 5211800
 4:  1/19/2017 234.07 234.75 230.62 231.41 4561800
 5:  1/18/2017 236.00 237.69 231.52 234.29 7590400
 6:  1/17/2017 242.94 243.06 235.61 235.74 6277100
 7:  1/13/2017 245.43 247.77 242.91 244.30 4186000
 8:  1/12/2017 245.06 245.47 241.57 243.84 4022300
 9:  1/11/2017 242.77 245.84 242.00 245.76 3532500
10:  1/10/2017 240.87 243.44 239.05 242.57 3432900
11:   1/9/2017 243.25 244.69 241.47 242.89 3022700
12:   1/6/2017 242.29 246.20 241.37 244.90 3591000
13:   1/5/2017 242.72 243.23 236.78 241.32 3562600
14:   1/4/2017 241.44 243.32 240.03 243.13 2728700
15:   1/3/2017 242.70 244.97 237.97 241.57 4384200
16: 12/30/2016 238.51 240.50 237.40 239.45 2355500
17: 12/29/2016 240.75 241.07 236.64 238.18 2619000
18: 12/28/2016 243.69 244.50 240.44 240.65 3052900
19: 12/27/2016 241.95 242.59 240.40 241.56 1998100
>
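The row labels such as 1: and 2: in the output above appear because fread returns a data.table by default. The following sketch, which uses a two-row hypothetical sample in place of the real GS-Stock-Prices.txt file (the comma separator and the names tmp, dt and df are assumptions), shows how to check the returned object and the detected column types, and how to request a plain data.frame instead.

```r
library(data.table)

# Hypothetical two-row sample with the same columns as the stock price file
tmp <- tempfile(fileext = ".txt")
writeLines(c("Time,Open,High,Low,Last,Volume",
             "1/24/2017,231.86,236.06,230.84,233.68,4448100",
             "1/23/2017,231.86,233.75,230.75,232.67,3136100"), tmp)

dt <- fread(tmp)
class(dt)           # a data.table, which is also a data.frame
sapply(dt, class)   # the column types fread detected

# Ask fread for a plain data.frame instead of a data.table
df <- fread(tmp, data.table = FALSE)
class(df)
```

Because a data.table is also a data.frame, most functions that expect a data.frame accept the result of fread directly.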

fread - Drop and Select

The fread function has two arguments, select and drop, that control which columns are imported: select lists the columns to keep, and drop lists the columns to leave out. In both cases the columns can be given by position or by name.

In our dataset, we have six columns and we can use these arguments to select or drop the columns we want. Some examples below:

# Drop columns 2 to 4. Import only Time, Last and Volume
fread("GS-Stock-Prices.txt", drop = 2:4)
# Import only columns 1 and 5, i.e., Time and Last
fread("GS-Stock-Prices.txt", select = c(1, 5))
# Drop the 'Open', 'Last' and 'Volume' columns
fread("GS-Stock-Prices.txt", drop = c("Open", "Last", "Volume"))
# Import only the 'Time' and 'Last' columns
fread("GS-Stock-Prices.txt", select = c("Time", "Last"))
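To try drop and select without the original file, the sketch below recreates a small hypothetical sample with the same six column names (the data rows, the comma separator and the names tmp, by_position and by_name are assumptions) and checks which columns each call keeps.

```r
library(data.table)

# Hypothetical sample with the same six columns as the stock price file
tmp <- tempfile(fileext = ".txt")
writeLines(c("Time,Open,High,Low,Last,Volume",
             "1/24/2017,231.86,236.06,230.84,233.68,4448100",
             "1/23/2017,231.86,233.75,230.75,232.67,3136100"), tmp)

# Drop by position: columns 2 to 4 (Open, High, Low) are left out
by_position <- fread(tmp, drop = 2:4)
names(by_position)

# Select by name: only Time and Last are imported
by_name <- fread(tmp, select = c("Time", "Last"))
names(by_name)
```

Use either select or drop in a single call, not both; fread stops with an error if both are supplied.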
