Importing Data Using data.table – fread in R

R has a package called data.table which is used extensively for data manipulation. It is especially useful as a data cleaning tool for large datasets.

The data.table package comes with a function called fread, which reads data from files quickly and efficiently. It is similar to read.table but faster and more convenient. It detects column types (colClasses) and separators (sep) automatically, although you can always specify them manually. Similarly, it detects whether the first row holds header names and applies them to the columns. If no header is found, it assigns default names (V1, V2, and so on).
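As a rough illustration of automatic detection versus manual specification, the sketch below reads the same small input twice. The inline sample rows and the object names (sample_lines, auto, manual, no_header) are made up for this example; only fread and its text, sep, header and colClasses arguments come from data.table.

```r
library(data.table)

# Hypothetical inline sample standing in for a file on disk
sample_lines <- c(
  "Time,Open,Volume",
  "1/24/2017,231.86,4448100",
  "1/23/2017,231.86,3136100"
)

# Let fread detect the separator, header and column types on its own
auto <- fread(text = sample_lines)

# Same input with the separator, header and column types spelled out
manual <- fread(text = sample_lines, sep = ",", header = TRUE,
                colClasses = c("character", "numeric", "integer"))

# Input without a header row: columns get the default names V1, V2, V3
no_header <- fread(text = sample_lines[-1], header = FALSE)

sapply(auto, class)    # column types chosen by fread
sapply(manual, class)  # column types requested explicitly
names(no_header)
```

Specifying colClasses by hand is mainly useful when a column should be read as something other than what fread would guess, for example an identifier that looks numeric but should stay character.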

Installing and Loading data.table Package

Before we can use the functions in the data.table package, we need to install and load the package in R. We can do so using the install.packages() and library() functions.

> install.packages("data.table")
> library(data.table)

Importing Data Using fread

Once the package is loaded, we can use the fread function to read the data as shown below:

> mydata <- fread("GS-Stock-Prices.txt")
> mydata
          Time   Open   High    Low   Last  Volume
 1:  1/24/2017 231.86 236.06 230.84 233.68 4448100
 2:  1/23/2017 231.86 233.75 230.75 232.67 3136100
 3:  1/20/2017 231.62 233.23 230.54 232.20 5211800
 4:  1/19/2017 234.07 234.75 230.62 231.41 4561800
 5:  1/18/2017 236.00 237.69 231.52 234.29 7590400
 6:  1/17/2017 242.94 243.06 235.61 235.74 6277100
 7:  1/13/2017 245.43 247.77 242.91 244.30 4186000
 8:  1/12/2017 245.06 245.47 241.57 243.84 4022300
 9:  1/11/2017 242.77 245.84 242.00 245.76 3532500
10:  1/10/2017 240.87 243.44 239.05 242.57 3432900
11:   1/9/2017 243.25 244.69 241.47 242.89 3022700
12:   1/6/2017 242.29 246.20 241.37 244.90 3591000
13:   1/5/2017 242.72 243.23 236.78 241.32 3562600
14:   1/4/2017 241.44 243.32 240.03 243.13 2728700
15:   1/3/2017 242.70 244.97 237.97 241.57 4384200
16: 12/30/2016 238.51 240.50 237.40 239.45 2355500
17: 12/29/2016 240.75 241.07 236.64 238.18 2619000
18: 12/28/2016 243.69 244.50 240.44 240.65 3052900
19: 12/27/2016 241.95 242.59 240.40 241.56 1998100
>
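To check what fread returned, it can help to inspect the object's class, dimensions and detected column types. The sketch below is self-contained: it writes a small temporary file as a stand-in for GS-Stock-Prices.txt, and it assumes the file is tab-separated, which the original does not state.

```r
library(data.table)

# Hypothetical stand-in for GS-Stock-Prices.txt: a tab-separated temp file
# holding the first three rows of the output above (the separator is an assumption)
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Time\tOpen\tHigh\tLow\tLast\tVolume",
  "1/24/2017\t231.86\t236.06\t230.84\t233.68\t4448100",
  "1/23/2017\t231.86\t233.75\t230.75\t232.67\t3136100",
  "1/20/2017\t231.62\t233.23\t230.54\t232.20\t5211800"
), tmp)

mydata <- fread(tmp)

class(mydata)          # the object is both a data.table and a data.frame
dim(mydata)            # number of rows and columns
sapply(mydata, class)  # column types detected by fread

unlink(tmp)            # remove the temporary file
```

Because the dates are in month/day/year form, the Time column would be expected to come in as character here and need a separate conversion step if real dates are required.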

fread - Drop and Select

The fread function has two arguments, drop and select, which control which columns are imported: select lists the columns to keep, and drop lists the columns to leave out. Both accept either column numbers or column names.

Our dataset has six columns, and we can use these arguments to import only the ones we want. Some examples are shown below:

# Drop columns 2 to 4; import only Time, Last and Volume
fread("GS-Stock-Prices.txt", drop = 2:4)
# Import only columns 1 and 5, i.e., Time and Last
fread("GS-Stock-Prices.txt", select = c(1, 5))
# Drop the 'Open', 'Last' and 'Volume' columns
fread("GS-Stock-Prices.txt", drop = c("Open", "Last", "Volume"))
# Import only the 'Time' and 'Last' columns
fread("GS-Stock-Prices.txt", select = c("Time", "Last"))
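The effect of drop and select can be confirmed by looking at the column names of the result. The following is a minimal sketch that uses a few inline rows (passed through fread's text argument) in place of the actual file; the object names are chosen only for this illustration.

```r
library(data.table)

# Hypothetical inline sample with the same six columns as the dataset
sample_lines <- c(
  "Time,Open,High,Low,Last,Volume",
  "1/24/2017,231.86,236.06,230.84,233.68,4448100",
  "1/23/2017,231.86,233.75,230.75,232.67,3136100"
)

# Dropping by position: columns 2 to 4 (Open, High, Low) are left out
by_position <- fread(text = sample_lines, drop = 2:4)

# Selecting by name: only Time and Last are imported
by_name <- fread(text = sample_lines, select = c("Time", "Last"))

names(by_position)
names(by_name)
```

Referring to columns by name tends to be more robust than using positions, since the call keeps working if the column order in the file changes.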
