Importing Data Using read.csv in R

R provides a variety of functions for importing data files. The most common functions which we will use are read.csv()read.delim(), and read.table(). These functions are loaded in R by default as a part of the utils package when you start R.


read.csv() is used to import csv (comma separated values) files. R imports the data into a data frame.

Let's understand how to use this function using an actual data file.



Step 1: Download the file called top-100-stocks.csv on your computer. This file contains fundamental variables for top 100 stocks in the US (Data as of January 9, 2017)

Step 2: Store this file in a location of your choice. For this example, I have stored my downloaded copy of the file in the path "C:\r-programming\data\importing data".

Step 3: Set this as working directory in R using the setwd() function. You can check if the working directory is set correctly using the getwd() function.

> setwd("C:/r-programming/data/importing data")
> getwd()
[1] "C:/r-programming/data/importing data"

Step 4: Check that the CSV file exists in the working directory using the dir() function. The dir() function will list all the files stored in the working directory.

> dir()
[1] "top-100-stocks.csv"

Writing the function

read.csv("top-100-stocks.csv", stringsAsfactors = FALSE)

The first argument of read.csv function is the path to the file you want to import in R. In our case since the file is already in the working directory, we just need to specify the file name. However, if the file was in some other location, then the things would be a bit tricky because depending on your operating system, the file paths will be formed differently. To avoid having to deal with escaping backslashes in file paths, you can use the file.path() function to construct file paths that are correct, independent of the operating system you work on.

For example, to set the file path to C:/r-programming/data/importing data, we will use the file.path() function as follows:

> path <- file.path("C:","r-programming","data","importing data","top-100-stocks.csv")
> path
[1] "C:/r-programming/data/importing data/top-100-stocks.csv"

This stores the file path in the variable path, which can be passed to the read.csv() function instead of the file name.

The second argument stringsAsfactors is very important. When you import a data table into R, the columns that contain character strings can either be imported as factors or as character data. The default option is to convert character strings into factors, i.e., categorical data (example, gender, colors, types, etc). If your data is not categorical data, then you can set stringsAsfactors to FALSE.

Execute the Command

We can now execute the command and R will import the data as a data frame, as shown below:

Analyzing Results

Now that you have the data in R as a data frame, you can do all kinds of analysis on it. To get started, you can use str() or summary() function on the data frame to get a summary of the data.

> str(top100stocks)
'data.frame':    100 obs. of  9 variables:
 $ Symbol   : chr  "WINS" "CWEI" "TCK" "CRBP" ...
 $ Name     : chr  "Wins Fin Hldgs Ord" "Clayton Williams Energy" "Teck Resources Ltd" "Corbus Pharma Cmn" ...
 $ MarketCap: chr  "4,051,490" "2,009,420" "13,887,640" "368,830" ...
 $ P.E      : chr  "0.00" "0.00" "102.04" "0.00" ...
 $ EPS      : chr  "0.00" "-17.70" "0.24" "-0.37" ...
 $ NetIncome: chr  "N/A" "-98" "-1,939" "-9" ...
 $ Beta     : chr  "N/A" "2.29" "1.49" "2.57" ...
 $ Dividend : chr  "0.00" "0.00" "0.08" "0.00" ...
 $ DivYield : chr  "0.00%" "0.00%" "0.31%" "0.00%" ...

Important Notes

  • The read.csv() function has HEADER argument as true by default indicating whether the file contains the names of the variables as its first line. However, if the data being imported doesn't have a header, this should be set to false.
  • Sometimes your data may be in the EU (European Union) format, where commas are used as decimal separators and semicolons are used as field separators. In such a case, you need to import it to R using the read.csv2() function which takes care of this format difference.