R provides a variety of functions for importing data files. The most common functions which we will use are
read.table(). These functions are loaded in R by default as a part of the
utils package when you start R.
read.csv() is used to import csv (comma separated values) files. R imports the data into a data frame.
Let’s understand how to use this function using an actual data file.
Step 1: Download the file called
top-100-stocks.csv on your computer. This file contains fundamental variables for top 100 stocks in the US (Data as of January 9, 2017)
Step 2: Store this file in a location of your choice. For this example, I have stored my downloaded copy of the file in the path
Step 3: Set this as working directory in R using the
setwd() function. You can check if the working directory is set correctly using the
> setwd("C:/r-programming/data/importing data") > getwd()  "C:/r-programming/data/importing data"
Step 4: Check that the CSV file exists in the working directory using the
dir() function. The
dir() function will list all the files stored in the working directory.
> dir()  "top-100-stocks.csv"
Writing the function
read.csv("top-100-stocks.csv", stringsAsfactors = FALSE)
The first argument of read.csv function is the path to the file you want to import in R. In our case since the file is already in the working directory, we just need to specify the file name. However, if the file was in some other location, then the things would be a bit tricky because depending on your operating system, the file paths will be formed differently. To avoid having to deal with escaping backslashes in file paths, you can use the
file.path() function to construct file paths that are correct, independent of the operating system you work on.
For example, to set the file path to
C:/r-programming/data/importing data, we will use the
file.path() function as follows:
> path <- file.path("C:","r-programming","data","importing data","top-100-stocks.csv") > path  "C:/r-programming/data/importing data/top-100-stocks.csv" >
This stores the file path in the variable
path, which can be passed to the
read.csv() function instead of the file name.
The second argument
stringsAsfactors is very important. When you import a data table into R, the columns that contain character strings can either be imported as factors or as character data. The default option is to convert character strings into factors, i.e., categorical data (example, gender, colors, types, etc). If your data is not categorical data, then you can set
Execute the Command
We can now execute the command and R will import the data as a data frame, as shown below:
Now that you have the data in R as a data frame, you can do all kinds of analysis on it. To get started, you can use
summary() function on the data frame to get a summary of the data.
> str(top100stocks) 'data.frame': 100 obs. of 9 variables: $ Symbol : chr "WINS" "CWEI" "TCK" "CRBP" ... $ Name : chr "Wins Fin Hldgs Ord" "Clayton Williams Energy" "Teck Resources Ltd" "Corbus Pharma Cmn" ... $ MarketCap: chr "4,051,490" "2,009,420" "13,887,640" "368,830" ... $ P.E : chr "0.00" "0.00" "102.04" "0.00" ... $ EPS : chr "0.00" "-17.70" "0.24" "-0.37" ... $ NetIncome: chr "N/A" "-98" "-1,939" "-9" ... $ Beta : chr "N/A" "2.29" "1.49" "2.57" ... $ Dividend : chr "0.00" "0.00" "0.08" "0.00" ... $ DivYield : chr "0.00%" "0.00%" "0.31%" "0.00%" ... >
read.csv()function has HEADER argument as true by default indicating whether the file contains the names of the variables as its first line. However, if the data being imported doesn’t have a header, this should be set to false.
- Sometimes your data may be in the EU (European Union) format, where commas are used as decimal separators and semicolons are used as field separators. In such a case, you need to import it to R using the
read.csv2()function which takes care of this format difference.