Importing Data Using read.csv in R
R provides a variety of functions for importing data files. The most common functions, which we will use, are read.csv(), read.delim(), and read.table(). These functions are loaded in R by default as part of the utils package when you start R.
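As a rough illustration of how the three functions relate, the sketch below reads the same tiny data set with each of them. The two-row sample data is made up for this example (it is not the tutorial's file), and the `text` argument is used only so the snippet runs without a file on disk.

```r
# Made-up sample data: the same two rows, comma-separated and tab-separated.
csv_text <- "Symbol,EPS\nTCK,0.24\nCWEI,-17.70"
tab_text <- "Symbol\tEPS\nTCK\t0.24\nCWEI\t-17.70"

# read.csv() expects commas and a header line by default.
from_csv <- read.csv(text = csv_text)

# read.delim() expects tabs and a header line by default.
from_delim <- read.delim(text = tab_text)

# read.table() is the general function; here the separator and
# header are spelled out explicitly.
from_table <- read.table(text = csv_text, header = TRUE, sep = ",")

# All three should describe the same two-row data frame.
print(from_csv)
print(all.equal(from_csv, from_delim))
print(all.equal(from_csv, from_table))
```

In practice, read.csv() and read.delim() can be thought of as read.table() with defaults chosen for comma-separated and tab-separated files respectively.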
read.csv()
read.csv() is used to import CSV (comma-separated values) files. R imports the data into a data frame.
Let's understand how to use this function using an actual data file.
Setup
Step 1: Download the file called top-100-stocks.csv to your computer. This file contains fundamental variables for the top 100 stocks in the US (data as of January 9, 2017).
Step 2: Store this file in a location of your choice. For this example, I have stored my downloaded copy of the file in the path "C:\r-programming\data\importing data".
Step 3: Set this as the working directory in R using the setwd() function. You can check that the working directory is set correctly using the getwd() function.
> setwd("C:/r-programming/data/importing data")
> getwd()
[1] "C:/r-programming/data/importing data"
Step 4: Check that the CSV file exists in the working directory using the dir() function. The dir() function lists all the files stored in the working directory.
> dir()
[1] "top-100-stocks.csv"
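The setup steps above can also be scripted. The following is a minimal sketch, assuming a temporary folder as a stand-in for your own data directory and an empty placeholder file in place of the real download; the folder name `importing-data-demo` is hypothetical.

```r
# Hypothetical stand-in folder; replace with your own data directory.
data_dir <- file.path(tempdir(), "importing-data-demo")
dir.create(data_dir, showWarnings = FALSE)

# Empty placeholder standing in for the downloaded CSV file.
file.create(file.path(data_dir, "top-100-stocks.csv"))

# setwd() returns the previous working directory, so it can be restored later.
old_wd <- setwd(data_dir)

listed <- dir()                               # files in the working directory
found <- file.exists("top-100-stocks.csv")    # direct check for one file

setwd(old_wd)                                 # restore the original directory

print(listed)
print(found)
```

Checking with file.exists() is handy in scripts, because it returns a single TRUE or FALSE that you can test before attempting the import.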
Writing the function
> top100stocks <- read.csv("top-100-stocks.csv", stringsAsFactors = FALSE)
The first argument of read.csv() is the path to the file you want to import. Since our file is already in the working directory, we only need to specify the file name. If the file were in some other location, things would be a bit trickier, because file paths are written differently depending on your operating system (on Windows, for example, backslashes in a path must be escaped inside an R string). To avoid dealing with this, you can use the file.path() function, which joins path components with forward slashes, a separator that R accepts on every operating system.
For example, to build the path to top-100-stocks.csv inside C:/r-programming/data/importing data, we use the file.path() function as follows:
> path <- file.path("C:","r-programming","data","importing data","top-100-stocks.csv")
> path
[1] "C:/r-programming/data/importing data/top-100-stocks.csv"
>
This stores the file path in the variable path, which can be passed to the read.csv() function instead of the file name.
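To make this concrete, here is a small sketch that builds a path with file.path() and passes it to read.csv(). It assumes a temporary folder and a made-up three-line file, `sample-stocks.csv`, rather than the real data file, so that it runs anywhere.

```r
# Hypothetical stand-in folder (note the space in the name, as in the tutorial).
data_dir <- file.path(tempdir(), "importing data")
dir.create(data_dir, showWarnings = FALSE)

# Build the full path from its components.
path <- file.path(data_dir, "sample-stocks.csv")

# Write a tiny made-up CSV file so there is something to read.
writeLines(c("Symbol,EPS", "TCK,0.24", "CWEI,-17.70"), path)

# Pass the path variable to read.csv() instead of a bare file name.
stocks <- read.csv(path, stringsAsFactors = FALSE)
str(stocks)
```

Because the path is assembled from components, the same script should work unchanged on Windows, macOS, and Linux as long as the components themselves are valid.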
The second argument, stringsAsFactors, is very important. When you import a data table into R, the columns that contain character strings can be imported either as factors or as character data. In versions of R before 4.0.0, the default is to convert character strings into factors, i.e., categorical data (for example, gender, colors, types); since R 4.0.0, the default is FALSE and strings are kept as character data. If your data is not categorical, set stringsAsFactors to FALSE explicitly so that the result is the same on any version of R.
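The difference between the two settings is easiest to see side by side. This sketch uses a made-up three-row data set; the `Sector` column is hypothetical and does not appear in the tutorial's file.

```r
# Made-up data with a text column that could plausibly be categorical.
csv_text <- "Symbol,Sector\nTCK,Mining\nCWEI,Energy\nCRBP,Pharma"

# Import the text columns as factors (categorical data).
as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Import the text columns as plain character strings.
as_strings <- read.csv(text = csv_text, stringsAsFactors = FALSE)

print(class(as_factors$Sector))   # "factor"
print(class(as_strings$Sector))   # "character"

# A factor stores the set of distinct values as its levels.
print(levels(as_factors$Sector))
```

Setting the argument explicitly, as done here, avoids depending on which default your installed version of R uses.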
Execute the Command
We can now execute the command. R imports the data as a data frame; assign the result to a variable (top100stocks in the examples that follow) so that you can work with it afterwards.
Analyzing Results
Now that you have the data in R as a data frame, you can do all kinds of analysis on it. To get started, you can use the str() or summary() function on the data frame to get a summary of the data.
> str(top100stocks)
'data.frame': 100 obs. of 9 variables:
$ Symbol : chr "WINS" "CWEI" "TCK" "CRBP" ...
$ Name : chr "Wins Fin Hldgs Ord" "Clayton Williams Energy" "Teck Resources Ltd" "Corbus Pharma Cmn" ...
$ MarketCap: chr "4,051,490" "2,009,420" "13,887,640" "368,830" ...
$ P.E : chr "0.00" "0.00" "102.04" "0.00" ...
$ EPS : chr "0.00" "-17.70" "0.24" "-0.37" ...
$ NetIncome: chr "N/A" "-98" "-1,939" "-9" ...
$ Beta : chr "N/A" "2.29" "1.49" "2.57" ...
$ Dividend : chr "0.00" "0.00" "0.08" "0.00" ...
$ DivYield : chr "0.00%" "0.00%" "0.31%" "0.00%" ...
>
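Notice that every column in the output above is of type chr, including the numeric ones, because values such as "4,051,490", "N/A", and "0.31%" cannot be parsed as numbers directly. One possible way to convert them is sketched below. The helper `to_number()` is a hypothetical name introduced for this example, and the small data frame simply reuses the first few values shown in the str() output.

```r
# A few values copied from the str() output, kept as character strings.
raw <- data.frame(
  MarketCap = c("4,051,490", "2,009,420", "13,887,640"),
  NetIncome = c("N/A", "-98", "-1,939"),
  DivYield  = c("0.00%", "0.00%", "0.31%"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat "N/A" as missing, strip thousands separators
# and percent signs, then convert to numeric.
to_number <- function(x) {
  x[x %in% "N/A"] <- NA
  as.numeric(gsub("[,%]", "", x))
}

# Apply the helper to every column and rebuild the data frame.
clean <- as.data.frame(lapply(raw, to_number))
str(clean)
```

Note that DivYield stays in percent units after the conversion (0.31 means 0.31%); divide by 100 if you need a fraction. Columns such as Symbol and Name should be left as character data.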
Important Notes
- The read.csv() function has a header argument, which is TRUE by default, indicating that the first line of the file contains the names of the variables. If the data being imported doesn't have a header line, set header = FALSE.
- Sometimes your data may be in the format common in many European countries, where commas are used as decimal separators and semicolons are used as field separators. In such a case, import it into R using the read.csv2() function, which takes care of this format difference.
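Both notes can be tried out on small inline samples. The data below is made up for illustration, and the column names passed to col.names are an assumption about what the columns mean.

```r
# 1. A file without a header line: set header = FALSE.
#    col.names supplies the variable names; without it R uses V1, V2, ...
no_header_text <- "TCK,0.24\nCWEI,-17.70"
no_header <- read.csv(text = no_header_text, header = FALSE,
                      col.names = c("Symbol", "EPS"))
print(no_header)

# 2. Semicolon-separated fields with comma decimals: use read.csv2().
eu_text <- "Symbol;EPS\nTCK;0,24\nCWEI;-17,70"
eu <- read.csv2(text = eu_text)
print(eu)

# The EPS column should be numeric in both cases.
print(class(no_header$EPS))
print(class(eu$EPS))
```

If a file mixes conventions (for example, semicolon separators with dot decimals), read.table() or read.csv() with explicit sep and dec arguments is likely a better fit than either shortcut.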