• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer
Finance Train

Finance Train

High Quality tutorials for finance, risk, data science

  • Home
  • Data Science
  • CFA® Exam
  • PRM Exam
  • Tutorials
  • Careers
  • Products
  • Login

Importing Data from Excel in R

Data Science

R Programming for Data Science Importing Data from Excel in R

Microsoft Excel is another tool which is commonly used for data analysis. There are many R packages that can be used to interact with Excel. Once such package is readxl which was developed by Hadley Wickham.

An Excel file contains tabular data in multiple sheets. For this lesson, we have an excel sheet named stock-prices.xlsx (Downloadable above) that contains stock price data for December 2016 and November 2016 in two different sheets labeled – ‘Dec-2016’ and ‘Nov-2016’.

stock-pricesDownload

It is generally a good idea to first explore your data by directly opening it in MS Excel before you import it in R. Now that we know what data we are importing, let’s look at how we can import it using readxl.

readxl()

We will use two functions from readxl package:

  • excel_sheets() is used to get a list of sheets in the excel file
  • read_excel() is used to actually import the data from an excel sheet

These functions support both .xls and .xlsx file formats.

Install and Load read.xl() Package

The first thing we need to do is to install and load the desired package.

> install.packages("readxl")
> library(readxl)

Importing Data

Now that the readxl package is loaded in our R session, we can use the functions to read the data.

We will first place the excel sheet into our working directory so that it is easy to refer to in the functions. The dir() function can be used to confirm that the file is in the working directory. The function excel_sheets() can be used to get a list of all sheets in our excel file. To use the function simply pass the file name to the function and it will return a character vector with a list of sheet names.

> dir()
[1] "GS-Stock-Prices.txt" "stock-prices.xlsx"   "top-100-stocks.csv" 
> excel_sheets("stock-prices.xlsx")
[1] "Dec-2016" "Nov-2016" "Sheet3"  
>

We can now use the read_excel() function to import the data from an excel sheet. If we just pass the excel file name to the function, it will import data from the first sheet by default. The imported data is in the form of a tibble. If we want to load data from a specific sheet we can specify the sheet name or the sheet index in the sheet argument.

# The following function will load data from first sheet
read_excel("stock-prices.xlsx")
# The following two commands will load data from the second sheet
read_excel("stock-prices.xlsx", sheet=2)
read_excel("stock-prices.xlsx", sheet="Nov-2016")

Once the data is imported, we can explore it using the standard R functions.

> stockdata <- read_excel("stock-prices.xlsx", sheet="Nov-2016")
> str(stockdata)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    21 obs. of  6 variables:
 $ Time  : POSIXct, format: "2016-11-01" "2016-11-02" ...
 $ Open  : num  179 177 177 176 179 ...
 $ High  : num  179 178 178 177 182 ...
 $ Low   : num  177 176 176 175 179 ...
 $ Last  : num  178 177 176 176 181 ...
 $ Volume: num  2900600 2104700 1972700 1846400 3338700 ...
> summary(stockdata)
      Time                          Open            High            Low       
 Min.   :2016-11-01 00:00:00   Min.   :176.3   Min.   :177.3   Min.   :174.7  
 1st Qu.:2016-11-08 00:00:00   1st Qu.:180.1   1st Qu.:182.7   1st Qu.:179.0  
 Median :2016-11-15 00:00:00   Median :206.3   Median :209.7   Median :204.8  
 Mean   :2016-11-14 21:42:51   Mean   :198.1   Mean   :201.3   Mean   :197.2  
 3rd Qu.:2016-11-22 00:00:00   3rd Qu.:210.0   3rd Qu.:211.9   3rd Qu.:209.6  
 Max.   :2016-11-30 00:00:00   Max.   :215.2   Max.   :220.8   Max.   :215.0  
      Last           Volume        
 Min.   :175.9   Min.   : 1846400  
 1st Qu.:181.9   1st Qu.: 2646100  
 Median :209.2   Median : 3338700  
 Mean   :200.1   Mean   : 4306352  
 3rd Qu.:211.1   3rd Qu.: 5276100  
 Max.   :219.3   Max.   :11345400  
>

xlsx

We can also use the xlsx package to access Excel files. The first row should contain variable/column names.

# read in the first worksheet from the workbook stock-prices.xlsx
# first row contains variable names
library(xlsx)
stock_data <- read.xlsx("stock-prices.xlsx", 1)
# read in the worksheet named Nov-2016
mydata <- read.xlsx("stock-prices.xlsx", sheetName = "Nov-2016")

The imported data is loaded as a data frame.

gdata

Another alternative package for working with Excel data in R is the gdata package which is maintained by Gregory Warnes. gdata is actually an entire suite of tools for data manipulation. The package makes handling and management of data very convenient in R. In the gdata package the function we use to read Excel data is the read.xls() function. By default it only supports XLS files. However, the support for .xlsx files can be incorporated by simply installing a driver.

Internally the read.xls() makes use of the read.table() function to import the data as a data frame. So, all the arguments for read.table() are available for us to use while importing data from Excel using read.xls().

The gdata package requires perl interpreter to be installed on the system. Without that the function will not run.

Previous Lesson
Back to Course
Next Lesson

Primary Sidebar

In this Course

Course Home
R - Core Programming Principles
Relational Operators in R
Logical Operators in R
Conditional Statements in R
For Loop in R Programming
While and Repeat Loop in R Programming
Functions in R Programming
Creating Functions in R
Apply Functions in R
Importing Data in R
Importing Data from External Data Sources in R
Importing Data Using read.csv in R
Import Data using read.table in R
Importing Data Using data.table – fread in R
Importing Data from Excel in R
Using XLConnect in R Programming
Importing Data from a Database in R
SQL Queries from R
Importing Data from Web in R
Return to R Programming for Data Science

Latest Tutorials

    • Data Visualization with R
    • Derivatives with R
    • Machine Learning in Finance Using Python
    • Credit Risk Modelling in R
    • Quantitative Trading Strategies in R
    • Financial Time Series Analysis in R
    • VaR Mapping
    • Option Valuation
    • Financial Reporting Standards
    • Fraud
Facebook Group

Membership

Unlock full access to Finance Train and see the entire library of member-only content and resources.

Subscribe

Footer

Recent Posts

  • How to Improve your Financial Health
  • CFA® Exam Overview and Guidelines (Updated for 2021)
  • Changing Themes (Look and Feel) in ggplot2 in R
  • Coordinates in ggplot2 in R
  • Facets for ggplot2 Charts in R (Faceting Layer)

Products

  • Level I Authority for CFA® Exam
  • CFA Level I Practice Questions
  • CFA Level I Mock Exam
  • Level II Question Bank for CFA® Exam
  • PRM Exam 1 Practice Question Bank
  • All Products

Quick Links

  • Privacy Policy
  • Contact Us

CFA Institute does not endorse, promote or warrant the accuracy or quality of Finance Train. CFA® and Chartered Financial Analyst® are registered trademarks owned by CFA Institute.

Copyright © 2021 Finance Train. All rights reserved.

  • About Us
  • Privacy Policy
  • Contact Us