- Relational Operators in R
- Logical Operators in R
- Conditional Statements in R
- For Loop in R Programming
- While and Repeat Loop in R Programming
- Functions in R Programming
- Creating Functions in R
- Apply Functions in R
- Importing Data from External Data Sources in R
- Importing Data Using read.csv in R
- Import Data using read.table in R
- Importing Data Using data.table – fread in R
- Importing Data from Excel in R
- Using XLConnect in R Programming
- Importing Data from a Database in R
- SQL Queries from R
- Importing Data from Web in R
SQL Queries from R
In the previous lesson we learned that we can use the dbReadTable()
function to import a table into R. However, this can be complicated if our table contains millions of records and we are actually interested only in a sub-set of records that meet a certain condition. In such a scenario, we have two choices:
- Import the entire table as a data frame into R and then use R functions such as
subset()
to select the data that you're interested in. This approach can work but is a problem if the original data table is large. - We can use the
dbGetQuery()
function which accepts SQL statements specifying what data we want to fetch. For example, to perform a simple aggregate query, there is no need to pull a database table into R and apply an R function to the data frame. Instead, we issue a select statement and retrieve the results table as a data frame.
Example 1
Let's consider our enwrap86_ftstockdata
database which has two tables Stocks
and Prices
. In the following command we fetch all records from the table Prices
just for the stock with symbol TECK
.
# Load the DBI package
library(DBI)
# Create a database connection
con <- dbConnect(RMySQL::MySQL(),
dbname = "enwrap86_ftstockdata",
host = "109.199.108.179",
port = 3306,
user = "enwrap86_dataadm",
password = "X+8NHCNrFJGR")
# Select all price data for the stock TECK
TECK_Prices <- dbGetQuery(con, "SELECT * FROM Prices WHERE Symbol = 'TECK'")
# Print TECK Prices
TECK_Prices
Example 2
We can perform all kinds of queries including things such as JOIN where we fetch data combined from multiple tables. For example, let's say we want to retrieve the name of the stock (Available in Stocks table) along with it's Maximum Last (closing) prices (From prices table). The common key between the two tables in the 'Symbol field so, we can create a join between the two tables and retrieve the desired data as shown below:
# get the name of each stock along with it's maximum last price
Max_Prices <- dbGetQuery(con, "SELECT s.Name,p.Last as MaxLast FROM Stocks AS s
LEFT JOIN Prices AS p ON (p.Symbol = s.Symbol)
WHERE p.Last = (
SELECT MAX(Last) FROM Prices AS p2
WHERE p2.Symbol = s.Symbol
)")
# Print Max Prices
Max_Prices
The result will look as follows:
> # Print Max Prices
> Max_Prices
Name MaxLast
1 Teck Resources Ltd 26.25
2 Chemours Company 25.59
3 AK Steel Holding Corp 11.11
4 Kemet Corp 7.12
5 Health Insurance Inn 20.70
6 Nvidia Corporation 111.77
7 Gerdau S.A. 4.05
8 Gold Resource Corp 5.45
9 Envirostarm Inc 15.25
10 Skyline Corp 15.43
>
You may find this query a bit intimidating, but I promise you we will go indepth into SQL in a course dedicated to mastering SQL queries.
Send - Fetch - Clear
When the result table is huge, we may not want to bring it into R in its entirety, but instead fetch the tuples in batches, possibly reducing the batches to simple summaries before requesting the next batch. Instead of dbGetQuery, we use dbSendQuery to fetch results in batches. The DBI package provides functions to keep track of whether the statement produces output, how many rows were affected by the operation, how many rows have been fetched (if statement is a query), and whether there are more rows to fetch.
In the example below, rather than using dbReadTable
to pull over the entire Prices
table, the dbSendQuery
function is used to send the query to the database without retrieving the results. Then, the dbFetch
function pulls over tuples in blocks. In this example, the first 50 tuples are retrieved, then the next 200, after which we determine that there are more results to be fetched (dbHasCompleted) and clear the results object (dbClearResult) without bringing over any more tuples from the MySQL database.
ps = dbSendQuery(con, "SELECT * FROM Prices;")
firstBatch = dbFetch(ps, n = 50)
secondBatch = dbFetch(ps, n = 30)
dbHasCompleted(ps)
dbClearResult(ps)
Let's take a relook at what's happening here:
- The actual prices table contains 190 records. This is a small database so we don't have to fetch it in batches, but for this example, we will use the batch fetching principles which will work really well for large data sets.
- The
dbSendQuery
function only submits and synchronously executes the SQL query to the database engine. It does not extract any records. - We then use the
dbFetch
function to actually fetch the records. We execute this function twice, first to fetch the first 50 records and then to fetch the next 30 records. - We then use the
dbHasCompleted
function to check if there still are more records to be fetched. It returns in TRUE/FALSE so you can use it in a loop or condition. - Finally, we must call
dbClearResult
function when you finish fetching the records you need. - In addition, the n = −1 assignment for the parameter specifies that all remaining tuples are to fetched. The fetch function converts each attribute in the result set to the corresponding type in R.
- In addition, dbListResults(con) gives a list of all currently active result set objects for the connection con, and dbGetRowCount(rs) provides a status of the number of rows that have been fetched in the query.
When finished, we free up resources by disconnecting the connection:
dbDisconnect(con)
Data Science in Finance: 9-Book Bundle
Master R and Python for financial data science with our comprehensive bundle of 9 ebooks.
What's Included:
- Getting Started with R
- R Programming for Data Science
- Data Visualization with R
- Financial Time Series Analysis with R
- Quantitative Trading Strategies with R
- Derivatives with R
- Credit Risk Modelling With R
- Python for Data Science
- Machine Learning in Finance using Python
Each book includes PDFs, explanations, instructions, data files, and R code for all examples.
Get the Bundle for $29 (Regular $57)Free Guides - Getting Started with R and Python
Enter your name and email address below and we will email you the guides for R programming and Python.