SQL Queries from R

In the previous lesson we learned that we can use the dbReadTable() function to import a table into R. However, this can be complicated if our table contains millions of records and we are actually interested only in a sub-set of records that meet a certain condition. In such a scenario, we have two choices:

  1. Import the entire table as a data frame into R and then use R functions such as subset() to select the data that you're interested in. This approach can work but is a problem if the original data table is large.
  2. We can use the dbGetQuery() function which accepts SQL statements specifying what data we want to fetch. For example, to perform a simple aggregate query, there is no need to pull a database table into R and apply an R function to the data frame. Instead, we issue a select statement and retrieve the results table as a data frame.

Example 1

Let's consider our enwrap86_ftstockdata database which has two tables Stocks and Prices. In the following command we fetch all records from the table Prices just for the stock with symbol TECK.

# Load the DBI package
library(DBI)
# Create a database connection
con <- dbConnect(RMySQL::MySQL(), 
                 dbname = "enwrap86_ftstockdata", 
                 host = "109.199.108.179", 
                 port = 3306,
                 user = "enwrap86_dataadm",
                 password = "X+8NHCNrFJGR")
# Select all price data for the stock TECK
TECK_Prices <- dbGetQuery(con, "SELECT * FROM Prices WHERE Symbol = 'TECK'")
# Print TECK Prices
TECK_Prices

Example 2

We can perform all kinds of queries including things such as JOIN where we fetch data combined from multiple tables. For example, let's say we want to retrieve the name of the stock (Available in Stocks table) along with it's Maximum Last (closing) prices (From prices table). The common key between the two tables in the 'Symbol field so, we can create a join between the two tables and retrieve the desired data as shown below:

# get the name of each stock along with it's maximum last price
Max_Prices <- dbGetQuery(con, "SELECT s.Name,p.Last as MaxLast FROM Stocks AS s
    LEFT JOIN Prices AS p ON (p.Symbol = s.Symbol)
    WHERE p.Last = (
        SELECT MAX(Last) FROM Prices AS p2
            WHERE p2.Symbol = s.Symbol
    )")
# Print Max Prices
Max_Prices

The result will look as follows:

This content is for paid members only.

Join our membership for lifelong unlimited access to all our data science learning content and resources.