Introduction to ggplot2
We have already learned how to create meaningful data visualizations in R using the base graphics package. In this section, we will learn how to create even more powerful data graphics using an R package called ggplot2.
What is ggplot2?
The ggplot2 package is a very popular alternative to the base graphics package in R with over 1 million downloads in a year. At its core, it is an R package for creating statistical (data) graphics. However, it is different from other data visualization packages because it implements a very strong underlying grammar for creating these graphics. The ggplot2 package is an implementation of the ideas in the book, The Grammar of Graphics, by Leland Wilkinson, whose goal was to set out a set of general unifying principles for the visualization of data.
The package has very few base functions, which makes it easy to learn and use. However, based on the grammar of graphics, we can combine these functions in many different ways to produce many different types of graphics. ggplot2 is also very good at setting reasonable default values, enabling users to create good-looking, hassle-free graphs. For example, it automatically adds legends to the graphs. Defaults enable us to use ggplot2 without knowing the grammar. However, knowing the grammar allows us to build graphs from concepts rather than from recall of commands and options. It also enables us to create new and improved graphs.
One of the important ideas in ggplot2 is that it allows us to build the graphic iteratively, one layer at a time. We can start with one layer that plots the raw data, then add more layers showing annotations or statistical summaries. This matches how we analyze data and think about data visualizations, making it easy for us to create complex graphics iteratively.
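The layer-by-layer idea can be sketched as follows. This is a minimal illustration only: it assumes the built-in mtcars dataset (not otherwise used in this lesson), and the linear smoother and the annotation position are arbitrary choices made for the example.

```r
library(ggplot2)

# Start with data and aesthetic mappings only; no layer is drawn yet
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Layer 1: the raw data as points
p1 <- p + geom_point()

# Layer 2: a statistical summary (a fitted straight line) on top of the points
p2 <- p1 + geom_smooth(method = "lm", formula = y ~ x)

# Layer 3: a text annotation (position picked by hand for this example)
p3 <- p2 + annotate("text", x = 4.5, y = 30, label = "linear fit")

print(p3)
```

Each intermediate object (p1, p2, p3) is a complete plot in its own right, so you can print any of them to check the graphic before adding the next layer.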
Installation and Use
The first thing you need to do is install the ggplot2 package. This is done using the install.packages() function as shown below:
> install.packages("ggplot2")
This command will install the ggplot2 package in your R instance along with any dependency packages.
If you're using RStudio, you will see the newly installed package listed under the Packages tab in the bottom-right pane.
Once installed, you can load it in your current R session using the following command:
> library("ggplot2")
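If you run the same script repeatedly, you may prefer to install the package only when it is missing. The following is one possible sketch of that pattern using requireNamespace(); it is a convenience, not a required step.

```r
# Install ggplot2 only if it is not already available, then load it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

# Show which version of the package was loaded
packageVersion("ggplot2")
```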
ggplot2 Documentation
- https://ggplot2.tidyverse.org/reference/
- Contains help files for the ggplot2 functions
- Help files typically contain numerous code and graphics examples
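The same help files are also available from within R once the package is installed. A short sketch, using geom_point as an arbitrary example topic:

```r
library(ggplot2)

# Open the help page for a single function (equivalent to ?geom_point)
help("geom_point", package = "ggplot2")

# List the help topics shipped with the package
help(package = "ggplot2")

# Run the code examples from a help page without pausing between plots
example("geom_point", package = "ggplot2", ask = FALSE)
```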
Plotting with ggplot2
ggplot2 offers two ways to produce plot objects: 1) qplot() and 2) ggplot().
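The qplot() function (short for quick plot) can be used to create the most common graph types. It hides much of the complexity when creating standard graphs. Note that qplot() is deprecated as of ggplot2 3.4.0, so recent versions issue a warning when it is called.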
The ggplot() function, on the other hand, brings the full power of the grammar of graphics. It has a slightly steeper learning curve but allows much more flexibility when building graphs.
Our focus in this module will be on creating visualizations using the ggplot() function.
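Because ggplot() returns a plot object, the result can be stored in a variable, printed later, or written to a file. The sketch below assumes the built-in mtcars dataset and a temporary output file, both chosen only for illustration.

```r
library(ggplot2)

# ggplot() returns an object; nothing is drawn until it is printed
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

class(p)   # the class vector of the stored plot object
print(p)   # draws the plot on the active graphics device

# Save the stored plot to a file (the path here is a placeholder)
out <- file.path(tempdir(), "scatter.pdf")
ggsave(out, plot = p, width = 6, height = 4)
```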
Example
The following example uses the Insurance dataset from the MASS package. Below we show a very basic graph created with the qplot() and ggplot() functions using this data. The Insurance data frame contains the number of policyholders of an insurance company who were exposed to risk and the number of car insurance claims made by those policyholders in the third quarter of 1973. We plot a simple scatter chart with the number of policyholders on the x-axis and the number of claims on the y-axis.
Since we're using the most basic settings, both functions will produce the same chart.
Load the dataset
> library(MASS)
> data(Insurance)
Inspect data using the str() function
> str(Insurance)
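Beyond str(), a few other base R functions give a quick view of the data before plotting. This is an optional sketch; the choice of functions is ours and not part of the original walkthrough.

```r
library(MASS)
data(Insurance)

dim(Insurance)     # number of rows and columns
names(Insurance)   # column names
head(Insurance)    # first six rows

# Ranges of the two variables used in the scatter plot
summary(Insurance[, c("Holders", "Claims")])
```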
Plot the chart using qplot() or ggplot()
Format in qplot()
> qplot(Holders, Claims, data = Insurance)
Format in ggplot()
> ggplot(Insurance, aes(x = Holders, y = Claims)) + geom_point()
Both of these commands draw a scatter plot with Holders on the x-axis and Claims on the y-axis.
As you can see, ggplot2 has automatically taken care of most of the details, such as the axes, tick marks, and axis labels.
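To see the automatic legend mentioned earlier, one option is to map a third column to colour. The sketch below uses the Age column of the Insurance data; the axis labels and title are our own wording, added for illustration.

```r
library(MASS)
library(ggplot2)
data(Insurance)

# Map the Age column to colour; ggplot2 builds the legend automatically
p <- ggplot(Insurance, aes(x = Holders, y = Claims, colour = Age)) +
  geom_point() +
  labs(
    x = "Number of policyholders",
    y = "Number of claims",
    title = "Car insurance claims by policyholder count"
  )

print(p)
```

No legend code is written here: adding colour = Age to aes() is enough for ggplot2 to choose a colour scale and draw the matching legend.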
In the following lessons, we will learn about the grammar of graphics and use it to create interesting data visualizations on some financial datasets.