- Overview of Data Visualization
- When to Use Bar Chart, Column Chart, and Area Chart
- What is Line Chart and When to Use It
- What are Pie Chart and Donut Chart and When to Use Them
- How to Read Scatter Chart and Bubble Chart
- What is a Box Plot and How to Read It
- Understanding Japanese Candlestick Charts and OHLC Charts
- Understanding Treemap, Heatmap and Other Map Charts
- Visualization in Data Science
- Graphic Systems in R
- Accessing Built-in Datasets in R
- How to Create a Scatter Plot in R
- Create a Scatter Plot in R with Multiple Groups
- Creating a Bar Chart in R
- Creating a Line Chart in R
- Plotting Multiple Datasets on One Chart in R
- Adding Details and Features to R Plots
- Introduction to ggplot2
- Grammar of Graphics in ggplot
- Data Import and Basic Manipulation in R - German Credit Dataset
- Create ggplot Graph with German Credit Data in R
- Splitting Plots with Facets in ggplot2
- ggplot2 - Chart Aesthetics and Position Adjustments in R
- Creating a Line Chart in ggplot2 in R
- Add a Statistical Layer on Line Chart in ggplot2
- stat_summary for Statistical Summary in ggplot2 R
- Facets for ggplot2 Charts in R (Faceting Layer)
- Coordinates in ggplot2 in R
- Changing Themes (Look and Feel) in ggplot2 in R
Introduction to ggplot2
We have already learned how to create meaningful data visualizations in R using the base graphics package. In this section, we will learn how to create even more powerful data graphics using an R package called ggplot2.
What is ggplot2?
The ggplot2 package is a very popular alternative to the base graphics package in R, with over 1 million downloads in a year. At its core, it is an R package for creating statistical (data) graphics. What sets it apart from other data visualization packages is that it implements a strong underlying grammar for creating these graphics. ggplot2 is an implementation of the ideas in the book The Grammar of Graphics by Leland Wilkinson, which sets out general, unifying principles for the visualization of data.
The package has relatively few core functions, which makes it easy to learn and use. However, because these functions follow the grammar of graphics, they can be combined in many different ways to produce many different types of graphics.
ggplot2 is also very good at choosing reasonable default values, which lets users create good-looking graphs with little effort. For example, it automatically adds legends to graphs. These defaults let us use ggplot2 without knowing the grammar. However, knowing the grammar allows us to build graphs from concepts rather than by recalling commands and options. It also enables us to create new and improved graphs.
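To see these defaults at work, here is a minimal sketch using R's built-in mtcars dataset (our choice for illustration; it is not part of this lesson's data). Mapping a variable to colour is all it takes for ggplot2 to pick a palette, draw the axes and add a legend, without any extra options.

```r
library(ggplot2)

# Map car weight to x, fuel economy to y, and the number of cylinders to colour.
# factor() makes cyl discrete, so ggplot2 chooses a discrete colour palette.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

# Drawing the plot: axes, axis titles and the "factor(cyl)" legend
# all come from ggplot2's defaults.
print(p)
```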
One of the important ideas in ggplot2 is that it allows us to build a graphic iteratively, one layer at a time. We can start with a layer that plots the raw data, then add more layers showing annotations or statistical summaries. This matches how we analyze data and think about data visualizations, making it easy for us to create complex graphics step by step.
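The layer-by-layer idea can be sketched as follows, again on the built-in mtcars data (an illustrative assumption, not the lesson's dataset). Each step adds one layer with `+` and stores the result, so you can print any intermediate stage to check it before moving on.

```r
library(ggplot2)

# Starting point: only the data and the aesthetic mappings, no layers yet
p_base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Layer 1: the raw data as points
p_points <- p_base + geom_point()

# Layer 2: a statistical summary, here a linear fit with its confidence band
p_fit <- p_points + geom_smooth(method = "lm", formula = y ~ x)

# Annotations are added the same way
p_final <- p_fit + labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p_final)
```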
Installation and Use
The first thing you need to do is install the ggplot2 package. This is done using the install.packages() function, as shown below:
> install.packages("ggplot2")
This command will install the ggplot2 package in your R instance, along with any packages it depends on.
If you're using RStudio, you will see the newly installed package listed in the Packages tab of the bottom-right pane.
Once installed, you can load it into your current R session using the following command:
> library(ggplot2)
The ggplot2 documentation is a useful reference while you learn:
- It contains help files for most, if not all, ggplot2 functions
- The help files typically include numerous code and graphics examples
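These help files can be reached from the R console with base R's help tools. The sketch below is one way to do it; the help viewer only makes sense in an interactive session, so those calls are guarded with interactive().

```r
# Load ggplot2 (assumes install.packages("ggplot2") has already been run)
library(ggplot2)

# Show which version of ggplot2 is installed
print(packageVersion("ggplot2"))

# Open the documentation only when a person is at the console
if (interactive()) {
  print(help(package = "ggplot2"))            # index of all documented functions
  print(help("geom_point"))                   # help page for a single function
  example("geom_point", package = "ggplot2")  # run the examples from that page
}
```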
Plotting with ggplot2
ggplot2 offers two ways to produce plot objects: 1) qplot() and 2) ggplot().
qplot() (short for "quick plot") can be used to create the most common graph types. It hides much of the complexity involved in creating standard graphs. Note that qplot() has been deprecated since ggplot2 3.4.0, so recent versions print a warning when you call it.
The ggplot() function, on the other hand, brings the full power of the grammar of graphics. It has a slightly steeper learning curve but allows much more flexibility when building graphs.
Our focus in this module will be on creating visualizations using the ggplot() function.
The following example uses the Insurance dataset from the MASS package. Below we show a very basic graph of this data created with both the qplot() and ggplot() functions. The Insurance data frame consists of the numbers of policyholders of an insurance company who were exposed to risk, and the numbers of car insurance claims made by those policyholders in the third quarter of 1973. We plot a simple scatter chart with the number of policyholders on the x-axis and the number of claims on the y-axis.
Since we're using the most basic settings, both functions produce the same chart.
Load the dataset:
> library(MASS)
> data(Insurance)
Inspect the data using the str() function:
> str(Insurance)
Plot the chart using either function.
Format in qplot():
> qplot(Holders, Claims, data = Insurance)
Format in ggplot():
> ggplot(Insurance, aes(x = Holders, y = Claims)) + geom_point()
Both commands draw a scatter plot with Holders on the x-axis and Claims on the y-axis.
As you can see, ggplot2 has automatically taken care of most of the details, such as the axis scales, tick marks and axis labels (and legends, when a plot needs one).
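If you are working in a script rather than typing at the console, a self-contained version of the ggplot() example might look like the sketch below. The labs() call, which replaces the default axis titles with more descriptive ones, is our addition and not part of the original example.

```r
library(MASS)     # provides the Insurance data frame
library(ggplot2)

data(Insurance)
str(Insurance)    # check column names and types before plotting

# Scatter plot of policyholders (x) against claims (y)
p <- ggplot(Insurance, aes(x = Holders, y = Claims)) +
  geom_point() +
  labs(x = "Number of policyholders", y = "Number of claims")

print(p)
```

Saving the plot in a variable such as `p` makes it easy to add further layers later, in the iterative style described earlier.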
In the following lessons, we will learn about the grammar of graphics and use it to create interesting data visualizations on some financial datasets.