Add a Statistical Layer on Line Chart in ggplot2
R has become very popular not just because of its data visualization capabilities but also because of its ability to combine data visualizations with statistical information. The statistics layer represents statistical information about the data, such as the mean and variance, to help us understand the data.
We earlier learned about geoms, which stand for "geometric objects." These are the core elements that you see on the plot, and include objects such as points, lines, areas, and curves. Stats stand for "statistical transformations." These help us summarize the data in different ways, such as counting observations, creating a linear regression line that best fits the data, or adding a confidence interval to the regression line.
All geoms have a default stat. For example, when we create a bar chart using geom_bar(), the default stat is stat_count, which counts the number of rows in each group to set the bar heights. You can add statistics layers to the plot using the stat_* functions.
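As a minimal sketch of the geom/stat pairing described above, the following compares a bar chart built with geom_bar() against one built with stat_count(). The mpg dataset (bundled with ggplot2) and the variable names are stand-ins chosen for illustration; they are not part of this lesson's data.

```r
library(ggplot2)

# mpg ships with ggplot2 and is used here only as a stand-in dataset
p_geom <- ggplot(mpg, aes(x = class)) + geom_bar()    # geom with its default stat
p_stat <- ggplot(mpg, aes(x = class)) + stat_count()  # stat with its default geom

# layer_data() returns the values the stat computed for a layer
counts_geom <- layer_data(p_geom)$count
counts_stat <- layer_data(p_stat)$count
```

Both layers should yield the same per-class counts, because geom_bar() delegates the counting to stat_count.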
Stat | Description |
---|---|
bin | Divide continuous range into bins, and count number of points in each |
boxplot | Compute statistics necessary for boxplot |
contour | Calculate contour lines |
density | Compute 1d density estimate |
identity | Identity transformation, f(x) = x |
jitter | Jitter values by adding small random value |
qq | Calculate values for quantile-quantile plot |
quantile | Quantile regression |
smooth | Smoothed conditional mean of y given x |
summary | Aggregate values of y for given x |
unique | Remove duplicated observations |
Let's look at some of these statistics.
Smoothing
One of the most common statistical transformations we add to our plots is a smoothing line. When we plot a scatter chart, we can add a smoothing line using geom_smooth(), which in turn uses stat_smooth to compute the smooth curve.
Let's use our stock_prices dataset and create a scatter plot with AAPL prices on the x-axis and GOOGL prices on the y-axis. We will add a smooth line to the scatter plot with geom_smooth(). LOESS is the default method for datasets with fewer than 1,000 observations.
ggplot(stock_prices,aes(x=AAPL,y=GOOGL))+
geom_point()+
geom_smooth()
We will now change the smoothing to a linear model by specifying method="lm". The default formula is y ~ x.
ggplot(stock_prices,aes(x=AAPL,y=GOOGL))+
geom_point()+
geom_smooth(method="lm")
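To see what method="lm" does under the hood, the sketch below checks that the line drawn by geom_smooth() matches the fitted values of an ordinary lm() fit. The stock_prices data frame built here is a simulated, hypothetical stand-in (the real dataset is not reproduced), so the numbers carry no financial meaning.

```r
library(ggplot2)

# Hypothetical stand-in for stock_prices: simulated, not real market data
set.seed(42)
aapl <- cumsum(rnorm(200)) + 150
stock_prices <- data.frame(AAPL = aapl, GOOGL = 2 * aapl + rnorm(200, sd = 5))

# Fit the same linear model by hand
fit <- lm(GOOGL ~ AAPL, data = stock_prices)

p <- ggplot(stock_prices, aes(x = AAPL, y = GOOGL)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Layer 2 is the smoothing layer; its y values are the model predictions
smooth_layer <- layer_data(p, 2)
manual <- predict(fit, newdata = data.frame(AAPL = smooth_layer$x))
max_diff <- max(abs(smooth_layer$y - manual))
```

If the two agree, max_diff is zero up to floating-point error, which confirms that the plotted line is simply the least-squares fit of GOOGL on AAPL.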
The smoothing line shows a 95% confidence interval band by default. We can hide the band by adding the argument se=FALSE.
ggplot(stock_prices,aes(x=AAPL,y=GOOGL))+
geom_point()+
geom_smooth(method="lm",se=FALSE)
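The width of the confidence band is controlled by the level argument of geom_smooth(), which defaults to 0.95. The following sketch compares the band at 95% and 99%; the stock_prices data frame is again a simulated, hypothetical stand-in, and band_width() is a helper introduced only for this illustration.

```r
library(ggplot2)

# Hypothetical stand-in for stock_prices: simulated, not real market data
set.seed(1)
aapl <- seq(100, 200, length.out = 150)
stock_prices <- data.frame(AAPL = aapl, GOOGL = 5 * aapl + rnorm(150, sd = 40))

base <- ggplot(stock_prices, aes(x = AAPL, y = GOOGL)) + geom_point()
p95 <- base + geom_smooth(method = "lm", formula = y ~ x)                # default level = 0.95
p99 <- base + geom_smooth(method = "lm", formula = y ~ x, level = 0.99)  # wider band

# Illustrative helper: band width (ymax - ymin) along the smoothing layer
band_width <- function(p) {
  d <- layer_data(p, 2)
  d$ymax - d$ymin
}
width95 <- band_width(p95)
width99 <- band_width(p99)
```

A higher confidence level should give a wider band at every x value, since the interval has to cover the regression line with greater certainty.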
Suppose we wanted to plot just the model and not the actual points. We can do so by dropping geom_point() and keeping only geom_smooth() or stat_smooth():
ggplot(stock_prices,aes(x=AAPL,y=GOOGL))+
stat_smooth(method="lm",se=FALSE)
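Since geom_smooth() and stat_smooth() are two entry points to the same layer, either one can be used here. A small sketch that checks this, using a simulated, hypothetical stand-in for stock_prices:

```r
library(ggplot2)

# Hypothetical stand-in for stock_prices: simulated, not real market data
set.seed(7)
aapl <- seq(100, 200, length.out = 120)
stock_prices <- data.frame(AAPL = aapl, GOOGL = 3 * aapl + rnorm(120, sd = 25))

base <- ggplot(stock_prices, aes(x = AAPL, y = GOOGL))
p_geom <- base + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p_stat <- base + stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Compare the fitted line computed by each layer
line_geom <- layer_data(p_geom)[, c("x", "y")]
line_stat <- layer_data(p_stat)[, c("x", "y")]
same_line <- isTRUE(all.equal(line_geom, line_stat))
```

Which one to write is mostly a matter of emphasis: geom_smooth() reads as "draw a smooth line", while stat_smooth() reads as "apply the smoothing transformation".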
Modifying stat_smooth
We can modify various aspects of stat_smooth. For example, span controls the amount of smoothing for the default loess smoother. Smaller numbers produce wigglier lines; larger numbers produce smoother lines.
In the following example, we set the span to 0.3 and also change the line color to red.
ggplot(stock_prices,aes(x=AAPL,y=GOOGL))+
geom_point()+
stat_smooth(se=FALSE,col="red",span=0.3)
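To make the effect of span concrete, the sketch below fits two loess curves with different spans and compares how much each one bends. The stock_prices data frame is a simulated, hypothetical stand-in with a deliberately wavy relationship, and roughness() is an illustrative helper (the sum of squared second differences of the fitted curve), not a ggplot2 function.

```r
library(ggplot2)

# Hypothetical stand-in for stock_prices: simulated wavy relationship plus noise
set.seed(123)
aapl <- seq(100, 200, length.out = 200)
stock_prices <- data.frame(
  AAPL  = aapl,
  GOOGL = 1000 + 50 * sin(aapl / 8) + rnorm(200, sd = 10)
)

base <- ggplot(stock_prices, aes(x = AAPL, y = GOOGL)) + geom_point()
p_small <- base + stat_smooth(method = "loess", formula = y ~ x, se = FALSE, span = 0.3)
p_large <- base + stat_smooth(method = "loess", formula = y ~ x, se = FALSE, span = 1)

# Illustrative helper: larger values mean a wigglier fitted curve
roughness <- function(p) {
  fitted_y <- layer_data(p, 2)$y
  sum(diff(fitted_y, differences = 2)^2)
}
rough_small <- roughness(p_small)
rough_large <- roughness(p_large)
```

On data like this, the span = 0.3 curve follows the local waves and so has the higher roughness, while span = 1 averages over the whole range and comes out nearly flat.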