R has become very popular not only for its data visualization capabilities but also for its ability to combine visualizations with statistical information. The statistics layer lets us show statistical summaries of the data, such as the mean and variance, directly on the plot to help in understanding the data.
Earlier, we learned about geoms, which stands for "geometric objects." These are the core elements that you see on the plot, and they include objects such as points, lines, areas, and curves. Stats stands for "statistical transformations." These help us summarize the data in different ways, such as counting observations, fitting a linear regression line to the data, or adding a confidence interval to the regression line.
All geoms have a default stat. For example, when we create a bar chart using geom_bar(), the default stat is stat_count, which counts the number of rows at each x value to determine the bar heights. You can also add statistics layers to a plot directly using the stat_ functions.
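As a minimal sketch of the geom/stat pairing described above, the following compares a bar chart built with geom_bar() against one built with stat_count(). It assumes the mpg dataset that ships with ggplot2 (not a dataset used in this tutorial) and uses layer_data() to inspect the values the stat computed.

```r
library(ggplot2)

# mpg ships with ggplot2; `class` is a categorical column (assumed here for illustration).
p_geom <- ggplot(mpg, aes(x = class)) + geom_bar()    # default stat: stat_count
p_stat <- ggplot(mpg, aes(x = class)) + stat_count()  # default geom: bar

# layer_data() returns the data frame produced by the stat for a given layer.
counts_geom <- layer_data(p_geom)
counts_stat <- layer_data(p_stat)

# Both layers should report the same per-class row counts.
print(counts_geom$count)
print(counts_stat$count)
```

Both plots should look the same, since each call ends up pairing the bar geom with the count stat.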
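Some of the available statistical transformations are:
stat_bin: Divide a continuous range into bins and count the number of points in each
stat_boxplot: Compute the statistics necessary for a boxplot
stat_contour: Calculate contour lines
stat_density: Compute a 1d density estimate
stat_identity: Identity transformation, f(x) = x
Jitter: Jitter values by adding a small random value (provided in current ggplot2 as the position adjustment position_jitter)
stat_qq: Calculate values for a quantile-quantile plot
stat_quantile: Quantile regression
stat_smooth: Smoothed conditional mean of y given x
stat_summary: Aggregate values of y for given x
stat_unique: Remove duplicated observations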
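To illustrate one entry from the list, here is a small sketch that overlays the mean of y for each x using stat_summary(). It again assumes the mpg dataset bundled with ggplot2, and the `fun` argument assumes ggplot2 3.3.0 or later (older versions used `fun.y`).

```r
library(ggplot2)

# Raw highway mileage per vehicle class, with the per-class mean drawn on top.
p_summary <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_point(alpha = 0.3) +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)

# The second layer holds the aggregated values: one mean per class.
summary_means <- layer_data(p_summary, 2)$y

# The same aggregation computed directly, for comparison.
manual_means <- as.numeric(tapply(mpg$hwy, mpg$class, mean))

p_summary
```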
Let's look at some of these statistics.
One of the most common statistical transformations we add to our plots is a smoothing line. When we create a scatter plot, we can add a smoothing line using geom_smooth(), which in turn uses stat_smooth to compute the smooth curve.
Let's use our stock_prices dataset and create a scatter plot with AAPL prices on the x-axis and GOOGL prices on the y-axis. We will add a smooth line to the scatter plot with geom_smooth() (LOESS is the default method for datasets with fewer than 1,000 observations).
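The plot described above might look like the following sketch. The course's stock_prices dataset is not reproduced here, so the code builds a hypothetical stand-in with simulated AAPL and GOOGL columns; the values are random and carry no real price information.

```r
library(ggplot2)

# Hypothetical stand-in for stock_prices: simulated values, not real prices.
set.seed(42)
n <- 250
aapl <- 150 + cumsum(rnorm(n))
stock_prices <- data.frame(
  AAPL  = aapl,
  GOOGL = 100 + 0.6 * (aapl - 150) + rnorm(n, sd = 2)
)

# Scatter plot with a smoothing line; with fewer than 1,000 rows,
# geom_smooth() picks loess automatically and prints a message saying so.
p_smooth <- ggplot(stock_prices, aes(x = AAPL, y = GOOGL)) +
  geom_point() +
  geom_smooth()

p_smooth
```

By default the line is drawn with a shaded confidence band around it; passing se = FALSE to geom_smooth() hides the band.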
We can modify various aspects of stat_smooth. For example, span controls the amount of smoothing for the default loess smoother: smaller values produce wigglier lines, while larger values produce smoother lines.
In the following example, we set the span to 0.3 and also change the line color to red.
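A sketch of that example follows, again using a hypothetical simulated stand-in for the stock_prices dataset. The method is set to "loess" explicitly here (an assumption made so that span applies regardless of dataset size).

```r
library(ggplot2)

# Hypothetical stand-in for stock_prices: simulated values, not real prices.
set.seed(42)
n <- 250
aapl <- 150 + cumsum(rnorm(n))
stock_prices <- data.frame(
  AAPL  = aapl,
  GOOGL = 100 + 0.6 * (aapl - 150) + rnorm(n, sd = 2)
)

# span = 0.3 uses a smaller neighbourhood for each local fit, giving a wigglier line;
# color = "red" sets the colour of the smoothing line.
p_span <- ggplot(stock_prices, aes(x = AAPL, y = GOOGL)) +
  geom_point() +
  geom_smooth(method = "loess", span = 0.3, color = "red")

p_span
```

Trying a larger value such as span = 0.9 on the same data is a quick way to see how the curve flattens out as the amount of smoothing increases.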
This tutorial is a part of the course Data Visualization with R.