Create ggplot Graph with German Credit Data in R
In the ggplot2 package, we use the ggplot() function to create a fully customized data visualization. We still have the German credit data loaded in the data frame df.
We will start by plotting a simple scatter graph with the duration of credit on the x-axis and the amount of credit on the y-axis. We can do so using the following command.
> ggplot(df,aes(x=Duration.of.Credit..in.months.,y=Credit.amount))
or
g <- ggplot(df,aes(x=Duration.of.Credit..in.months.,y=Credit.amount))
The process of creating a graph starts with the ggplot() function. Note that we can either issue the command directly, which prints the graph, or assign its result to a variable to create a plot object. In the second example above, we created an R object called g that stores the graph object.
The above command uses only the first two levels of the grammar of graphics, i.e., data and aesthetics. If we print this, we get only an empty graph: the axes are drawn, but no data appears on it.
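To see that a plot built from data and aesthetics alone has nothing to draw yet, here is a minimal sketch that counts the layers stored in the plot object. It assumes a small, made-up data frame called toy (hypothetical values, not the German credit data) that reuses the column names of df, so it runs without the dataset file; it also reads the layers field of the ggplot object directly.

```r
library(ggplot2)

# Hypothetical toy data reusing df's column names (values are made up)
toy <- data.frame(
  Duration.of.Credit..in.months. = c(6, 12, 18, 24, 36, 48),
  Credit.amount = c(1000, 2500, 3000, 4000, 6000, 8000)
)

# Data + aesthetics only: the first two levels of the grammar of graphics
g <- ggplot(toy, aes(x = Duration.of.Credit..in.months., y = Credit.amount))

# No geom has been added yet, so the plot holds no layers
n_layers <- length(g$layers)
print(n_layers)
```

Printing g at this stage draws only the axes, since no layer has been added.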
We use the third level, that is, geom (geometric object), to draw the actual data on the graph. Geometric objects are the marks we put on a plot, for example:
- points (geom_point, for scatter plots, dot plots, etc.)
- lines (geom_line, for time series, trend lines, etc.)
- boxplot (geom_boxplot, for, well, boxplots!)
We use geom_point to create scatter plots, geom_bar for bar charts, and so on. A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.
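As a sketch of stacking geoms with +, the example below adds two layers to the same base plot and lists the geom behind each one. The toy data frame is hypothetical, and the second layer (a linear trend line from geom_smooth) is an illustrative choice, not something used elsewhere in this tutorial.

```r
library(ggplot2)

# Hypothetical toy data reusing df's column names (values are made up)
toy <- data.frame(
  Duration.of.Credit..in.months. = c(6, 12, 18, 24, 36, 48),
  Credit.amount = c(1000, 2500, 3000, 4000, 6000, 8000)
)

g <- ggplot(toy, aes(x = Duration.of.Credit..in.months., y = Credit.amount))

# Each `+` adds one more layer to the plot
p <- g +
  geom_point() +                          # layer 1: one point per row
  geom_smooth(method = "lm", se = FALSE)  # layer 2: fitted linear trend line

# Name of the geom class used by each layer, in the order they were added
layer_geoms <- vapply(p$layers, function(l) class(l$geom)[1], character(1))
print(layer_geoms)
```

Layers are drawn in the order they are added, so here the trend line is drawn on top of the points.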
You can get a list of available geometric objects using the code below:
help.search("geom_", package = "ggplot2")
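If you only want the function names rather than a help-search result page, one alternative sketch is to list the objects of the attached ggplot2 package whose names start with "geom_". This relies on base R's ls() with a regular-expression pattern; the exact set of names depends on your installed ggplot2 version.

```r
library(ggplot2)

# Objects in the attached ggplot2 package whose names begin with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")

print(length(geoms))
print(head(geoms))
```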
Points (Scatterplot)
We now know the data that we want to plot, the aesthetics, and the geometric object that we want to create. We can use this information to complete our scatter plot.
We will add the points geom to our graph object g, as shown below:
> g <- ggplot(df,aes(x=Duration.of.Credit..in.months.,y=Credit.amount))
> g + geom_point()
The resulting scatter plot shows the relationship between the duration of credit in months and the loan amount. Each point represents a loan. Our dataset contains 1000 loans, and for each loan we also know whether it defaulted or not. This is recorded in the variable Loan.Quality. We can use this variable to improve the graph, for example, by coloring the points based on loan quality.
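One way to check that each point corresponds to exactly one row (one loan) is to inspect the data ggplot2 computes for the point layer with ggplot_build(). This is a minimal sketch on a hypothetical toy data frame, not the German credit data.

```r
library(ggplot2)

# Hypothetical toy data reusing df's column names (values are made up)
toy <- data.frame(
  Duration.of.Credit..in.months. = c(6, 12, 18, 24, 36, 48),
  Credit.amount = c(1000, 2500, 3000, 4000, 6000, 8000)
)

p <- ggplot(toy, aes(x = Duration.of.Credit..in.months., y = Credit.amount)) +
  geom_point()

# ggplot_build() returns the computed data for every layer;
# the first (and only) layer has one row per plotted point
point_data <- ggplot_build(p)$data[[1]]
print(nrow(point_data))
print(point_data[, c("x", "y")])
```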
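However, there is one small problem here. The variable Loan.Quality is stored as an integer (1 for Bad Loan and 2 for Good Loan). If we map a numeric variable to color, ggplot2 treats it as continuous and uses a color gradient instead of two distinct colors. So, to use it to categorize the data, we first convert it to a factor and then give its levels descriptive labels.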
# Convert Loan.Quality to a factor (levels "1" and "2")
df$Loan.Quality <- as.factor(df$Loan.Quality)
# Rename the levels: "1" becomes "Bad Loan", "2" becomes "Good Loan"
levels(df$Loan.Quality) <- c("Bad Loan", "Good Loan")
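The two-step conversion can also be written as a single factor() call that states the code-to-label mapping explicitly. The sketch below compares both on a short, hypothetical vector of integer codes (quality_codes is an illustrative name, not a column of df).

```r
# Hypothetical integer codes, as described above: 1 = Bad Loan, 2 = Good Loan
quality_codes <- c(2L, 1L, 2L, 2L, 1L)

# Approach used above: convert to a factor, then rename the sorted levels
q1 <- as.factor(quality_codes)
levels(q1) <- c("Bad Loan", "Good Loan")

# Alternative: map each code to its label explicitly in one call
q2 <- factor(quality_codes, levels = c(1, 2), labels = c("Bad Loan", "Good Loan"))

print(q2)
print(table(q2))
```

Renaming with levels() works by position, so it depends on which codes are present and how they sort; the levels/labels form ties each code to its label directly.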
We can now use this variable to color the points. To do this, we map the color aesthetic to Loan.Quality inside aes() in the geom_point() call.
> g <- ggplot(df,aes(x=Duration.of.Credit..in.months.,y=Credit.amount))
> g+geom_point(aes(color=Loan.Quality))
The points in the scatter plot are now drawn in two colors: with ggplot2's default palette, bad loans appear in red and good loans in teal (blue-green). ggplot2 selects the colors automatically and also adds a legend for convenience.
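If you would rather pick the colors yourself than rely on the defaults, one option is scale_color_manual(), which maps each factor level to a color you name. This sketch uses hypothetical toy data and arbitrary color choices ("red" and "blue"), and reads back the colors ggplot2 assigned via ggplot_build().

```r
library(ggplot2)

# Hypothetical toy data reusing df's column names (values are made up)
toy <- data.frame(
  Duration.of.Credit..in.months. = c(6, 12, 18, 24, 36, 48),
  Credit.amount = c(1000, 2500, 3000, 4000, 6000, 8000),
  Loan.Quality = factor(c(1, 2, 2, 1, 2, 2), levels = c(1, 2),
                        labels = c("Bad Loan", "Good Loan"))
)

p <- ggplot(toy, aes(x = Duration.of.Credit..in.months., y = Credit.amount)) +
  geom_point(aes(color = Loan.Quality)) +
  # Assign an explicit color to each level (the choices are arbitrary)
  scale_color_manual(values = c("Bad Loan" = "red", "Good Loan" = "blue"))

# Colors actually assigned to the points, one per row of toy
point_colors <- ggplot_build(p)$data[[1]]$colour
print(table(point_colors))
```

Without the scale_color_manual() line, the default hue palette for two levels uses "#F8766D" (red) and "#00BFC4" (teal).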