Creating a Line Chart in ggplot2 in R

Apart from scatter plots and bar charts, another chart type frequently used in financial analysis is the line chart. In this lesson, we will learn how to create a line chart using ggplot2.

Line charts are best suited to time-series data, with the time or date on the x-axis and a metric on the y-axis. For our example, we will use stock price data.

Import Data

We have created a CSV file that contains the daily stock prices of the top 5 US tech stocks over the past year.

Download stock_data.csv

Let's first import the data into R as a data frame (called stock_prices). We can inspect the data frame using the str() function.

> stock_prices <- read.csv("stock_data.csv")
> str(stock_prices)
'data.frame':    251 obs. of  6 variables:
 $ Date : Factor w/ 251 levels "1/10/2017","1/11/2017",..: 179 180 181 182 188 204 205 206 207 189 ...
 $ AAPL : num  92 93.6 94.4 95.6 95.9 ...
 $ MSFT : num  48.4 49.4 50.5 51.2 51.2 ...
 $ GOOGL: num  681 691 695 704 710 ...
 $ IBM  : num  144 146 148 152 152 ...
 $ INTC : num  30.7 31.2 31.9 32.8 32.8 ...
>

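One problem with the data above is that the Date column was read as a factor rather than as a date. (In R 4.0.0 and later, read.csv() reads text columns as character by default, because stringsAsFactors now defaults to FALSE.) Either way, before proceeding we will convert this column to the Date class.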

> stock_prices$Date <- as.Date(as.character(stock_prices$Date), format = '%m/%d/%Y')
> str(stock_prices)
'data.frame':    251 obs. of  6 variables:
 $ Date : Date, format: "2016-06-27" "2016-06-28" ...
 $ AAPL : num  92 93.6 94.4 95.6 95.9 ...
 $ MSFT : num  48.4 49.4 50.5 51.2 51.2 ...
 $ GOOGL: num  681 691 695 704 710 ...
 $ IBM  : num  144 146 148 152 152 ...
 $ INTC : num  30.7 31.2 31.9 32.8 32.8 ...
>

Now you can see that the Date variable is stored as a Date.
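As a quick check of how the format string is interpreted, the following sketch applies the same conversion to two hand-typed date strings. The values are illustrative and are not read from stock_data.csv.

```r
# Two example strings in month/day/year form (typed by hand for illustration)
raw_dates <- c("6/27/2016", "1/10/2017")

# %m = month, %d = day of month, %Y = four-digit year
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

class(parsed)    # "Date"
format(parsed)   # "2016-06-27" "2017-01-10"
```

Once the column is a Date, it sorts chronologically rather than alphabetically, which is what the x-axis of a line chart needs.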

Simple Line Chart

Let's plot our first line chart with the date on the x-axis and the Apple stock price on the y-axis.

# Load ggplot2 before calling ggplot()
library(ggplot2)

ggplot(stock_prices, aes(x = Date, y = AAPL)) +
  geom_line()

ggplot will automatically adjust both x and y-axis scales according to the data.

Figure: Line chart in ggplot2
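The automatic axis scales can be overridden when needed. The sketch below is one possible way to do this, using scale_x_date() to control the date breaks and labels. The demo_prices data frame and its values are made up for illustration and stand in for stock_prices.

```r
library(ggplot2)

# Small made-up data set standing in for stock_prices (values are illustrative)
demo_prices <- data.frame(
  Date = seq(as.Date("2016-06-27"), by = "month", length.out = 12),
  AAPL = c(92, 96, 104, 106, 113, 110, 111, 116, 121, 137, 144, 143)
)

# Override the automatic x-axis: a break every two months, labelled like "Jun 2016"
p <- ggplot(demo_prices, aes(x = Date, y = AAPL)) +
  geom_line() +
  scale_x_date(date_breaks = "2 months", date_labels = "%b %Y")

print(p)
```

The date_labels argument uses the same % codes as as.Date(), so "%b %Y" gives an abbreviated month name followed by a four-digit year.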

Customizing Line Aesthetics

You can customize various aspects of a line in the plot, such as the line type, line width, and line color.

  • linetype: This parameter specifies the line type. The common options are "solid", "dashed", "dotted", "dotdash", "longdash", and "twodash".
  • size: This parameter specifies the thickness of the line as a number. The default is 0.5. (From ggplot2 3.4.0 onward, linewidth is the preferred name of this parameter for lines; size still works but triggers a deprecation warning.)
  • colour: This parameter specifies the color of the line. The spelling color is accepted as well.

In the following code, we change the linetype to dashed, increase the size to 1, and set the colour of the line to blue.

ggplot(stock_prices, aes(x = Date, y = AAPL)) +
  geom_line(colour = "blue", linetype = "dashed", size = 1)
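On newer ggplot2 installations, the same styling can be written with linewidth in place of size. The sketch below assumes ggplot2 3.4.0 or later. The demo_prices data frame is a made-up stand-in for stock_prices, with illustrative dates and prices.

```r
library(ggplot2)

# Made-up prices standing in for the AAPL column (illustrative only)
demo_prices <- data.frame(
  Date = as.Date("2016-06-27") + 0:4,
  AAPL = c(92.0, 93.6, 94.4, 95.6, 95.9)
)

# Same styling as above, using linewidth instead of size (assumes ggplot2 >= 3.4.0)
p <- ggplot(demo_prices, aes(x = Date, y = AAPL)) +
  geom_line(colour = "blue", linetype = "dashed", linewidth = 1)

print(p)
```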

Plotting Multiple Lines (Cleaning Data)

Let's say we want to plot the prices of all the stocks in a single plot with each line representing one stock. One way to do this is to use geom_line() multiple times to add each additional line.

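ggplot(stock_prices, aes(x = Date, y = AAPL)) +
  geom_line() +
  geom_line(aes(x = Date, y = GOOGL))

However, this is not the recommended method: each stock needs its own geom_line() call, the y-axis is labelled with the first stock only (AAPL), and there is no legend to tell the lines apart.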

The correct way to handle this scenario is to convert the data into long form. This is also called tidy data, where each column is a variable and each row is an observation. In our example data, we want a column called Symbol that holds the stock symbol and a column called Prices that holds the price for that symbol on a given date.

Once we have data in this format, we can plot the line chart and group the data by Symbol to split it into multiple lines.

To convert the data into long form, we can use the tidyr package. (We will have a detailed course on cleaning data in R.)

# Install the tidyverse (which includes tidyr), then load tidyr
install.packages("tidyverse")
library(tidyr)

Installing tidyverse installs a collection of packages for working with data, including tidyr and ggplot2.

We use the gather() function of the tidyr package to move data from the stock_prices data frame to a new data frame called stock_prices.tidy. This data frame should have three columns: Date, Symbol, and Prices.

gather() takes four arguments here: the original data frame (stock_prices), the name of the new key column (Symbol), the name of the new value column (Prices), and the columns to gather. Writing -Date means that every column except Date is gathered, so Date is kept as an identifier on each row.

> stock_prices.tidy <- gather(stock_prices, Symbol, Prices, -Date)

You can now inspect the dataframe.

> str(stock_prices.tidy)
'data.frame':    1255 obs. of  3 variables:
 $ Date  : Date, format: "2016-06-27" ...
 $ Symbol: chr  "AAPL" "AAPL" "AAPL" "AAPL" ...
 $ Prices: num  92 93.6 94.4 95.6 95.9 ...
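In recent versions of tidyr (1.0.0 and later), gather() is superseded by pivot_longer(), although gather() still works. The sketch below shows what an equivalent reshaping could look like with pivot_longer(). The demo_wide data frame is a made-up two-stock stand-in for stock_prices.

```r
library(tidyr)

# Two-row, two-stock stand-in for stock_prices (values are illustrative)
demo_wide <- data.frame(
  Date = as.Date(c("2016-06-27", "2016-06-28")),
  AAPL = c(92.0, 93.6),
  MSFT = c(48.4, 49.4)
)

# Every column except Date is stacked into Symbol/Prices pairs
demo_long <- pivot_longer(
  demo_wide,
  cols = -Date,
  names_to = "Symbol",
  values_to = "Prices"
)

demo_long
```

The result has the same three columns as the gather() output. The rows come out grouped by date rather than by symbol, which makes no difference to the plot.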

Plotting Multiple Lines

We can now easily plot multiple lines with this new dataframe.

ggplot(stock_prices.tidy, aes(x = Date, y = Prices, color = Symbol)) +
  geom_line()

Subsetting Data

Let's say we want to exclude GOOGL data from the plot. To do so, we can subset the data using the subset() function and store the result in a new data frame. Then we can use the new data frame to create the line chart.

newdata <- subset(stock_prices.tidy, Symbol != 'GOOGL')
ggplot(newdata, aes(x = Date, y = Prices, color = Symbol)) +
  geom_line(size = 1)
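To exclude more than one symbol, one option is to combine subset() with the %in% operator. The following is a minimal sketch of that idea. The demo_long data frame and the excluded vector are hypothetical names, and the values are made up to stand in for stock_prices.tidy.

```r
# Made-up long-form data standing in for stock_prices.tidy
demo_long <- data.frame(
  Date = rep(as.Date(c("2016-06-27", "2016-06-28")), times = 3),
  Symbol = rep(c("AAPL", "GOOGL", "IBM"), each = 2),
  Prices = c(92.0, 93.6, 681, 691, 144, 146),
  stringsAsFactors = FALSE
)

# Keep only the rows whose Symbol is NOT in the exclusion list
excluded <- c("GOOGL", "IBM")
newdata <- subset(demo_long, !(Symbol %in% excluded))

unique(newdata$Symbol)   # "AAPL"
```

The resulting data frame can be passed to ggplot() in the same way as newdata above.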

Note about Dates: In the as.Date() function, the format argument describes the layout of the input, not the output. In this example, the dates in the CSV file look like "1/10/2017" and "1/11/2017", i.e., month, day, and four-digit year. That's why the format is specified as %m/%d/%Y (uppercase %Y matches a four-digit year, while lowercase %y matches a two-digit year). The output of the as.Date() function is a Date object, which is printed as YYYY-MM-DD, for example 2017-01-10.
