Apart from scatter and bar charts, another chart type frequently used in financial analysis is the line chart. In this lesson, we will learn how to create a line chart using ggplot2.
Line charts are best suited for time-series data, with the time or date on the x-axis and a metric on the y-axis. For our example, we will use stock price data.
We have created a CSV file that contains the daily stock prices of the top 5 tech stocks in the US for the past 1 year.
Let’s first import the data into R as a data frame (called stock_prices). We can inspect the data frame using the str() function.
> stock_prices <- read.csv("stock_data.csv")
> str(stock_prices)
'data.frame': 251 obs. of 6 variables:
 $ Date : Factor w/ 251 levels "1/10/2017","1/11/2017",..: 179 180 181 182 188 204 205 206 207 189 ...
 $ AAPL : num 92 93.6 94.4 95.6 95.9 ...
 $ MSFT : num 48.4 49.4 50.5 51.2 51.2 ...
 $ GOOGL: num 681 691 695 704 710 ...
 $ IBM  : num 144 146 148 152 152 ...
 $ INTC : num 30.7 31.2 31.9 32.8 32.8 ...
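One problem with the above data is that the Date field has been read as a factor rather than as a date. (In R 4.0.0 and later, read.csv() reads it as a character vector by default; the conversion below works in either case.) Before proceeding, we will convert this field into a Date.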
> stock_prices$Date <- as.Date(as.character(stock_prices$Date), format = '%m/%d/%Y')
> str(stock_prices)
'data.frame': 251 obs. of 6 variables:
 $ Date : Date, format: "2016-06-27" "2016-06-28" ...
 $ AAPL : num 92 93.6 94.4 95.6 95.9 ...
 $ MSFT : num 48.4 49.4 50.5 51.2 51.2 ...
 $ GOOGL: num 681 691 695 704 710 ...
 $ IBM  : num 144 146 148 152 152 ...
 $ INTC : num 30.7 31.2 31.9 32.8 32.8 ...
Now you can see that the Date variable is stored as a Date.
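The import-and-convert step can be tried without the CSV file. The following is a minimal sketch that assumes a two-row inline sample (csv_text and sample_prices are illustrative names, and the rows are a cut-down stand-in for stock_data.csv, not the real file):

```r
# Tiny inline sample standing in for stock_data.csv (illustrative rows only)
csv_text <- "Date,AAPL,MSFT
6/27/2016,92.0,48.4
6/28/2016,93.6,49.4"

# stringsAsFactors = FALSE keeps Date as character in every R version
sample_prices <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert month/day/four-digit-year strings to the Date class
sample_prices$Date <- as.Date(sample_prices$Date, format = "%m/%d/%Y")

# Date should now be reported as a Date column
str(sample_prices)
```

Reading the column as character first avoids the as.character() step, since there is no factor to unwrap.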
Simple Line Chart
Let’s plot our first line chart with the date on the x-axis and Apple’s stock price on the y-axis.
ggplot(stock_prices, aes(x = Date, y = AAPL)) + geom_line()
ggplot will automatically adjust both x and y-axis scales according to the data.
Customizing Line Aesthetics
You can customize various aspects of a line in the plot such as the line width, line color, and line type.
linetype: This parameter specifies the line type. The common options are “solid”, “dashed”, “dotted”, “dotdash”, “longdash”, “twodash”.
size: This parameter specifies the thickness of the line as a number. The default is 0.5. (In ggplot2 3.4.0 and later, linewidth is the preferred name for this parameter on lines.)
colour: This parameter is used to specify the color of the line.
In the following code, we change the linetype to dashed, increase the size to 1, and set the colour of the line to blue.
ggplot(stock_prices, aes(x = Date, y = AAPL)) + geom_line(colour="blue", linetype="dashed", size=1)
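In ggplot2 3.4.0 and later, the size argument for lines is deprecated in favour of linewidth. The sketch below shows the same customization with linewidth; it assumes a recent ggplot2 and uses a small made-up data frame (toy, with hypothetical dates and prices) so that it runs without the CSV file:

```r
library(ggplot2)

# Hypothetical toy data, used only to make the example self-contained
toy <- data.frame(
  Date  = as.Date("2016-06-27") + 0:4,
  Price = c(92, 93.6, 94.4, 95.6, 95.9)
)

# linewidth plays the role of size for lines (requires ggplot2 >= 3.4.0)
p <- ggplot(toy, aes(x = Date, y = Price)) +
  geom_line(colour = "blue", linetype = "dashed", linewidth = 1)

print(p)
```

On older ggplot2 versions, replace linewidth = 1 with size = 1.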
Plotting Multiple Lines (Cleaning Data)
Let’s say we want to plot the prices of all the stocks in a single plot, with each line representing one stock. One way to do this is to use geom_line() multiple times to add each additional line.
ggplot(stock_prices, aes(x = Date, y = AAPL)) + geom_line() + geom_line(aes(x = Date, y = GOOGL))
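However, this is not the recommended approach: every stock needs its own geom_line() call, no legend is produced to tell the lines apart, and the y-axis keeps the label AAPL even though it now shows several stocks.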
The better way to handle this scenario is to convert the data into long form. This is also called tidy data, where each column is a variable and each row is an observation. In our example data, we want a new column called Symbol, which will hold the stock symbol, and a new column called Prices, which will hold the price for that symbol.
Once we have data in this format, we can plot the line chart and group the data by Symbol to split it into multiple lines.
To convert the data into long form, we can use the tidyr package. (We will have a detailed course on cleaning data in R.)
# install and load the tidyr package
install.packages("tidyverse")
library(tidyr)
Installing tidyverse installs a collection of packages for cleaning and working with data, including tidyr.
We use the gather() function of the tidyr package to move the data from the stock_prices data frame to a new data frame called stock_prices.tidy. This data frame should have three columns: Date, Symbol, and Prices.
gather() takes four arguments here: the original data frame (stock_prices), the name of the key column (Symbol), the name of the value column (Prices), and the columns to gather. Writing -Date means "every column except Date", so Date is kept as it is and the five stock columns are stacked.
> stock_prices.tidy <- gather(stock_prices, Symbol, Prices, -Date)
You can now inspect the dataframe.
> str(stock_prices.tidy)
'data.frame': 1255 obs. of 3 variables:
 $ Date  : Date, format: "2016-06-27" ...
 $ Symbol: chr "AAPL" "AAPL" "AAPL" "AAPL" ...
 $ Prices: num 92 93.6 94.4 95.6 95.9 ...
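tidyr 1.0.0 and later mark gather() as superseded and recommend pivot_longer() for the same reshaping. The following sketch shows what the equivalent call might look like; it assumes tidyr 1.0.0 or newer and uses a hypothetical two-row table (wide) in place of stock_prices:

```r
library(tidyr)

# Hypothetical two-row wide table standing in for stock_prices
wide <- data.frame(
  Date = as.Date(c("2016-06-27", "2016-06-28")),
  AAPL = c(92.0, 93.6),
  MSFT = c(48.4, 49.4)
)

# Every column except Date is stacked into Symbol/Prices pairs
long <- pivot_longer(wide, cols = -Date,
                     names_to = "Symbol", values_to = "Prices")

print(long)
```

The result has one row per date and symbol. Unlike gather(), pivot_longer() orders the rows by the original row first, so the symbols alternate within each date; this does not affect the plot.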
Plotting Multiple Lines
We can now easily plot multiple lines with this new dataframe.
ggplot(stock_prices.tidy, aes(x = Date, y = Prices, color = Symbol)) + geom_line()
Let’s say we want to exclude the GOOGL data from the plot. To do so, we can subset the data using the subset() function and store the resulting dataset in a new data frame. Then we can use the new data frame to create the line chart.
newdata <- subset(stock_prices.tidy, Symbol != 'GOOGL')
ggplot(newdata, aes(x = Date, y = Prices, color = Symbol)) + geom_line(size = 1)
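The same subset() pattern can also keep a chosen set of symbols rather than drop one. The sketch below uses a small made-up long-form table (toy_long is a hypothetical name with one illustrative price per symbol) to show both directions:

```r
# Hypothetical long-form data, one illustrative price per symbol
toy_long <- data.frame(
  Symbol = c("AAPL", "GOOGL", "MSFT", "IBM"),
  Prices = c(92, 681, 48.4, 144)
)

# Drop a single symbol, as in the lesson
no_googl <- subset(toy_long, Symbol != "GOOGL")

# Keep only the listed symbols
kept <- subset(toy_long, Symbol %in% c("AAPL", "MSFT"))

print(no_googl)
print(kept)
```

Dropping GOOGL is useful here because its price is several times higher than the others, which compresses the remaining lines toward the bottom of the chart.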
Note about Dates: In the as.Date() function, the format argument describes the format of the input. In this example, the dates in the CSV file are written as “1/10/2017”, “1/11/2017”, i.e., month, day, and four-digit year. That’s why the format is specified as %m/%d/%Y (a lowercase %y would mean a two-digit year). The output of the as.Date() function is a Date, which R prints in the form 2017-01-10 (YYYY-MM-DD).
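The difference between %Y and %y is easy to check on a single value. This is a small sketch using the first date level from the data (the variable names parsed and wrong are illustrative):

```r
# %Y expects a four-digit year and parses the input as intended
parsed <- as.Date("1/10/2017", format = "%m/%d/%Y")
print(parsed)  # shown in YYYY-MM-DD form

# %y expects a two-digit year, so only "20" is read as the year
# and the parsed year is no longer 2017
wrong <- as.Date("1/10/2017", format = "%m/%d/%y")
print(wrong)
```

No error is raised in the second case, so a mistyped format string can silently shift every date in the data frame.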