# How to Create a Scatter Plot in R

When you start analyzing a new dataset, the first step is usually to understand its variables and how they relate to one another. A scatter plot is a good place to start: it is one of the quickest ways to view the relationship between any two variables, x and y.

You can create a scatter plot using the generic `plot()` function in R.

```
plot(x,y)
```

The function itself doesn't print anything to the console; instead, it draws the plot in the plot window.

The two variables can be supplied as two separate vectors, x and y, or as a single data frame with two columns.
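As a minimal sketch of the two calling styles, the snippet below plots the same values first as two vectors and then as a two-column data frame. The names `height`, `weight` and `df`, and the values in them, are invented purely for illustration.

```r
# Hypothetical data, invented for this illustration
height <- c(150, 160, 165, 172, 180)
weight <- c(52, 58, 63, 70, 79)

# Style 1: two separate vectors (x first, then y)
plot(height, weight)

# Style 2: one data frame with two columns;
# the first column is used for x and the second for y
df <- data.frame(height, weight)
plot(df)
```

Both calls should produce the same scatter plot; only the way the data is handed to `plot()` differs.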

The following example shows the scatter plot created using the `cars` dataset. The data gives the speed of cars and the distances taken to stop. It has two columns, speed and dist. The first column, speed, becomes the x-axis and the second column, dist, becomes the y-axis.

```
plot(cars)
```

As you can see, the graph draws a point for each pair of speed and dist. The general observation from the scatter plot is that the higher the speed, the longer the stopping distance.
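One way to back up this visual impression with a number is to compute the correlation between the two columns. The sketch below is an addition to the lesson, not part of it: it draws the same plot with the axes spelled out explicitly and then calls `cor()` on the two columns.

```r
# cars ships with base R (datasets package), so no loading is needed
str(cars)   # two numeric columns: speed and dist

# Same picture as plot(cars), with x and y passed explicitly
plot(cars$speed, cars$dist)

# Pearson correlation as a rough numeric check of the upward trend
r <- cor(cars$speed, cars$dist)
print(r)
```

A value of `r` close to 1 is consistent with the upward pattern seen in the plot.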

If the dataset contains more than two columns, the `plot()` function will draw a matrix of scatter plots, one for each pair of variables.

Let's take another dataset called `whiteside` from the `MASS` package. It holds home insulation data in three variables, namely Insul, Temp and Gas.

- **Insul:** A factor, before or after insulation.
- **Temp:** Purportedly the average outside temperature in degrees Celsius.
- **Gas:** The weekly gas consumption in 1000s of cubic feet.

If we call the `plot()` function on this dataset, it will draw a grid of scatter plots showing the relationship between every pair of the three variables.

```
> # load the data
> data(whiteside, package = "MASS")
> head(whiteside)
   Insul Temp Gas
1 Before -0.8 7.2
2 Before -0.7 6.9
3 Before  0.4 6.4
4 Before  2.5 6.0
5 Before  2.9 5.8
6 Before  3.2 5.8
> # plot the data
> plot(whiteside)
```

Each scatter plot draws points for two variables. For example, the graph in the lower left corner has Insul on the x-axis and Gas on the y-axis. Similarly, the graph in the bottom row, middle column has Temp on the x-axis and Gas on the y-axis.
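The grid of plots can also be requested directly with `pairs()`, and a single panel can be drawn on its own by passing just two columns. The following is a small sketch along those lines; it assumes the `MASS` package is installed, which is the case for standard R distributions.

```r
# whiteside comes from the MASS package
data(whiteside, package = "MASS")

# pairs() asks for the scatter plot matrix explicitly
pairs(whiteside)

# To focus on one relationship, pass just those two columns
plot(whiteside$Temp, whiteside$Gas)

# Insul is a factor, so the matrix shows it as integer codes
levels(whiteside$Insul)
```

Because Insul is a factor, its panels show only two distinct positions on that axis, one per level.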

The `plot()` function we used above is a generic function; that is, it changes its behavior depending on the types of arguments provided and produces different results. We just saw the use of the `plot()` function in its most basic form. In the following lessons, we will see how we can customize the plots and enhance them using various arguments.
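To see the generic behavior in action, the sketch below calls `plot()` on different kinds of columns from `whiteside`. It is an illustration added here, assuming the `MASS` package is available; the chart types noted in the comments are what base R draws for numeric and factor inputs.

```r
data(whiteside, package = "MASS")

# Numeric x and numeric y: a scatter plot
plot(whiteside$Temp, whiteside$Gas)

# A single factor: a bar chart of counts per level
plot(whiteside$Insul)

# Factor x and numeric y: one box plot per factor level
plot(whiteside$Insul, whiteside$Gas)

# The class of the first argument decides which method runs
sapply(whiteside, class)
```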

### Exercise

Load the `Cars93` dataset from the `MASS` package and use the `plot()` function to draw scatter plots on its variables.

## Enhancing a Plot

Now that we know how to create a basic plot using the `plot()` function, let's learn how we can enhance the chart in various ways. We will start with a new dataset that contains daily stock returns for five stocks, namely Goldman Sachs, Citi, Apple, Facebook and JC Penney, for a period of one year. The data is provided in CSV format, so we will first load it into R.

### Load the Data

I've placed the data file in my working directory and then used the `read.csv()` function to load the data into an R data frame called `stock_returns`.

```
> getwd()
[1] "C:/Users/Manish/Documents"
> setwd("C:/r-programming/data")
> getwd()
[1] "C:/r-programming/data"
> stock_returns <- read.csv("stock_returns.csv")
> head(stock_returns)
        Date    gs    c  aapl    fb   jcp
1 16/04/2015 -0.44 1.52 -0.48 -0.47 -2.69
2 15/04/2015  1.71 0.91  0.38 -0.98 -2.40
3 14/04/2015  1.09 0.13 -0.43  0.61 -2.66
4 13/04/2015 -0.03 0.44 -0.20  1.18  1.95
5 10/04/2015  0.38 0.58  0.43 -0.16  0.22
6 09/04/2015  1.21 0.46  0.76 -0.13  1.32
```
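The Date column in this output is day/month/year text, which `read.csv()` reads as character data (or as a factor in R versions before 4.0). If real dates are needed later, one option is to convert the column with `as.Date()`. The sketch below re-enters the six rows shown above by hand, with only the `gs` and `c` columns, so that it runs without the CSV file; the conversion step is a suggestion and not part of the original lesson.

```r
# The first six rows shown above, re-entered by hand
stock_returns <- data.frame(
  Date = c("16/04/2015", "15/04/2015", "14/04/2015",
           "13/04/2015", "10/04/2015", "09/04/2015"),
  gs = c(-0.44, 1.71, 1.09, -0.03, 0.38, 1.21),
  c  = c(1.52, 0.91, 0.13, 0.44, 0.58, 0.46)
)

# Convert day/month/year text into Date objects
stock_returns$Date <- as.Date(stock_returns$Date, format = "%d/%m/%Y")

str(stock_returns)

# Sort oldest first, since the file lists the newest date at the top
stock_returns <- stock_returns[order(stock_returns$Date), ]
```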

### Create a Scatter Plot

If we just call the `plot()` function on the `stock_returns` dataset, it will draw multiple scatter plots, one for each pair of columns. For our purposes, however, we will create a single scatter plot of the Goldman Sachs and Citi stock returns. We can do so by supplying the x and y values separately as shown below:

```
> plot(stock_returns$gs,stock_returns$c)
```

The resulting plot has the Goldman Sachs returns on the x-axis and the Citi returns on the y-axis.

### Add Title and Axis Labels

The scatter plot we created is quite plain. We can make it more readable by adding a title and labels for the x and y axes. We will use the following `plot()` function arguments to do so:

- The `main` argument for the title
- The `xlab` argument for the x-axis
- The `ylab` argument for the y-axis

After adding the title and labels, we can also add a grid to the graph by calling the `grid()` function after calling the `plot()` function.

```
> plot(stock_returns$gs,stock_returns$c, main="Scatter Plot: 1-year Daily Returns", xlab="GoldmanSachs Returns", ylab="Citigroup Returns")
> grid()
```

In the above example, we first plotted the graph and then added the grid to it, so the grid lines are drawn on top of the points. The alternative (and preferred) method is to call the `plot()` function with the argument `type="n"`, which sets up the axes, title and labels without drawing any points. Then we call the `grid()` function to add the grid, and finally call a low-level graphics function such as `points()` or `lines()` to overlay the data on the grid.
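The three-step approach just described can be sketched as follows. To keep the example self-contained, it uses only the six sample rows printed earlier rather than the full CSV file; the vector names `gs` and `citi` are chosen here for illustration.

```r
# Six sample rows from the output shown earlier
gs   <- c(-0.44, 1.71, 1.09, -0.03, 0.38, 1.21)
citi <- c(1.52, 0.91, 0.13, 0.44, 0.58, 0.46)

# 1. Set up axes, title and labels, but draw no points yet
plot(gs, citi, type = "n",
     main = "Scatter Plot: Daily Returns (sample rows)",
     xlab = "GoldmanSachs Returns", ylab = "Citigroup Returns")

# 2. Draw the grid first so it sits in the background
grid()

# 3. Overlay the points on top of the grid
points(gs, citi)
```

Drawing in this order keeps the grid lines behind the data instead of across it.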

### Plot a Regression Line

We can add a regression line to this scatter plot of returns for Goldman Sachs and Citigroup in three steps:

1. Perform a linear regression using `lm()` on the two variables. `lm` stands for "linear model".

```
m <- lm(stock_returns$c ~ stock_returns$gs)
```

2. Draw the scatter plot

```
plot(stock_returns$c ~ stock_returns$gs, main="Scatter Plot: 1-year Daily Returns", xlab="GoldmanSachs Returns", ylab="Citigroup Returns")
```

3. Add the regression line using the `abline()` function.

```
abline(m)
```

The graph now shows the fitted regression line drawn through the scatter of points.

Notice that we defined the plot as `plot(stock_returns$c ~ stock_returns$gs)`. This formula form, `y ~ x`, keeps the order of the variables the same as in the `lm()` call. The alternative is `plot(stock_returns$gs,stock_returns$c)`, which we used earlier; there the x variable comes first.
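A slightly tidier variant, sketched below, passes the data frame through the `data` argument so that the formula only needs the column names. This is an alternative style and not the code used in the lesson. It again uses just the six sample rows printed earlier, so the fitted coefficients will differ from those obtained with the full year of data.

```r
# Six sample rows from the output shown earlier; the real file has a full year
stock_returns <- data.frame(
  gs = c(-0.44, 1.71, 1.09, -0.03, 0.38, 1.21),
  c  = c(1.52, 0.91, 0.13, 0.44, 0.58, 0.46)
)

# Fit Citigroup returns (response) on Goldman Sachs returns (predictor)
m <- lm(c ~ gs, data = stock_returns)

# Same formula and data for the plot, so x and y match the model
plot(c ~ gs, data = stock_returns,
     xlab = "GoldmanSachs Returns", ylab = "Citigroup Returns")

# abline() reads the intercept and slope from the fitted model
abline(m, col = "red")

coef(m)   # intercept and slope of the fitted line
```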

### Exercise

Load the 'stock_returns' dataset into R and create a scatter plot with Apple's returns on x-axis and Facebook's returns on y-axis. Then add a title, axis labels and a regression line to the plot.
