Visualization in Data Science
The advent of data science has opened doors to making discoveries from data in new and innovative ways. By combining statistics, computer science, applied mathematics and data visualizations, a data scientist or analyst can turn vast amounts of complex data into meaningful insights and new knowledge.
The fundamental purpose of any data science project is to use automated methods to analyze massive amounts of data and extract knowledge from it. Because that knowledge has to be understood and communicated before it is useful, data visualization sits at the forefront of any data science project. We can divide a data science problem into three sub-problems:
- Formulate the question
- Understand the answer
- Provide an explanation for the answer
Finding the right question is often considered more important than finding the right answer. This is why data science should focus on both exploratory and explanatory data analysis.
Exploratory data analysis is the critical first step in analyzing the data from an experiment. It does not involve formal statistical modeling or inference. Instead, you explore the data with an open mind to learn more about it. In the process, you will find mistakes, validate assumptions, determine relationships between variables, and probably come up with important questions that you want the data to answer.
Explanatory data analysis is where you answer specific questions and communicate those answers to your audience. This is where you apply statistical and mathematical models to infer insights from the data and/or test a hypothesis. Data visualization plays an important role in both exploratory and explanatory data analysis.
Why use visualization?
Data visualization is one of the most impactful ways for data scientists to communicate their findings to an audience. With visualizations, you can display patterns, trends, and relationships in the data in a form that is easy to consume. There are several reasons why visualizations often work better than raw numbers or tables.
- First, vision is arguably the most powerful communication channel humans possess. We can detect some kinds of visual information faster than our eyes can move.
- Second, humans are not very good at detecting patterns in raw numbers. For example, consider a data set containing the closing prices of five stocks over the past 30 days. The trends in these prices are much easier to identify in a line chart than in a table.
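- Third, summary statistics such as the mean and variance can hide important information: data sets that look very different can share nearly identical summary values.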
- Fourth, when information is complex, the right representation can make it much easier to understand.
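A classic illustration of the point about summary statistics is Anscombe's quartet, an example not discussed in the original text: four small data sets published by F. J. Anscombe in 1973 whose means, variances, correlation, and least-squares line agree to about two decimal places. The minimal Python sketch here uses only the standard library to compute those statistics; the helper names `pearson_r` and `summarize` are illustrative choices, not part of any library.

```python
# Anscombe's quartet (Anscombe, 1973): four small data sets with nearly
# identical summary statistics but very different shapes when plotted.
from statistics import mean, variance

X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by sets I-III
X_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]        # x values for set IV

QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from sums of deviations."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def summarize(xs, ys):
    """Return the summary statistics a table of numbers would typically report."""
    r = pearson_r(xs, ys)
    # Least-squares slope equals r * (std of y / std of x).
    slope = r * (variance(ys) / variance(xs)) ** 0.5
    intercept = mean(ys) - slope * mean(xs)
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),   # sample variance (n - 1 denominator)
        "var_y": variance(ys),
        "r": r,
        "slope": slope,
        "intercept": intercept,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
              f"var_x={s['var_x']:.2f} var_y={s['var_y']:.2f} r={s['r']:.3f} "
              f"fit: y = {s['intercept']:.2f} + {s['slope']:.3f}x")
```

Running the script prints four nearly identical rows. Drawn as scatter charts, however, set I looks roughly linear, set II follows a smooth curve, set III is a straight line with one outlier, and set IV is a vertical stack of points plus a single high-leverage point. Only the chart makes these differences obvious.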
However, not all data visualizations are effective, so it is worth spending considerable time choosing the right visualization for our data.