Visualization in Data Science

The advent of data science has opened the door to making discoveries from data in new and innovative ways. By combining statistics, computer science, applied mathematics, and data visualization, a data scientist or analyst can turn vast amounts of complex data into meaningful insights and new knowledge.

The fundamental purpose of any data science project is to use automated methods to analyze massive amounts of data and to extract knowledge from it. This puts data visualization at the forefront of any data science project. We can divide a data science problem into three sub-problems:

  1. Formulate the question
  2. Understand the answer
  3. Provide an explanation for the answer

Finding the right question is often considered more important than finding the right answer. This is why data science should focus on both exploratory and explanatory data analysis.

Exploratory data analysis is the critical first step in analyzing data from an experimental viewpoint. It does not involve any statistical modeling or inference. Instead, you examine the data with an open mind to learn more about it. In the process you will find mistakes, validate assumptions, determine relationships between variables, and probably come up with important questions that you want the data to answer.

Explanatory data analysis is where you answer specific questions and communicate those answers to an audience. This is where you apply statistical and mathematical models to infer insights from the data, test a hypothesis, or both. Data visualization plays an important role in both exploratory and explanatory data analysis.

Why use visualization?

Data visualization is one of the most impactful ways for data scientists to communicate their findings to an audience. With visualizations, you can display patterns, trends, and relationships in the data in a form that is easy to absorb. There are several reasons why visualizations are often better than reading numbers or tables.

  1. First, vision is the most powerful communication channel humans possess. We pick up visual patterns very quickly, often before we have consciously examined the details.
  2. Second, we humans are not very good at detecting patterns from numbers. For example, consider a data set containing the closing prices of five stocks over the past 30 days. It is much easier to identify the trends in these prices if presented as a line chart rather than as a table.
  3. Third, summary statistics can hide important information. Two data sets can share the same mean and variance and still look entirely different when plotted.
  4. Fourth, when information is complex in nature, the right representation can make it simple to understand.
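The third point can be made concrete with Anscombe's quartet, a well-known set of four small data sets. The sketch below is only an illustration, not part of any particular workflow: it uses the Python standard library to compute a few summary statistics for each data set, and the helper names (pearson, summarize, plot_quartet) are made up for this example. The optional plotting function assumes matplotlib is installed.

```python
from statistics import mean, variance

# Anscombe's quartet: four (x, y) data sets with near-identical summary statistics.
_X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (_X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (_X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (_X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand to stay stdlib-only."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return the summary statistics a table-only report might show."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_y": variance(ys),  # sample variance
        "r": pearson(xs, ys),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}


def plot_quartet():
    """Scatter-plot the four data sets side by side (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
    for ax, (name, (xs, ys)) in zip(axes.flat, ANSCOMBE.items()):
        ax.scatter(xs, ys)
        ax.set_title(f"Data set {name}")
    fig.tight_layout()
    plt.show()


if __name__ == "__main__":
    for name, s in SUMMARIES.items():
        print(f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
              f"var_y={s['var_y']:.2f} r={s['r']:.3f}")
```

The printed statistics are nearly the same for all four data sets, yet calling plot_quartet() shows four clearly different shapes: a roughly linear cloud, a curve, a line with one outlier, and a vertical cluster with one distant point.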

However, not all data visualizations are effective, and it is worth spending considerable time choosing the right visualization for the data.
