Big Data and the Role of the Data Scientist

The latest adventurers are analysing big data for invaluable nuggets of insight. They scour vast tracts of data, wrangle it, clean it and analyse it to arrive at insights that a small data set could never reveal. Genomics, physics, Internet marketing and finance are some of the fields where high-volume, high-speed data, known as big data, is available. Big data now also comes from a wide variety of sources, and companies can hear the voice of the customer in ways they never could before. If the old problem was how to get data, the new one is how to understand and analyse all the data that is available.

Big data is a treasure trove everyone wants to explore and use to his or her benefit.

One of the best-known examples of big data put to good use is the retail giant Walmart.

According to Statisticsbrain.com, as of July 2014 Walmart's sales stood at $405 billion, and 100 million customers visited its stores each week. Of all purchases made by families earning less than $40,000 a year, 42% are made at Walmart. The volume of data Walmart has to handle is therefore tremendous. To reach customers more effectively and help them make better, and more, purchases, Walmart developed tools to mine its big data.

Social Genome was one such product. It captures information about what customers are looking for, analyses data from social networking sites together with in-house purchase data, and suggests products customers might buy, sometimes with a discount as an incentive. The tool helps Walmart understand its customer relationships and customers' preferences.

Walmart's Shopycat application scours social networks to suggest products for customers and their loved ones. Used together, these two applications turn the data flowing into Walmart into future purchases. Walmart also has a dedicated unit, Walmart Labs, that keeps coming up with new ways to make sense of the large volumes of data the company receives.

Companies and organisations are therefore looking for people who can make sense of these hidden patterns and use them to the company's benefit.

As the Walmart example shows, big data is helping companies reach customers in a more meaningful way than a mass, one-size-fits-all consumer message. Companies use customers' usage patterns to craft marketing messages targeted at them.

Customers are now more vocal than ever about their products and their usage experiences. Applications can sift through this feedback, and a data scientist can help the company make major or minor changes to the product based on it.

Big data also keeps a company up to speed on changes in its industry. By tracking news reports and other data sources, big data applications let the company keep its ear to the ground and carry out predictive analysis.

Data is precious. The latest big data applications help a company build a map of its data, not unlike a library catalogue, so that the data can be managed and appropriate levels of access granted.

Data scientists have become hot property in the industry. However large the data, expertise is still needed to ask the right questions, examine the data, make connections and suggest changes. Data scientists need a foundation in computer science and applications, modeling, statistics, analytics and mathematics.

Data scientists work with multiple high-speed, high-volume data sources. They need to use their skills to spot trends, understand what those trends mean and explain them to business managers and technical IT teams. A good data scientist goes beyond the data at hand and is strong at scenario analysis and at working out how the data can be used.

Data scientists have to draw on a variety of skills to do their work. First, they need to be comfortable handling high-volume data from many sources at scale. They have to be conversant with data management tools such as MapReduce and Hadoop, as well as with algorithms. Data mining tools are also part of the data scientist's arsenal, and statistical and analytical models form part of the data scientist's foundation. Finally, they need to communicate their analytical findings through visualization.
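To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch in Python of its three steps (map, shuffle, reduce), counting how often each product appears in a few shopping baskets. The baskets and function names are hypothetical and chosen only for illustration; this is not Hadoop's API, which runs these same steps in parallel across a cluster of machines.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical transaction log: each inner list is the contents of one basket.
transactions = [
    ["milk", "bread", "eggs"],
    ["bread", "butter"],
    ["milk", "bread"],
]

def map_phase(basket):
    """Map: emit a (product, 1) pair for every item in one basket."""
    return [(product, 1) for product in basket]

def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key, as a framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single total."""
    return key, sum(values)

def map_reduce(records):
    # Apply the map step to every record, then flatten the pairs into one stream.
    mapped = chain.from_iterable(map_phase(record) for record in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = map_reduce(transactions)
print(counts)  # {'milk': 2, 'bread': 3, 'eggs': 1, 'butter': 1}
```

Because each map call looks at only one record and each reduce call looks at only one key, both steps can be split across many machines, which is what lets frameworks like Hadoop scale the same pattern to very large data sets.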

Python, R and SQL are highly recommended if you too would like to become a Data Scientist.
