While working on data science projects, you will come across data from many sources and in a variety of formats. A data scientist needs a solid understanding of these sources and formats, of how to bring the data into R, and of how to clean it for statistical analysis. In the next few lessons, we will focus on how to import data from the following data sources:
1. Flat Files
Flat files are plain-text files that store tabular data in rows and columns, with one record per line. We see flat files every day in the form of comma-separated values (CSV) files and tab-delimited files.
2. Excel Files
These are the most familiar type of file for financial professionals. In quantitative finance, both R and Excel are basic tools for any type of analysis. In this course, we will learn how to import Excel files into R; in future courses, we will also learn how to use Excel in conjunction with R to perform data analysis.
3. Databases
This involves importing data from various types of databases such as MySQL, MSSQL, Oracle, etc.
4. Web Data
Nowadays a lot of information resides on the web, and data scientists need to work with this data. We will learn how to access and import data from the web using APIs and other web protocols.
5. Statistical Software Packages
Sometimes you may have to import data from other statistical software packages such as SAS, Stata and SPSS. Each has its own file format, and we will learn how to import data in these formats.
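As a small preview of the first item in the list, the sketch below shows roughly what importing a flat file into R can look like. It uses base R's `read.csv()` and `read.delim()` with the `text` argument so that it runs without any file on disk. The sample data, the column names (`ticker`, `price`, `volume`) and the variable names are made up for illustration only.

```r
# Hypothetical sample data: three records, one per line, comma-separated.
csv_text <- "ticker,price,volume
AAA,10.5,1000
BBB,20.25,2500
CCC,5.75,400"

# read.csv() is part of base R. The `text` argument parses a string
# in place of a file path, which keeps this example self-contained.
prices <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The same records in tab-delimited form, read with read.delim().
tsv_text <- gsub(",", "\t", csv_text, fixed = TRUE)
prices_tab <- read.delim(text = tsv_text, stringsAsFactors = FALSE)

# Inspect the structure of the imported data frame.
str(prices)
```

With a real file, you would pass its path as the first argument instead of `text`, for example `read.csv("my_file.csv")`, where the file name is a placeholder.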
In the next lesson, we will begin by learning to import flat files.