Exploring Data using pandas
When you first load your data, it's important to perform initial checks to understand its structure, content, and the type of data it contains.
Viewing Data
Here's how you can take a peek at your DataFrame:
# Display the first five rows of the stocks DataFrame
print(stocks_df.head())
# Display the last five rows of the stocks DataFrame
print(stocks_df.tail())
stocks_df.head() displays the first few rows and can immediately flag missing data or anomalies.
stocks_df.tail() shows you the end of the dataset, often revealing how recent the data is and whether it has been truncated.
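Both methods accept an optional row count when five rows is too many or too few. The sketch below uses a small made-up DataFrame as a stand-in for stocks_df; the prices and dates are hypothetical and not taken from the course data.

```python
import pandas as pd

# Hypothetical sample data standing in for the stocks DataFrame
stocks_df = pd.DataFrame(
    {
        "GOOGL": [100.0, 101.5, 99.8, 102.3, 103.1, 104.0, 102.7],
        "MSFT": [60.0, 60.5, 61.2, 60.9, 62.0, 62.4, 61.8],
    },
    index=pd.date_range("2017-01-02", periods=7, freq="D"),
)

# head(n) and tail(n) return the first or last n rows as a new DataFrame
first_three = stocks_df.head(3)
last_two = stocks_df.tail(2)

print(first_three)
print(last_two)

# shape reports (number of rows, number of columns)
print(stocks_df.shape)
```

Checking shape alongside head() and tail() is a quick way to confirm that the number of rows loaded matches what you expected from the source file.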
Data Structure
An understanding of your DataFrame's structure is essential before diving into deeper analysis.
We can use stocks_df.info() to get a summary of the DataFrame, including the number of non-null entries and data types of each column. This can highlight if certain columns contain missing values that need to be addressed or if data types need conversion.
# Print a concise summary of the DataFrame
# (info() prints its output directly and returns None, so print() is not needed)
stocks_df.info()
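If you want the missing-value counts and data types as objects you can work with, rather than as printed text, isna().sum() and dtypes are a possible complement to info(). The following is a minimal sketch on made-up data; the Volume column and its values are hypothetical and only illustrate a column that needs a type conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing price and a numeric column stored as text
stocks_df = pd.DataFrame(
    {
        "GOOGL": [100.0, np.nan, 99.8, 102.3],
        "MSFT": [60.0, 60.5, 61.2, 60.9],
        "Volume": ["1200", "1350", "1100", "1275"],
    }
)

# Printed summary: non-null counts and dtypes per column
stocks_df.info()

# Count missing values per column as a Series
missing_per_column = stocks_df.isna().sum()
print(missing_per_column)

# Inspect the data type of each column
print(stocks_df.dtypes)

# Convert the text column to a numeric type
stocks_df["Volume"] = pd.to_numeric(stocks_df["Volume"])
print(stocks_df.dtypes)
```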
Descriptive Statistics
Descriptive statistics provide a high-level summary of the attributes of your dataset:
- stocks_df.describe() gives a statistical summary for numerical columns, useful for a quick assessment of distribution and variability.
- Custom aggregations like stocks_df['GOOGL'].mean() help in understanding specific aspects like the average.
# Get a statistical summary
print(stocks_df.describe())
# Find the average price for Google
print(stocks_df['GOOGL'].mean())
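Because describe() returns a DataFrame, individual statistics can be selected from it by label. The sketch below uses hypothetical prices, not the course dataset, and shows that the mean row of the summary matches a direct call to mean().

```python
import pandas as pd

# Hypothetical sample data standing in for the stocks DataFrame
stocks_df = pd.DataFrame(
    {
        "GOOGL": [100.0, 102.0, 104.0, 106.0],
        "MSFT": [60.0, 61.0, 62.0, 65.0],
    }
)

# describe() returns a DataFrame indexed by statistic name
summary = stocks_df.describe()

# Select only the rows of interest
print(summary.loc[["mean", "std", "min", "max"]])

# The 'mean' row agrees with calling mean() on the column directly
googl_mean = stocks_df["GOOGL"].mean()
print(googl_mean)
```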
Aggregation
For more specific summary statistics, you can use aggregation methods like mean(), median(), min(), max(), and sum():
# Calculate the average price for Microsoft
print(stocks_df['MSFT'].mean())
# Find the maximum price for Microsoft
print(stocks_df['MSFT'].max())
The results will be 61.96290836653386 and 72.52, respectively.
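Several of these statistics can also be computed in a single call with agg(). This is a minimal sketch on hypothetical prices, so the numbers differ from the results quoted above.

```python
import pandas as pd

# Hypothetical sample data standing in for the stocks DataFrame
stocks_df = pd.DataFrame(
    {
        "GOOGL": [100.0, 102.0, 104.0, 106.0],
        "MSFT": [60.0, 61.0, 62.0, 65.0],
    }
)

# Apply several aggregations to one column; the result is a Series
msft_stats = stocks_df["MSFT"].agg(["mean", "median", "min", "max", "sum"])
print(msft_stats)

# Apply the same aggregations to every column; the result is a DataFrame
all_stats = stocks_df.agg(["mean", "max"])
print(all_stats)
```

Passing a list of function names keeps related statistics together in one labelled result, which is easier to compare than a series of separate print() calls.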