Data Visualization using pandas
Beyond simply listing counts and unique values, visualization can greatly aid in comprehending the categorical distribution. Plots can reveal outliers or data errors that aren't always obvious in tables.
We will plot the sector counts that we calculated earlier.
# Visualize the sector distribution
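The plotting code for this step is not shown here, so the following is only a minimal sketch of how the sector counts could be drawn as a bar chart. The DataFrame name `stocks_df`, the `Sector` column, and the sample rows are assumptions made for illustration; substitute the DataFrame and counts you built earlier.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; the tutorial's real DataFrame and column names may differ
stocks_df = pd.DataFrame({
    "Symbol": ["QCOM", "AAPL", "JPM", "XOM", "MSFT", "BAC"],
    "Sector": ["Technology", "Technology", "Financials",
               "Energy", "Technology", "Financials"],
})

# Count how many rows fall into each sector (most frequent first)
sector_counts = stocks_df["Sector"].value_counts()

# Draw one bar per sector on an explicit Axes object
fig, ax = plt.subplots()
sector_counts.plot(kind="bar", ax=ax, title="Sector Distribution")
ax.set_xlabel("Sector")
ax.set_ylabel("Number of stocks")
fig.tight_layout()
```

In a notebook the chart renders automatically; in a plain script you would call `plt.show()` at the end.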
Let’s also visualize our stock data. We can plot a time series of Qualcomm's closing prices with the following line of code:
# Plotting the closing stock prices
qcom_df['Close'].plot(title='Historical Closing Prices')
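The line above plots `Close` against the DataFrame's index, so the x-axis only shows dates if the index holds dates. As a self-contained sketch, assuming the data comes from a CSV with `Date`, `Close`, and `Volume` columns (the column names and the prices below are made up for illustration), the same plot could be produced like this:

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV excerpt; these values are invented, not real Qualcomm prices
csv_text = """Date,Close,Volume
2024-01-02,140.2,7200000
2024-01-03,138.9,6900000
2024-01-04,139.5,5100000
2024-01-05,141.8,4800000
2024-01-08,143.1,5300000
"""

# Parse the Date column and use it as the index so the x-axis shows dates
qcom_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Plot the closing prices as a line chart with labelled axes
fig, ax = plt.subplots()
qcom_df["Close"].plot(ax=ax, title="Historical Closing Prices")
ax.set_xlabel("Date")
ax.set_ylabel("Closing price")
```

If `Date` were left as an ordinary column, the x-axis would show row numbers instead of dates.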
This line chart shows the fluctuations in the stock’s daily closing price far more clearly than a table of numbers. We can also see that the stock is currently on an upward trend. Next, let's create a histogram of the stock's trading volume.
# Plotting a histogram of the trading volume
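The histogram code itself is not shown here, so this is a minimal sketch of one way to draw it. It assumes `qcom_df` has a `Volume` column; the data below is synthetic (randomly generated) so that the example runs on its own, and the bin count of 30 is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: right-skewed volumes on business days
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2023-01-02", periods=250, freq="B")
qcom_df = pd.DataFrame(
    {"Volume": rng.lognormal(mean=15.5, sigma=0.5, size=250).astype("int64")},
    index=dates,
)

# Histogram of daily trading volume: each bar counts the days in one volume range
fig, ax = plt.subplots()
qcom_df["Volume"].plot(kind="hist", bins=30, ax=ax,
                       title="Distribution of Trading Volume")
ax.set_xlabel("Volume")
ax.set_ylabel("Number of days")
```

With your own data, drop the synthetic block and call the same `plot(kind="hist", ...)` on the existing `Volume` column.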
A histogram immediately shows how the trading volumes are distributed. Let’s draw some insights from it.
Central Tendency: The bulk of the data clusters on the left side of the histogram, suggesting that on most days the trading volume is lower rather than higher. Specifically, the trading volume frequently falls below the 0.5 x 10^7 mark on the x-axis, that is, below 5 million.
Skewness: The distribution is right-skewed, with a tail extending towards the higher trading volumes. This indicates that there are days with unusually high trading volumes, but these are less frequent.
Outliers: The bars on the far right, separated from the cluster of other bars, suggest that there have been days with particularly high trading volumes that are outliers when compared to the typical trading volume.
Volatility Indication: Days with exceptionally high trading volume can be associated with significant news or events affecting the company, such as earnings reports, product announcements, or broader market volatility.
Liquidity: The consistent presence of bars, even if small, across the volume range suggests that Qualcomm's stock has a liquid market with transactions occurring at various volume levels.
Volume Peaks: There are noticeable peaks within the distribution, particularly in the lower volume range. These peaks may represent common volume levels at which trades typically consolidate.
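The observations above come from reading the chart by eye. As a rough sketch, they can also be checked numerically: a mean above the median and a positive skewness both point to a right-skewed distribution, and a simple rule of thumb (values above Q3 + 1.5 × IQR) flags unusually high-volume days. The series below is synthetic and the 1.5 × IQR threshold is a common convention, not something taken from the tutorial's data; replace `volume` with your own `Volume` column.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed volumes standing in for the real Volume column
rng = np.random.default_rng(seed=42)
volume = pd.Series(rng.lognormal(mean=15.5, sigma=0.6, size=500), name="Volume")

# Central tendency: in a right-skewed distribution the mean exceeds the median
mean_volume = volume.mean()
median_volume = volume.median()

# Skewness: positive values indicate a longer tail on the right
skewness = volume.skew()

# Outliers: flag days above the upper fence Q3 + 1.5 * IQR
q1, q3 = volume.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
high_volume_days = volume[volume > upper_fence]

print(f"Mean volume:      {mean_volume:,.0f}")
print(f"Median volume:    {median_volume:,.0f}")
print(f"Skewness:         {skewness:.2f}")
print(f"High-volume days: {len(high_volume_days)} of {len(volume)}")
```

Days flagged this way are good candidates to cross-check against earnings dates or other news.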
In this section, we've completed our introductory exploration of pandas. We began by introducing pandas and its role in data analysis. This was followed by a discussion of how to install pandas and set up your development environment. We then examined the basic data structures in pandas, namely Series and DataFrame, and explored how to load and save data using these structures. Finally, we touched on basic techniques for exploring your data, including how to generate summary statistics and create basic visualizations.
In the next section, we will learn how to manipulate data using pandas, starting with data cleaning and preprocessing techniques.