
Python and Data Visualization: Matplotlib and Beyond
Matplotlib: Introduction
Matplotlib is a powerful library for creating static, animated, and interactive visualizations in Python. It provides a wide range of functions and methods for creating various types of graphs, plots, and charts.
Installation
To install Matplotlib, we can use the following command:
pip install matplotlib
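A quick way to confirm that the installation worked is to import the package and print its version. This is only a minimal check, and the version string you see will depend on your environment.

```python
import matplotlib

# Print the installed version to confirm the package can be imported
print(matplotlib.__version__)
```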
Basics of Matplotlib
Let’s start by importing Matplotlib and creating a basic line graph. We will plot the sales data for a fictional company over a period of time.
import matplotlib.pyplot as plt

# Sales data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
sales = [10000, 15000, 12000, 18000, 20000]

# Create a line graph
plt.plot(months, sales)

# Add labels and title
plt.xlabel('Months')
plt.ylabel('Sales')
plt.title('Monthly Sales Report')

# Display the graph
plt.show()
We begin by importing the matplotlib.pyplot module, which provides a MATLAB-like interface for creating plots. Next, we define the sales data for each month using two lists: months and sales.
To create a line graph, we use the plot() function and pass in the months list as the x-axis values and the sales list as the y-axis values. The resulting graph will have the months on the x-axis and the corresponding sales on the y-axis.
We add labels to the x-axis and y-axis using the xlabel() and ylabel() functions, respectively, and set a title for the graph using the title() function. Finally, the show() function displays the graph.
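The same chart can also be built with Matplotlib's object-oriented interface, which returns explicit Figure and Axes objects and tends to be easier to manage once a script contains more than one plot. The sketch below reuses the sales data from the example; the marker style and the grid are illustrative choices, not part of the original example.

```python
import matplotlib.pyplot as plt

# Sales data (same values as the pyplot example)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
sales = [10000, 15000, 12000, 18000, 20000]

# Create a figure and a single set of axes
fig, ax = plt.subplots()

# Draw the line; marker='o' (an illustrative choice) marks each data point
line, = ax.plot(months, sales, marker='o')

# Labels and title are set through methods on the Axes object
ax.set_xlabel('Months')
ax.set_ylabel('Sales')
ax.set_title('Monthly Sales Report')

# Optional light grid to make values easier to read
ax.grid(True, alpha=0.3)
```

Calling plt.show() afterwards displays the figure, just as in the pyplot version.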
Types of Graphs
Matplotlib provides various types of graphs and plots to visualize different types of data. Let’s explore a few of them:
Bar Graphs
A bar graph is a great way to represent categorical data or compare multiple categories. Let’s create a bar graph to compare the revenue generated by different product categories for a retail company.
import matplotlib.pyplot as plt

# Product categories
categories = ['Electronics', 'Clothing', 'Books', 'Home']

# Revenue data
revenue = [5000, 7000, 3000, 4000]

# Create a bar graph
plt.bar(categories, revenue)

# Add labels and title
plt.xlabel('Product Categories')
plt.ylabel('Revenue ($)')
plt.title('Revenue by Product Category')

# Display the graph
plt.show()
In this example, we have four product categories: Electronics, Clothing, Books, and Home. The corresponding revenue data is stored in the revenue list.
We use the bar() function to create a bar graph, passing in the categories list as the x-axis values and the revenue list as the bar heights on the y-axis.
Similar to the line graph example, we add labels to the x-axis and y-axis using the xlabel() and ylabel() functions, respectively. We also set a title for the graph using the title() function.
Finally, we use the show() function to display the graph.
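Bar graphs are often easier to read when each bar carries its exact value. The sketch below reuses the revenue data and annotates the bars with the Axes method bar_label(), which assumes Matplotlib 3.4 or newer; the thousands-separator formatting and the padding value are illustrative choices.

```python
import matplotlib.pyplot as plt

# Same categories and revenue as the bar graph example
categories = ['Electronics', 'Clothing', 'Books', 'Home']
revenue = [5000, 7000, 3000, 4000]

fig, ax = plt.subplots()

# ax.bar() returns a container holding one rectangle per category
bars = ax.bar(categories, revenue)

# Write the revenue above each bar (bar_label requires Matplotlib 3.4+)
value_labels = ax.bar_label(
    bars,
    labels=[f'{value:,}' for value in revenue],
    padding=3,
)

ax.set_xlabel('Product Categories')
ax.set_ylabel('Revenue ($)')
ax.set_title('Revenue by Product Category')
```

On older Matplotlib versions, a loop over the bars that calls ax.text() for each one gives a similar result.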
Pie Charts
A pie chart is useful for representing proportions or percentages. Let’s create a pie chart to visualize the market share of different smartphone brands.
import matplotlib.pyplot as plt

# Smartphone brands
brands = ['Apple', 'Samsung', 'Huawei', 'Xiaomi', 'Others']

# Market share
market_share = [30, 25, 15, 10, 20]

# Create a pie chart
plt.pie(market_share, labels=brands, autopct='%1.1f%%')

# Add title
plt.title('Market Share of Smartphone Brands')

# Display the chart
plt.show()
In this example, we have five smartphone brands: Apple, Samsung, Huawei, Xiaomi, and Others. The market share data for each brand is stored in the market_share list.
We use the pie() function to create a pie chart, where the sizes of the wedges represent the market share of each brand. We pass in the market_share list as the data values and the brands list as the labels for each wedge. The autopct='%1.1f%%' parameter is a format string that displays each wedge's percentage with one decimal place.
We set a title for the pie chart using the title() function.
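To see what autopct produces, it can help to inspect the values returned by pie(). The sketch below reuses the market share data; the explode and startangle settings are illustrative additions that pull the first wedge out slightly and rotate the chart, and are not part of the original example.

```python
import matplotlib.pyplot as plt

# Same brands and market share as the pie chart example
brands = ['Apple', 'Samsung', 'Huawei', 'Xiaomi', 'Others']
market_share = [30, 25, 15, 10, 20]

fig, ax = plt.subplots()

# Offset for each wedge; only the first wedge (Apple) is pulled out
explode = [0.1, 0, 0, 0, 0]

# With autopct set, pie() returns the wedges, the label texts,
# and the percentage texts drawn inside the wedges
wedges, texts, autotexts = ax.pie(
    market_share,
    labels=brands,
    autopct='%1.1f%%',
    explode=explode,
    startangle=90,
)
ax.set_title('Market Share of Smartphone Brands')

# Collect the formatted percentage strings
percentages = [t.get_text() for t in autotexts]
print(percentages)
```

Because the values here already sum to 100, the printed percentages match the inputs (30.0%, 25.0%, 15.0%, 10.0%, 20.0%). With data that does not sum to 100, pie() normalizes the values by their total before formatting them.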
Beyond Matplotlib
While Matplotlib is a powerful library for data visualization, there are several other libraries that offer additional functionalities and aesthetic options. Let’s explore a few of them:
Seaborn
Seaborn is built on top of Matplotlib and provides a high-level interface for creating attractive and informative statistical graphics. It simplifies many tasks by automatically applying appropriate settings and themes. Let’s create a box plot to visualize the distribution of student scores in different subjects.
import matplotlib.pyplot as plt
import seaborn as sns

# Student scores
math_scores = [80, 95, 70, 85, 90]
science_scores = [75, 80, 85, 90, 95]
english_scores = [85, 80, 75, 90, 95]

# Create a box plot
sns.boxplot(data=[math_scores, science_scores, english_scores])

# Set labels and title
plt.xlabel('Subjects')
plt.ylabel('Scores')
plt.title('Distribution of Student Scores')

# Display the plot
plt.show()
In this example, we have three subjects: Math, Science, and English. The scores achieved by each student in these subjects are stored in separate lists: math_scores, science_scores, and english_scores.
We use the boxplot() function from Seaborn to create a box plot. We pass in the data as a list of lists, where each list represents the scores for a particular subject.
Because Seaborn draws on Matplotlib axes, we set the labels for the x-axis and y-axis using Matplotlib's xlabel() and ylabel() functions, respectively. We also set a title for the plot using the title() function.
Finally, we use the show() function to display the plot.
Plotly
Plotly is an open-source library for creating interactive plots and dashboards. It offers a wide range of charts and graphs with built-in interactivity and animation capabilities. Let’s create an interactive scatter plot to visualize the relationship between two variables.
import plotly.express as px

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]

# Create a scatter plot
fig = px.scatter(x=x, y=y)

# Set title
fig.update_layout(title='Scatter Plot')

# Display the plot
fig.show()
In this example, we have two variables: x and y. We define the values for these variables as lists.
We use the scatter() function from Plotly Express to create a scatter plot. We pass in the x and y values as arguments.
We set a title for the plot using the update_layout() method of the fig object.
Finally, we use the show() method of the fig object to display the plot.