Data visualization is an essential step in quantitative analysis with Python.
There are many tools at our disposal for data visualization, and the topics we will cover in this guide include:
- Matplotlib
- Pandas
- Time Series Visualization
- Seaborn
- Plotly & Dash
This article is based on notes from this course on Python for Financial Analysis and Algorithmic Trading. Of course, this guide cannot be comprehensive with regard to data visualization in Python; instead, it aims to provide an overview of the most basic and important capabilities for finance. Let's get started with the grandfather of data visualization libraries: matplotlib.
1. Matplotlib
Matplotlib has established itself as the benchmark for data visualization, and is a robust and reliable tool.
As this Python for Finance textbook describes:
It is both easy to use for standard plots and flexible when it comes to more complex plots and customizations. In addition, it is tightly integrated with NumPy and the data structures that it provides.
Matplotlib is modeled after MATLAB's plotting capabilities, and can create static image files of almost any plot type.
Let's look at a few of the main types of plots we can create with Matplotlib from their gallery:
Let's start with a simple example with 2 numpy arrays.
In this example we're setting x as a linearly spaced numpy array, with 10 numbers between 0 and 10 exclusive.
We then set
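Since the setup code is not shown here, the following is only a minimal sketch of what it might look like. It assumes x comes from np.linspace and that y is simply x squared (the y definition is our assumption, not taken from the original). Note that np.linspace includes the end value by default, so we also show endpoint=False, which leaves 10 out.

```python
import numpy as np

# Assumed setup: 10 evenly spaced points from 0 to 10 (end value included by default)
x = np.linspace(0, 10, 10)

# Assumed definition of y for illustration; the original assignment is not shown
y = x ** 2

# To leave the end value out, pass endpoint=False
x_exclusive = np.linspace(0, 10, 10, endpoint=False)
```

The later snippets in this section only need x and y to be arrays of the same length, so any other choice of y would work as well.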
We can plot with matplotlib in two different ways:
- Functional method
- Object-oriented method
With the functional method we just call plt.plot() and then pass in x and y.
# functional method
plt.plot(x, y)
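We can also create 2 subplots with plt.subplot(), which takes the number of rows, the number of columns, and the index of the subplot to draw on.
# create 2 subplots
plt.subplot(1, 2, 1)
plt.plot(x, y, 'r')
plt.subplot(1, 2, 2)
plt.plot(x, y, 'b')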
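For reference, here is a self-contained sketch of the functional method that can be run as a script. The sample data (y as x squared), the titles, and the figure size are our own assumptions, and the Agg backend is selected only so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Assumed sample data
x = np.linspace(0, 10, 10)
y = x ** 2

plt.figure(figsize=(8, 3))

# Left panel: first slot of a 1 x 2 grid
plt.subplot(1, 2, 1)
plt.plot(x, y, "r")
plt.title("y against x")

# Right panel: second slot, with the arrays swapped
plt.subplot(1, 2, 2)
plt.plot(y, x, "b")
plt.title("x against y")

plt.tight_layout()  # keep titles and tick labels from overlapping
n_axes = len(plt.gcf().axes)
```

Each plt call acts on whichever subplot was selected most recently, which is why the order of the calls matters with this method.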
We can also create plots in matplotlib in an object-oriented way.
To do this we first create a fig object, then we add axes to the canvas, and finally we plot on the axes.
# create a figure object
fig = plt.figure()
# add axes to the canvas: [left, bottom, width, height]
axes = fig.add_axes([0.1, 0.1, 1, 1])
# next we plot on the axes
axes.plot(x, y)
axes.set_xlabel('X Label')
axes.set_ylabel('Y Label')
axes.set_title('OOP Method')
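A common shortcut for the object-oriented method is plt.subplots(), which creates the figure and its axes in one call. The sketch below is not from the original notes; the sample data and labels are assumptions, and the Agg backend is used only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Assumed sample data
x = np.linspace(0, 10, 10)
y = x ** 2

# One figure with a 1 x 2 grid of axes, returned as an array
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

# Each Axes object is configured through its own methods
axes[0].plot(x, y, "r")
axes[0].set_title("y against x")
axes[0].set_xlabel("x")

axes[1].plot(y, x, "b")
axes[1].set_title("x against y")
axes[1].set_xlabel("y")

fig.tight_layout()
```

Because every call names the axes it acts on, this style tends to be easier to follow than the functional method once a figure has more than one panel.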
We can also create a plot within our canvas by calling fig.add_axes() a second time. The list we pass in is left, bottom, width, and height, each given as a fraction of the figure size.
# create plot within canvas
fig = plt.figure()
axes1 = fig.add_axes([0.1, 0.1, 0.8, 0.8])  # main axes
axes2 = fig.add_axes([0.2, 0.4, 0.5, 0.4])  # inset axes
axes1.plot(x, y)
axes2.plot(y, x)
We can add a legend by passing a label to each ax.plot() call for ax.legend() to reference.
# add legend
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1])
ax.plot(x, x**2, label='X Squared')
ax.plot(x, x**3, label='X Cubed')
ax.legend()
Finally, we can save the figure by calling fig.savefig() and passing in the location and file name to save to; the file type is taken from the extension.
# save figure
fig.savefig('my_plot.png')
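As a rough sketch of a few savefig options, the example below writes the same figure as a PNG and as a PDF. The temporary directory, the file names, and the dpi value are our own choices for illustration. We pass bbox_inches="tight" so that labels outside the axes area are not cut off in the saved file.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 10)  # assumed sample data

fig, ax = plt.subplots()
ax.plot(x, x ** 2, label="X Squared")
ax.plot(x, x ** 3, label="X Cubed")
ax.legend()

# Hypothetical output location: a fresh temporary directory
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "my_plot.png")
pdf_path = os.path.join(out_dir, "my_plot.pdf")

# dpi sets the resolution of raster output; bbox_inches="tight" trims or
# expands the saved area to fit everything that was drawn
fig.savefig(png_path, dpi=200, bbox_inches="tight")

# The format is inferred from the file extension, here a vector PDF
fig.savefig(pdf_path, bbox_inches="tight")
```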
Change Plot Appearance
We can change our plot's appearance in many ways, but here are a few examples:
# change color of plot
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1])
ax.plot(x, y, color='green')
# change line width and line style
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1])
ax.plot(x, y, color='purple', linewidth=10, linestyle='--')
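A few more appearance options can be combined in a single call. The sketch below uses markers, transparency, and axis limits; the specific values are arbitrary choices of ours, and the sample data is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Assumed sample data
x = np.linspace(0, 10, 10)
y = x ** 2

fig, ax = plt.subplots()

# ax.plot returns a list of Line2D objects; we unpack the single line
(line,) = ax.plot(
    x, y,
    color="#1f77b4",   # colors can be names or hex codes
    linewidth=2,
    linestyle="-.",
    marker="o",        # draw a circle at each data point
    markersize=6,
    alpha=0.8,         # slight transparency
)

# Restrict the visible range of each axis
ax.set_xlim(0, 10)
ax.set_ylim(0, 100)
```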
That's it for our introduction to matplotlib, but if you want to see more examples check out these tutorials.
2. Pandas
The main purpose of pandas is data analysis, but as we'll see, pandas also has amazing visualization capabilities.
If you set up your DataFrame correctly, you can create pretty much any visualization with a single line of code.
Pandas uses matplotlib on the backend through simple .plot calls. The plot method on Series and DataFrame is just a simple wrapper around plt.plot().
Pandas does, however, have a limited scope of plot types, and they are all static.
Let's look at a few examples:
import pandas as pd
import numpy as np
# Basic Plotting
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()
We can also easily create a histogram:
# histogram
ts.plot.hist(bins=30)
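The same one-line approach works on a whole DataFrame, where each column becomes its own line. The sketch below uses made-up random-walk data with column names A, B, and C (all assumptions for illustration) and passes explicit axes so the line plot and the histogram land in separate panels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: three random walks indexed by date
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=1000)
df = pd.DataFrame(
    rng.standard_normal((1000, 3)).cumsum(axis=0),
    index=dates,
    columns=["A", "B", "C"],
)

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# One call plots every column as a separate line with a legend
df.plot(ax=ax_line, title="Random walks")

# Histogram of the daily changes of a single column
df["A"].diff().plot.hist(bins=30, ax=ax_hist, title="Daily changes in A")

fig.tight_layout()
```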
3. Time Series Visualization
Before moving on to other libraries, let's take a look at time-series visualization with pandas and matplotlib.
To demonstrate this, let's download stock data for TSLA from May 1st, 2018 to May 1st, 2019 from Yahoo Finance.
When we read in our data with
pd.read_csv() we want to pass in
It's important to note that we don't want to plot the entire DataFrame since the Volume column is on such a different scale than the other columns.
Let's instead plot the adjusted close and volume on their own with df['Adj Close'].plot() and df['Volume'].plot().
We can plot just a specific month by setting the xlim argument to a list or tuple.
# plot January 2019
df['Adj Close'].plot(xlim=['2019-01-01', '2019-02-01'])
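To make the steps above easier to try without downloading anything, here is a sketch that uses synthetic data in place of the TSLA file. The prices and volumes are randomly generated and are not real market data. The index_col and parse_dates arguments passed to pd.read_csv() are our assumption about a typical way to get a date index. For the single month we slice the rows with .loc, which is an alternative to setting xlim.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded CSV (NOT real TSLA data)
rng = np.random.default_rng(1)
dates = pd.bdate_range("2018-05-01", "2019-05-01", name="Date")
raw = pd.DataFrame(
    {
        "Adj Close": 300 + rng.standard_normal(len(dates)).cumsum(),
        "Volume": rng.integers(1_000_000, 10_000_000, size=len(dates)),
    },
    index=dates,
)
csv_buffer = io.StringIO(raw.to_csv())

# Assumed read_csv arguments: use the Date column as a parsed datetime index
df = pd.read_csv(csv_buffer, index_col="Date", parse_dates=True)

# Price and volume live on very different scales, so give each its own axes
fig, (ax_price, ax_volume) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
df["Adj Close"].plot(ax=ax_price, title="Adj Close")
df["Volume"].plot(ax=ax_volume, title="Volume")
fig.tight_layout()

# Select January 2019 by slicing the date index, then plot it on a new figure
january = df.loc["2019-01-01":"2019-01-31", "Adj Close"]
fig_jan, ax_jan = plt.subplots()
january.plot(ax=ax_jan, title="Adj Close, January 2019")
```

Slicing changes the data that gets plotted, while xlim only changes the visible window, so with slicing the y-axis also rescales to fit the selected month.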
4. Seaborn
Another common visualization library is Seaborn.
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
Here are a few examples from their Gallery:
Let's look at an example of visualizing linear relationships with regression.
Two main functions in seaborn are used to visualize a linear relationship as determined through regression. These functions, regplot() and lmplot(), are closely related, and share much of their core functionality.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(color_codes=True)
tips = sns.load_dataset("tips")
sns.regplot(x="total_bill", y="tip", data=tips)
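To see how the two functions differ in practice, here is a small sketch. Loading the tips dataset requires an internet connection, so we build a synthetic stand-in with the same column names instead; the numbers are randomly generated and only meant for illustration. regplot() draws on a single axes, while lmplot() returns a grid and can split the fit by a categorical column through hue.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips dataset (NOT the real data)
rng = np.random.default_rng(42)
n = 200
total_bill = rng.uniform(5, 50, n)
df = pd.DataFrame(
    {
        "total_bill": total_bill,
        "tip": 1.0 + 0.1 * total_bill + rng.normal(0, 1, n),
        "time": rng.choice(["Lunch", "Dinner"], n),
    }
)

# regplot: scatter plus one fitted regression line on a given axes
fig, ax = plt.subplots()
sns.regplot(x="total_bill", y="tip", data=df, ax=ax)

# lmplot: figure-level version, here with one fit per value of "time"
grid = sns.lmplot(x="total_bill", y="tip", hue="time", data=df)
```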
5. Plotly & Dash
All of the plots we've seen so far are static - that is, once you create them you can't interact with the plot in any way.
This is what Plotly solves.
Plotly the company focuses on data visualization for business intelligence, and the open source library is a general data visualization library that specializes in interactive visualizations.
The plotly Python library creates interactive plots as .html files.
Users can interact with these plots (zoom in, select, hover, etc.), but one limitation is that these plots can't be connected to changing data sources.
Once the plot is generated, the data is essentially locked-in at that time, and in order to regenerate a plot to see updates you need to re-run the .py script.
This is where Plotly's Dash comes in.
Often users want plots to be able to interact with each other, interact with components, or have the plot update in real time.
To do this, we need a dashboard.
Dash is an open-source library that lets you create a full dashboard with components, interactivity, and multiple plots.
Instead of creating a .html file, Dash produces a dashboard web application served on your localhost, which you can then visit and interact with in the browser.
Since Dash renders a full web app, we can also deploy it online.
Here's an example from their GitHub of a Dash app that's styled to look like a PDF report:
And here's an example of a Dash app for forex trading:
Summary: Data Visualization with Python
As we've seen, Python has many data visualization libraries including Matplotlib, Pandas, Seaborn, and Plotly.
Most of these are static visualization libraries, but the open-source library Plotly lets you create interactive plots, and Dash lets you create dashboard web applications.