Data visualization is an essential step in quantitative analysis with Python.

There are many tools at our disposal for data visualization, and the topics we will cover in this guide include:

  1. Matplotlib
  2. Pandas
  3. Time Series Visualization
  4. Seaborn
  5. Plotly & Dash

Of course, this guide cannot be comprehensive with regard to data visualization using Python. Instead, it aims to provide an overview of the most basic and important capabilities for finance.

Let's get started with the Grandfather of Data Visualization: matplotlib.

1. Matplotlib

Matplotlib has established itself as the benchmark for data visualization and is a robust and reliable tool.

As this Python for Finance textbook describes:

It is both easy to use for standard plots and flexible when it comes to more complex plots and customizations. In addition, it is tightly integrated with NumPy and the data structures that it provides.

Matplotlib is modeled after MATLAB's plotting capabilities, and can create static image files of almost any plot type.

Let's look at a few of the main types of plots we can create with Matplotlib from their gallery:

Let's start with a simple example with 2 numpy arrays.

In this example we set x to a linearly spaced NumPy array of 10 numbers between 0 and 10 exclusive.

We then set y to x**2.
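The code for this step is not shown above, so here is a minimal sketch of what it might look like. It assumes np.linspace is the function used. Note that np.linspace includes the endpoint by default, so endpoint=False is passed here (an assumption) to keep 10 itself out of the array, in line with the "exclusive" description. matplotlib.pyplot is imported as plt because the later snippets rely on it.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: 10 evenly spaced values starting at 0 and stopping short of 10
x = np.linspace(0, 10, 10, endpoint=False)

# y is the element-wise square of x
y = x ** 2
```

Dropping endpoint=False would instead give 10 values that run from 0 up to and including 10.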

We can plot with matplotlib in two different ways:

  • Functional method
  • Object-oriented method

Functional Method

With the functional method we just call plt.plot() and then pass in x and y.

# functional method
plt.plot(x, y)

We can also create 2 subplots with plt.subplot():

# create 2 subplots
# arguments: number of rows, number of columns, index of the active plot
plt.subplot(1,2,1)
plt.plot(x,y,'r')
plt.subplot(1,2,2)
plt.plot(x,y,'b')

Object-Oriented Method

We can also create plots in matplotlib in an object-oriented way.

To do this we first create a fig object, then we add axes to the canvas, and finally we plot on the axes.

# create a figure object
fig = plt.figure()

# add axes to the canvas
# list values: left, bottom, width, height (as fractions of the figure size)
axes = fig.add_axes([0.1,0.1,1,1])

# next we plot on the axes
axes.plot(x,y)
axes.set_xlabel('X Label')
axes.set_ylabel('Y Label')
axes.set_title('OOP Method')

We can also place a second, smaller plot inside our canvas by calling fig.add_axes() again with a different list of values. The list we're passing in is: left, bottom, width, and height.

# create plot within canvas 
fig = plt.figure()
axes1 = fig.add_axes([0.1, 0.1, 0.8, 0.8])
axes2 = fig.add_axes([0.2, 0.4, 0.5, 0.4])
axes1.plot(x,y)
axes2.plot(y,x)

We can add a legend by specifying labels in ax.plot() for ax.legend() to reference.

# add legend
fig = plt.figure()

ax = fig.add_axes([0,0,1,1])
ax.plot(x, x**2, label='X Squared')
ax.plot(x, x**3, label='X Cubed')
ax.legend()

Finally, we can save the figure by calling fig.savefig() and passing in the location to save to. The file type is inferred from the file extension.

# save figure
fig.savefig('my_plot.png')

Change Plot Appearance

We can change our plot's appearance in many ways, but here are a few examples:

# change color of plot
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
ax.plot(x,y,color='green')

# change linewidth and linestyle
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
ax.plot(x,y,color='purple', linewidth=10, linestyle='--')

That's it for our introduction to matplotlib, but if you want to see more examples check out these tutorials.

2. Pandas

The main purpose of pandas is data analysis, but as we'll see, pandas also has strong visualization capabilities.

If you set up your DataFrame correctly, you can create many common visualizations with a single line of code.

Pandas uses matplotlib on the backend through simple .plot calls. The plot method on Series and DataFrame is just a simple wrapper around plt.plot().

Pandas does, however, have a limited scope of plot types, and they are all static.

Let's look at a few examples:

import pandas as pd
import numpy as np

# Basic Plotting 
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()

We can also easily create a histogram:

# histogram
ts.plot.hist(bins=30)
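
Because .plot() is built on matplotlib, it returns a matplotlib Axes object, which can then be customized with the object-oriented methods from section 1. Here is a minimal sketch using a made-up two-column DataFrame of random walks. The variable and column names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two random walks indexed by date
rng = np.random.default_rng(0)
dates = pd.date_range('1/1/2000', periods=250)
df_walks = pd.DataFrame(
    rng.standard_normal((250, 2)).cumsum(axis=0),
    index=dates,
    columns=['Walk A', 'Walk B'],
)

# One line plots every column; the return value is a matplotlib Axes
ax = df_walks.plot(title='Two Random Walks')

# The Axes can then be customized with the usual matplotlib methods
ax.set_xlabel('Date')
ax.set_ylabel('Cumulative Sum')
```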

3. Time Series Visualization

Before moving on to other libraries, let's take a look at time-series visualization with pandas and matplotlib.

To demonstrate this, let's download stock data for TSLA from May 1st, 2018 to May 1st, 2019 from Yahoo Finance.

When we read in our data with pd.read_csv() we want to pass in index_col='Date', and parse_dates=True.
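
The downloaded file itself is not included here, so the sketch below reads from a small in-memory stand-in instead. The column layout is assumed to follow the usual Yahoo Finance export and the values are made up. With a real download, csv_file would be replaced by the path to the CSV file.

```python
import io
import pandas as pd

# Stand-in for the downloaded file: made-up rows in a Yahoo Finance-style layout
csv_file = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2018-05-01,10.0,10.5,9.8,10.2,10.2,1000000\n"
    "2018-05-02,10.2,10.9,10.1,10.8,10.8,1200000\n"
    "2018-05-03,10.8,11.0,10.4,10.5,10.5,900000\n"
)

# index_col='Date' makes the dates the index; parse_dates=True converts them
# from strings to datetimes, giving the DataFrame a DatetimeIndex
df = pd.read_csv(csv_file, index_col='Date', parse_dates=True)

# Inspect the index type to confirm the dates were parsed
print(type(df.index))
```

Having a DatetimeIndex is what lets pandas label the x-axis with dates and lets us restrict a plot to a date range later on.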

It's important to note that we don't want to plot the entire DataFrame since the Volume column is on such a different scale than the other columns.

Let's instead plot the adjusted close and volume on their own with df['Adj Close'].plot() and df['Volume'].plot().
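
One way to keep the two series apart is to give each its own axes. The sketch below stacks two plots that share the date axis, using plt.subplots() and the ax argument of .plot(). The DataFrame is a synthetic stand-in for the TSLA data, so the numbers are made up.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the TSLA data: business days over the same period
rng = np.random.default_rng(1)
dates = pd.date_range('2018-05-01', '2019-05-01', freq='B')
df = pd.DataFrame(
    {
        'Adj Close': 300 + rng.standard_normal(len(dates)).cumsum(),
        'Volume': rng.integers(5_000_000, 15_000_000, size=len(dates)),
    },
    index=dates,
)

# Two stacked axes that share the date axis, so each series keeps its own scale
fig, (ax_price, ax_volume) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
df['Adj Close'].plot(ax=ax_price, title='Adj Close')
df['Volume'].plot(ax=ax_volume, title='Volume')
```

In a notebook, running each .plot() call in its own cell has a similar effect, since each cell starts a new figure.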

We can plot just a specific month by setting the xlim argument to a list or tuple.

# plot January 2019
df['Adj Close'].plot(xlim=['2019-01-01', '2019-02-01'])

4. Seaborn

Another common visualization library is Seaborn.

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Here are a few examples from their Gallery:

Let's look at an example of visualizing linear relationships with regression.

Two main functions in seaborn are used to visualize a linear relationship as determined through regression. These functions, regplot() and lmplot(), are closely related and share much of their core functionality.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(color_codes=True)
tips = sns.load_dataset("tips")
sns.regplot(x="total_bill", y="tip", data=tips);
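
For comparison, here is a hedged sketch of lmplot(). Unlike regplot(), which draws onto a single matplotlib Axes, lmplot() is a figure-level function that returns a FacetGrid and can split the regression by a categorical variable through hue. To avoid downloading anything, the sketch uses made-up data shaped like the tips dataset. The tips_like name and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data shaped like the tips dataset
rng = np.random.default_rng(2)
total_bill = rng.uniform(5, 50, size=100)
tips_like = pd.DataFrame({
    'total_bill': total_bill,
    'tip': 0.15 * total_bill + rng.normal(0, 1, size=100),
    'smoker': rng.choice(['Yes', 'No'], size=100),
})

# lmplot fits and draws one regression line per hue level
grid = sns.lmplot(x='total_bill', y='tip', hue='smoker', data=tips_like)
```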

5. Plotly & Dash

All of the plots we've seen so far are static - that is, once you create them you can't interact with the plot in any way.

This is what Plotly solves.

Plotly is both a company and an open source library.

Plotly the company focuses on data visualization for business intelligence, and the open source library is a general data visualization library that specializes in interactive visualizations.

Plotly has libraries for JavaScript, React, R, and Python - but we'll stick with Python in this guide.

Using the plotly python library creates interactive plots as .html files.

Users can interact with these plots (zoom in, select, hover, etc.), but one limitation is that these plots can't be connected to changing data sources.

Once the plot is generated, the data is essentially locked-in at that time, and in order to regenerate a plot to see updates you need to re-run the .py script.

This is where Plotly's Dash comes in.

Often users want plots to be able to interact with each other, interact with components, or have the plot update in real time.

To do this, we need a dashboard.

Dash is an open-source library that lets you create a full dashboard with components, interactivity, and multiple plots.

Instead of creating a .html file, Dash produces a dashboard web application at your local host, which you can then visit and interact with.

Since Dash renders a full web app, we can also deploy it online.

Here's an example from their GitHub of a Dash app that's styled to look like a PDF report:

And here's an example of a Dash app for forex trading:

Summary: Data Visualization with Python

As we've seen, Python has many data visualization libraries including Matplotlib, Pandas, Seaborn, and Plotly.

Most of these are static visualization libraries, but the open-source library Plotly lets you create interactive plots, and Dash lets you create dashboard web applications.

Have any questions about data visualization with Python for finance?

Let us know in the comments below.

Resources