Introduction to Python for Finance and Algorithmic Trading

In this guide, we'll discuss the application of the Python programming language to quantitative finance and algorithmic trading.

This guide is based on notes from the course Python for Financial Analysis and Algorithmic Trading and is organized as follows:

  1. Review of Python Programming
  2. Key Python Library: NumPy
  3. Key Python Library: Pandas

1. Review of Python Programming

Of course, this article isn't meant to be a complete review of the Python programming language. Instead, it covers just enough to get started with Python for finance.

Fundamental Data Types

First, let's review fundamental data types in Python.

Numbers

# addition
1 + 1
# subtraction
2 - 1
# multiplication
2 * 2
# division
1 / 2
# exponents
2**3
# modulo operator 
5%2
# order of operations
(2+2) * (5-3)

Variables

Strings

Lists

  • Use square brackets
  • Can hold different data types
  • Lists are mutable
  • Python is 0 indexed
  • You can also use negative indexing
  • Lists can be nested
# list (assigned to my_list so the lines below can use it)
my_list = [1,2,3]

# different data types
['one', 2, 3]

# append item to a list
my_list.append(4)

# zero indexing
my_list[0]

# negative indexing
my_list[-1]

# slicing a list - up to but not including index 2
my_list[0:2]

# nested list
nested = [1,2,['a','b']]

Dictionaries

  • Dictionaries are implemented as hash tables and store key-value pairs
  • Instead of storing items as a sequence accessed by position, a dictionary stores values that are looked up by key. Since Python 3.7 dictionaries preserve insertion order, but items are still accessed by key rather than by index
  • If you want to get a value, you pass in the key (unlike a list where you pass in the index)
dictionary = {'key':1, 'key2':2}
dictionary['key2']

Booleans

  • True
  • False

Tuples

  • Similar to lists but use parentheses
  • The key difference is that tuples are immutable, while lists are mutable
  • t = (1,2,3)

Python raises an error if we try to change an item in a tuple.
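As a minimal sketch of that behavior (the variable names here are only illustrative), assigning to an item of a tuple raises a TypeError, while the same assignment works on a list:

```python
# A tuple cannot be changed after it is created
t = (1, 2, 3)

try:
    t[0] = 'new'            # item assignment is not supported on tuples
except TypeError as err:
    error_message = str(err)
    print(error_message)

# A list holding the same values can be changed in place
my_list = [1, 2, 3]
my_list[0] = 'new'
print(my_list)
```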

Sets

  • Sets use curly braces like dictionaries, but you just pass in the item—there's no key-value here
  • The thing to remember about a set is that it's an unordered collection of unique items
  • If you pass in the same element multiple times, the set keeps only one instance of it
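A short sketch of how duplicates collapse in a set (the values are arbitrary):

```python
# Duplicates are dropped when the set is created
s = {1, 1, 2, 2, 3}
print(s)

# Converting a list to a set keeps only the unique items
unique = set([1, 1, 1, 2])

# Adding an element that is already present has no effect
s.add(5)
s.add(5)
print(len(s))
```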

Comparison and Logical Operators

Comparison Operators

  • Operators such as <, >, ==, and != let you compare two items (a single = is assignment, not comparison)
  • They return a boolean

Logical Operators

  • Use the keywords and, or, and not (to check the opposite condition)
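Here is a small sketch that combines comparison and logical operators. The prices and volume are made-up numbers, and the variable names are only for illustration:

```python
# Hypothetical values, chosen only to illustrate the operators
price = 105.0
moving_average = 100.0
volume = 5000

# Comparison operators return booleans
above_average = price > moving_average
is_equal = price == moving_average

# Logical operators combine or negate booleans
buy_signal = above_average and volume > 1000
either = is_equal or above_average
no_signal = not buy_signal

print(above_average, is_equal, buy_signal, either, no_signal)
```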

Control Flow in Python

  • Python uses whitespace (indentation) to denote blocks, where other languages commonly use curly brackets ({ and }). An indented line becomes a child of the preceding line.
  • Python uses if, elif, and else for control flow

For Loops

  • For loops allow you to iterate through an iterable sequence and execute actions for every element in the sequence
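To make this concrete, here is a small sketch that uses a for loop with if, elif, and else. The prices and the reference level are made up for illustration:

```python
# Hypothetical prices compared against a reference level
prices = [101.5, 99.0, 100.0]
reference = 100.0

labels = []
for price in prices:
    # The indented lines belong to the for loop;
    # the lines indented further belong to each branch
    if price > reference:
        labels.append('above')
    elif price < reference:
        labels.append('below')
    else:
        labels.append('equal')

print(labels)
```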

While loops

  • While loops are blocks of code that keep executing while a condition is true

Built-in functions

  • range() - returns a lazy, immutable sequence of integers (wrap it in list() to get a list)
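A short sketch of a while loop and of range(), using arbitrary numbers:

```python
# The loop body runs until the condition becomes false
count = 0
while count < 3:
    count += 1

# range(stop) counts from 0 up to, but not including, stop
numbers = list(range(5))

# range(start, stop, step) lets us choose the step size
evens = list(range(0, 10, 2))

print(count, numbers, evens)
```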

List Comprehension

  • A concise way of rewriting a for loop that builds a list, flattening it into a single expression
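For instance, the following sketch builds the same list twice, once with a for loop and once with a list comprehension:

```python
# Building a list with a for loop
squares_loop = []
for x in range(5):
    squares_loop.append(x ** 2)

# The same list written as a list comprehension
squares = [x ** 2 for x in range(5)]

print(squares)
```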

Functions, Lambda Expressions, Map & Filter, and Methods

Functions

To create a function in Python we start with def and then the function name.

Recall that a function is something that allows you to execute a block of code over and over, without having to retype that block.

def my_func():
    print('hello')

Or we can also pass in a parameter and return a value, as shown below:

def func_two(param):
    return param**5


Lambda Expressions

Sometimes you won't want to define a function, for example when you're using the built-in map and filter functions.

In that case, you will want to use a lambda expression.

A lambda function is known as an anonymous function and allows you to quickly create a function you can use one time, and it has no name.

To create a lambda function we use the lambda keyword followed by a variable, var in the example below. This example shows you how to quickly multiply a variable by 2.

Here are the steps to convert a function into a lambda expression:

  • remove def, add lambda
  • remove function name + parentheses
  • instead of using return, we put the expression we want returned after the colon
  • lambda var: var*2
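The conversion steps above can be sketched as follows. The names times_two and times_two_lambda are only illustrative, and the lambda is assigned to a name here purely so we can compare the two:

```python
# A regular function that doubles its input
def times_two(var):
    return var * 2

# The same logic as a lambda expression
times_two_lambda = lambda var: var * 2

print(times_two(4), times_two_lambda(4))
```

In practice a lambda is usually passed directly to another function rather than stored in a variable.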

Map and Filter

Let's see why a lambda expression would be useful.

The built-in function map applies a function to every item of an iterable sequence.

For example, if we have a sequence and want to apply our lambda expression from earlier to it, here's what we would do:
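A minimal sketch, assuming a short sequence called seq (the name is only for illustration):

```python
seq = [1, 2, 3, 4, 5]

# map returns an iterator, so we wrap it in list() to see the results
doubled = list(map(lambda var: var * 2, seq))

print(doubled)
```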

The built-in filter function takes in a function and applies it to an iterable, but the difference is it returns an iterator yielding those items for which function(item) is true.

Let's look at an example that checks whether each item in our sequence is even, both as a function and a lambda expression.
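One way to sketch this, with is_even as an illustrative helper name:

```python
# A named function that checks whether a number is even
def is_even(num):
    return num % 2 == 0

seq = [1, 2, 3, 4, 5]

# filter keeps only the items for which the function returns True
evens_func = list(filter(is_even, seq))

# The same check written as a lambda expression
evens_lambda = list(filter(lambda num: num % 2 == 0, seq))

print(evens_func, evens_lambda)
```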

Useful Methods

Useful methods for strings include:

  • str.lower()
  • str.upper()
  • str.split()

Useful methods for dictionaries include:

  • dictionary.keys()
  • dictionary.items()

Useful methods for lists include:

  • list.append()
  • list.pop()

Finally, we can check if an element is in a list with the in keyword.
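Here is a quick sketch of these methods in action. The string and values are arbitrary examples:

```python
# String methods
name = 'Apple Inc AAPL'
lower = name.lower()
upper = name.upper()
words = name.split()          # splits on whitespace by default

# Dictionary methods
d = {'a': 10, 'b': 20}
keys = list(d.keys())
items = list(d.items())       # list of (key, value) tuples

# List methods
my_list = [1, 2, 3]
my_list.append(4)
last = my_list.pop()          # removes and returns the last item

# Membership check with the in keyword
found = 2 in my_list
missing = 4 in my_list
```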

2. NumPy

NumPy, or Numerical Python, is one of the most fundamental libraries for quantitative analysis.

If you haven't already installed it you can do so with conda install numpy or pip install numpy.

NumPy is a numerical library that allows for fast data generation and data handling.

NumPy uses arrays, which store data much more efficiently than the built-in Python list.

Let's look at a few of the most common functions and methods in quantitative analysis.

NumPy Arrays

NumPy arrays can either be vectors or matrices - vectors are 1D arrays and matrices are 2D arrays (but a matrix can still have only 1 row/column).

After we import numpy as np, one of the ways we can create an array is by casting a list to an array:

We can also build a matrix by passing in a nested list. np.array then returns a 2D array, and the output is displayed with its rows and columns.
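A minimal sketch of both cases, with illustrative variable names:

```python
import numpy as np

# Casting a list to a 1D array (a vector)
my_list = [1, 2, 3]
vector = np.array(my_list)

# Casting a nested list to a 2D array (a matrix)
my_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matrix = np.array(my_matrix)

print(vector)
print(matrix)
```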

Let's now generate arrays instead of casting a list or matrix—recall that we have the built-in range() function.

The NumPy version of range() is np.arange().

We can generate arrays of 0's and 1's using np.zeros() and np.ones(), which return arrays of floating point numbers.

We can also create a 2D+ array by passing in a tuple of dimensions.
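These array generators can be sketched as follows (the sizes are arbitrary):

```python
import numpy as np

# Like range(): start is inclusive, stop is exclusive
a = np.arange(0, 10)

# An optional third argument sets the step size
evens = np.arange(0, 11, 2)

# Arrays of zeros and ones are floating point by default
zeros = np.zeros(3)

# Passing a tuple of dimensions gives a 2D array (2 rows, 3 columns)
ones = np.ones((2, 3))

print(a, evens, zeros, ones, sep='\n')
```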

If we want to return evenly spaced numbers over a specified interval we use np.linspace(), which takes in a start, a stop, and the number of values to return.

This is different from the step in range: here we specify how many numbers we want between our start and stop (both included by default), rather than the spacing between them.
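The difference is easiest to see side by side. In this sketch the third argument means "how many points" for np.linspace() and "step size" for np.arange():

```python
import numpy as np

# Five evenly spaced points from 0 to 10, endpoints included
points = np.linspace(0, 10, 5)

# Values from 0 up to (not including) 10 in steps of 5
stepped = np.arange(0, 10, 5)

print(points)    # [ 0.   2.5  5.   7.5 10. ]
print(stepped)   # [0 5]
```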

NumPy's Random Library

NumPy has many ways to create random number arrays.

When we work with financial data and want to model something randomly (a Monte Carlo simulation, for example), we'll use the np.random module.

There are many functions in np.random, but below are some of the most common.

np.random.rand() creates an array of a given shape and populates it with random samples from a uniform distribution over [0, 1), that is, 0 inclusive and 1 exclusive.

Uniform distribution just means all numbers between 0 and 1 have an equal probability of being picked.

We can also use it to create a matrix of random numbers between 0 and 1.
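A short sketch of np.random.rand(). The seed value is arbitrary and is set only so the run is repeatable:

```python
import numpy as np

np.random.seed(101)  # arbitrary seed, only for reproducibility

# Five uniform random samples in [0, 1)
samples = np.random.rand(5)

# A 3x4 matrix of uniform random samples
matrix = np.random.rand(3, 4)

print(samples)
print(matrix)
```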

We can use np.random.randn() to return samples from the standard normal distribution.

The standard normal distribution is the normal (Gaussian) distribution with a mean of 0 and a variance of 1, meaning values closer to 0 are more likely to be drawn.

We can use np.random.randint() to return a random integer from low (inclusive) to high (exclusive), and we can also specify the size of the array as the third argument.
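Here is a sketch of np.random.randn() and np.random.randint(). The seed and sample sizes are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(101)  # arbitrary seed, only for reproducibility

# A large sample from the standard normal distribution
normal = np.random.randn(10000)

# With this many samples the mean is close to 0 and the variance close to 1
print(normal.mean(), normal.var())

# Ten random integers from 1 (inclusive) to 100 (exclusive)
ints = np.random.randint(1, 100, 10)
print(ints)
```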

Useful Numpy Array Attributes & Methods

We can reshape an array with .reshape(), which returns an array containing the same data but with a new shape.

The .shape attribute gives us the shape of the array and we can also check the data type with the .dtype attribute:
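A minimal sketch, using a 25-element array so that it fits a 5x5 shape:

```python
import numpy as np

arr = np.arange(25)

# reshape returns the same data with a new shape;
# the new shape must hold exactly the same number of elements
grid = arr.reshape(5, 5)

print(arr.shape)    # (25,)
print(grid.shape)   # (5, 5)
print(arr.dtype)    # an integer type; the exact width depends on the platform
```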

4 Key Numpy Methods

  • .max() for returning the highest number in the array
  • .argmax() to get the index of the max number
  • .min() for returning the lowest number
  • .argmin() to get the index of the min number
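For example, with a small array of arbitrary values:

```python
import numpy as np

arr = np.array([10, 48, 3, 27])

highest = arr.max()           # largest value
highest_index = arr.argmax()  # position of the largest value
lowest = arr.min()            # smallest value
lowest_index = arr.argmin()   # position of the smallest value

print(highest, highest_index, lowest, lowest_index)
```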

Numpy Operations

Let's now review arithmetic operations with numpy arrays.

You can easily perform array-wise arithmetic, which is done on an element-by-element basis.

We can also perform arithmetic with scalar values, which broadcasts the value to every single element in the array.

On top of arithmetic, we can use NumPy's many built-in universal array functions, which apply a mathematical operation to every element in the array.

A few common functions include np.sqrt() to take the square root of everything in the array and np.exp() to calculate the exponential of all elements in the array.

Most basic mathematical functions you can think of are built in.
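These operations can be sketched on a small array:

```python
import numpy as np

arr = np.arange(1, 5)      # [1, 2, 3, 4]

# Array-with-array arithmetic works element by element
added = arr + arr          # [2, 4, 6, 8]
product = arr * arr        # [1, 4, 9, 16]

# A scalar is broadcast to every element
scaled = arr * 10          # [10, 20, 30, 40]

# Universal functions apply to every element
roots = np.sqrt(arr)
exps = np.exp(arr)

print(added, product, scaled, roots, exps, sep='\n')
```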

NumPy for Indexing and Selection

Let's now talk about bracket indexing and selection.

The way to pick one (or several) elements of an array is very similar to what we did with Python lists, using square bracket notation: arr[10], etc.

Bracket indexing and selection work the same as with a Python list, but NumPy differs in its ability to broadcast: we can assign a single value to a whole slice at once.
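A short sketch of indexing, slicing, and assigning one value to a slice:

```python
import numpy as np

arr = np.arange(0, 11)

# Pick a single element, just like a list
single = arr[8]

# Slice a range of elements (copied so later changes don't affect it)
first_five = arr[0:5].copy()

# Assign one value to an entire slice at once
arr[0:5] = 100

print(single)
print(first_five)
print(arr)
```

Note the .copy() call: a plain slice of a NumPy array is a view onto the original data, so without it first_five would also show the value 100.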

Conditional Selection

Applying a comparison operator to an array returns an array of boolean values.

We can assign that to a bool_arr variable and pass it into our original array and it will only return the values that satisfy the condition.

We could also just put our condition inside of square brackets.
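Both approaches can be sketched like this:

```python
import numpy as np

arr = np.arange(1, 11)

# The comparison gives one boolean per element
bool_arr = arr > 5

# Passing the boolean array back in keeps only the True positions
selected = arr[bool_arr]

# The same selection with the condition written inside the brackets
direct = arr[arr > 5]

print(bool_arr)
print(selected)
```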

3. Pandas

Now that we have the basics of Python and NumPy, we can learn about Pandas.

What is Pandas?

Named after "panel data", Pandas was created by Wes McKinney while he was at AQR Capital Management.

Pandas was originally created to help work with datasets in Python for quantitative finance. Since Pandas wasn't part of their core business at AQR, McKinney open-sourced it in 2009.

Pandas has a fast and efficient DataFrame object for data manipulation with integrated indexing.

DataFrames will be our main workhorse when dealing with financial datasets.

Pandas also has tools for reading and writing data between in-memory data structures and different file formats.

Series

Series are similar to NumPy arrays, but instead of only a numerical index, we can give them a named or datetime index.

We're going to use Series later on to work with datasets.

Let's look at how we can create a Series from a few different objects.

To create a series from a list you can just pass it in—the two main arguments we pass in are the data and the index.

import numpy as np
import pandas as pd

# create list called labels
labels = ['a','b','c']

# list of numbers
my_list = [10,20,30]

# numpy array
arr = np.array([10,20,30])

# dictionary
d = {'a': 10,'b':20,'c':30}
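As a minimal sketch, here is how a Series could be built from each of these objects. The objects are redefined so the block runs on its own, and the names from_list, from_array, and from_dict are only illustrative:

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']
my_list = [10, 20, 30]
arr = np.array([10, 20, 30])
d = {'a': 10, 'b': 20, 'c': 30}

# From a list, with the labels as the index
from_list = pd.Series(data=my_list, index=labels)

# From a NumPy array (data and index passed positionally)
from_array = pd.Series(arr, labels)

# From a dictionary: keys become the index, values become the data
from_dict = pd.Series(d)

print(from_list)
```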


Now we have labels matched up to our list, and that is the power of Series.

Let's go over how to use an index with Series, which is fundamental for understanding how a series works.
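Here is a small sketch of looking up values by label and of how operations line up on the index. The tickers and share counts are made up for illustration:

```python
import pandas as pd

# Two hypothetical portfolios: number of shares held per ticker
portfolio_a = pd.Series([10, 20, 30], index=['AAPL', 'MSFT', 'GOOG'])
portfolio_b = pd.Series([1, 2, 3], index=['AAPL', 'MSFT', 'AMZN'])

# Look up a value by its label rather than its position
aapl = portfolio_a['AAPL']

# Arithmetic aligns on the index; labels missing from either side give NaN
combined = portfolio_a + portfolio_b

print(aapl)
print(combined)
```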

DataFrames

DataFrames are built on top of the Series object we just discussed.

Let's look at the DataFrame command:

  • It takes in a data argument and an index argument (like Series), but also has an additional columns argument
  • Each of the columns is a Pandas Series, and they all share an index
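A minimal sketch of creating a DataFrame. The random data, the row labels A to E, and the seed are arbitrary choices; the column names W to Z follow the column names used in this section:

```python
import numpy as np
import pandas as pd

np.random.seed(101)  # arbitrary seed, only for reproducibility

# 5 rows and 4 columns of random numbers, with labeled rows and columns
df = pd.DataFrame(
    data=np.random.randn(5, 4),
    index=['A', 'B', 'C', 'D', 'E'],
    columns=['W', 'X', 'Y', 'Z'],
)

# Selecting a single column returns a Series that shares the DataFrame's index
w_column = df['W']

print(df)
print(type(w_column))
```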

We can add a column by specifying it as if it already exists, and then set it to the value we want. In this case, we're adding up the W and X columns.

We can drop a column with df.drop() by passing in the column name and axis=1. By default this returns a new DataFrame and leaves df unchanged; to modify df itself, we also set inplace=True (or reassign the result).
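Adding and dropping a column can be sketched as follows. The small DataFrame and the column name new are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {'W': [1.0, 2.0, 3.0], 'X': [10.0, 20.0, 30.0]},
    index=['A', 'B', 'C'],
)

# Add a column by assigning to it as if it already existed
df['new'] = df['W'] + df['X']
new_values = df['new'].tolist()

# Without inplace=True, drop returns a new DataFrame and df keeps the column
dropped = df.drop('new', axis=1)
still_there = 'new' in df.columns

# With inplace=True, df itself is modified
df.drop('new', axis=1, inplace=True)

print(df)
```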

We can access a row in two ways:

  • Either with df.loc[] and the row label
  • Or with df.iloc[] and the integer position of the row
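Both ways of selecting a row can be sketched on a small illustrative DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {'W': [1, 2, 3], 'X': [10, 20, 30]},
    index=['A', 'B', 'C'],
)

# Select the row labeled 'B'
row_by_label = df.loc['B']

# Select the second row by its integer position
row_by_position = df.iloc[1]

print(row_by_label)
```

Here both selections return the same row as a Series, indexed by the column names.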

We can use conditional selection with Pandas, which is very similar to what we did with NumPy.

In order to combine multiple conditions in a DataFrame we use & (instead of and) and | (instead of or), wrapping each condition in parentheses:
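A short sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame(
    {'W': [1, -2, 3, -4], 'Y': [5, 6, -7, -8]},
    index=['A', 'B', 'C', 'D'],
)

# A single condition keeps the rows where it is True
positive_w = df[df['W'] > 0]

# Both conditions must hold (each one wrapped in parentheses)
both = df[(df['W'] > 0) & (df['Y'] > 0)]

# At least one of the conditions must hold
either = df[(df['W'] > 0) | (df['Y'] > 0)]

print(both)
print(either)
```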

GroupBy

Pandas has a very powerful feature called GroupBy, which lets you aggregate multiple rows into a single value.

The GroupBy method lets you group rows based on a column and perform an aggregate function on them.

Here's an example from the documentation:
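The sketch below is in the spirit of the documentation examples, but it uses a small made-up DataFrame. The company tickers and sales figures are hypothetical:

```python
import pandas as pd

# Hypothetical sales figures, two rows per company
df = pd.DataFrame({
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT'],
    'Sales': [200, 120, 340, 124],
})

# Group the rows by company
by_company = df.groupby('Company')

# Aggregate each group down to a single value
mean_sales = by_company['Sales'].mean()
total_sales = by_company['Sales'].sum()

print(mean_sales)
print(total_sales)
```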

Merge, Join, and Concatenate

Learning how to join 2 DataFrames is an important skill to have, especially when dealing with datasets from different sources:

Pandas provides various facilities for easily combining Series and DataFrame objects, with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations.
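As a hedged sketch of a merge-type operation, here is pd.merge() joining two small DataFrames on a shared column. The tickers, prices, and share counts are made up:

```python
import pandas as pd

# Two hypothetical datasets that share a 'ticker' column
prices = pd.DataFrame({
    'ticker': ['AAPL', 'MSFT', 'GOOG'],
    'price': [150.0, 300.0, 2800.0],
})
holdings = pd.DataFrame({
    'ticker': ['AAPL', 'MSFT', 'AMZN'],
    'shares': [10, 5, 2],
})

# Inner join: only tickers present in both DataFrames
inner = pd.merge(prices, holdings, on='ticker', how='inner')

# Outer join: all tickers, with NaN where one side has no data
outer = pd.merge(prices, holdings, on='ticker', how='outer')

print(inner)
print(outer)
```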

Here's a simple example of pd.concat() from the documentation:
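The following sketch is modeled loosely on the documentation's style rather than copied from it, and the placeholder values are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[2, 3])

# By default, concat stacks the DataFrames on top of each other (axis=0)
stacked = pd.concat([df1, df2])

# With axis=1 they are placed side by side, aligned on the index;
# rows missing from one DataFrame are filled with NaN
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked)
print(side_by_side)
```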

For more guides on getting started with Pandas, check out 10 Minutes to pandas and this more advanced pandas recipes notebook.

4. Summary: Python for Finance

In this introductory guide, we provided a quick review of the Python programming language, as well as two key Python libraries: NumPy and Pandas.

If you want to learn more about Python for Finance, check out our guides below on the subject:

Resources