We all know that machine learning is transforming nearly every industry on the planet, and in this article we're going to look at the mathematical foundations of this revolution.
In the field of machine learning, linear algebra notation is used to describe the parameters, weights, and structure of different algorithms. As a result, it is essential for any machine learning practitioner to develop a solid foundation in linear algebra.
This article is based on this course on the Mathematical Foundation for Machine Learning and AI, and is organized as follows:
 Scalars, Vectors, Matrices, and Tensors
 Vector and Matrix Norms
 Python Implementation of Vectors, Matrices, and Tensors
 Special Matrices and Vectors
 Eigenvalues, Eigenvectors, & Eigendecomposition
 Summary: Mathematics of Machine Learning
Having a grasp of these foundations is essential to building and deploying machine learning algorithms that solve real-world problems.
1. Scalars, Vectors, Matrices, and Tensors
The notation that follows is very important in machine learning, as it's used in every deep learning algorithm today.
The inputs of neural networks are typically in the form of vectors or matrices, and the outputs are either scalars, vectors, matrices, or tensors.
Let's start with a few basic definitions:
Scalars
A scalar is a single number or value, typically denoted with a lowercase $x$.
Vectors
A vector is an array of numbers, either in a row or a column, where each element is identified by a single index. Vectors are typically denoted with a bold lowercase $\mathbf{x}$.
Matrix
A matrix is a 2-dimensional array of numbers, where each element is identified by two indices. A matrix is denoted with a bold capital $\mathbf{X}$.
Let's look at a few examples of scalars, vectors, and matrices:
Scalar: 5
Vector: $\begin{bmatrix}1 & 5 & 0\end{bmatrix}$ or
\begin{bmatrix}1 \\ 5 \\ 0\end{bmatrix}
Matrix:
\begin{bmatrix} 5 & 8 \\ 1 & 2 \\ 2 & 3 \end{bmatrix}
It's important to note that a matrix can have any number of rows and columns.
Vector and matrix elements are typically indexed starting from 0 (this is also the convention in Numpy, which we'll use later).
In terms of dimensions, vectors are 1-dimensional, matrices are 2-dimensional, and dimensions are reported in a (Row, Column) format. For example, if we have 3 rows and 5 columns we would say this is a 3 x 5 matrix.
Now that we understand the basics, let's move on to some of the operations we can perform.
Matrix Operations
Matrix operations are frequently used in machine learning and can include addition, subtraction, and multiplication.
Matrix addition is an entrywise sum, meaning the addition of matrix A with matrix B is matrix C. A and B must have the same dimensions, and C will also have the same dimensions as A and B.
For example: $A + B = C$
$$\begin{bmatrix} 1 & 2 \\ 3& 4 \\ 5 & 6 \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 3& 4 \\ 5 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \\ 10 & 12 \end{bmatrix}$$
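As a quick sanity check, the sum above can be reproduced with NumPy (the library used in the Python section later in this article):

```python
import numpy as np

# the two matrices from the example above
A = np.array([[1, 2], [3, 4], [5, 6]])
B = np.array([[1, 2], [3, 4], [5, 6]])

# entrywise sum; A and B must have the same dimensions
C = A + B
# C is [[2, 4], [6, 8], [10, 12]]
```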
Matrix subtraction works the same way as matrix addition: it is performed elementwise and the matrices must have the same dimensions.
Matrix multiplication is a bit different: the matrix product of A and B is matrix C, where A must have the same number of columns as B has rows.
Here is the equation for the entries of C:
\[C_{i, j} = \sum_k A_{i, k} B_{k, j}\]
For example:
$$\begin{bmatrix} 1 & 2 \\ 3& 4 \\ 5 & 6 \end{bmatrix} * \begin{bmatrix} 1 & 2 \\ 3& 4 \end{bmatrix} = \begin{bmatrix} 7 & 10 \\ 15 & 22 \\ 23 & 34 \end{bmatrix}$$
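The product above can be verified with NumPy's @ operator:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # 3 x 2
B = np.array([[1, 2], [3, 4]])          # 2 x 2

# matrix product: A's column count (2) matches B's row count (2)
C = A @ B
# C is [[7, 10], [15, 22], [23, 34]]
```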
Here are a few properties of matrix multiplication:
 Matrix multiplication is distributive: $A(B + C) = AB + AC$
 Matrix multiplication is associative: $A(BC) = (AB)C$
Matrix Transpose
One of the most important matrix operations in machine learning is the transpose.
The transpose of a matrix is an operator which flips a matrix over its main diagonal.
The indices of the rows and columns are switched in the transpose, for example:
$$\begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix}^T = \begin{bmatrix} a_{00} & a_{10} \\ a_{01} & a_{11} \end{bmatrix}$$
We aren't restricted to matrices: we can also take the transpose of a vector, in which case a row vector becomes a column vector (and vice versa).
Tensors
If we take a matrix one step further we get a tensor.
Matrices have two axes, but sometimes we need more than two axes.
For example, if we have a 2D matrix with indices $(i, j)$, a 3D tensor would have indices $(i, j, k)$.
It gets harder to imagine, but we can have any number of dimensions in a tensor, although it does get more computationally expensive to handle.
2. Vector and Matrix Norms
Now that we're familiar with vectors and matrices, let's look at how they're actually used in machine learning.
The first way that vectors and matrices are used is with norms:
The magnitude of a vector can be measured using a function called a norm.
Norms are used in many ways in machine learning, but one example is measuring the loss function between a predicted and an actual point.
Here are a few important points about norms:
 Norms map vectors to nonnegative values
 The norm of a vector $x$ measures the distance from the origin to the point $x$
The general formula for a norm is the $L^p$ norm:
\[\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}\]
When we fill in different values of $p$ in this equation we're going to get very different norms.
The Euclidean Norm
The most common value is $p = 2$, in which case we get the Euclidean norm, otherwise known as the $L^2$ norm:
\[\|x\|_2 = \left(\sum_i x_i^2\right)^{1/2}\]
The $L^1$ Norm
Another common value is $p = 1$, which gives the $L^1$ norm. In cases where discriminating between small, nonzero values and zero is important, the $L^1$ norm can be used:
\[\|x\|_1 = \sum_i |x_i|\]
The $L^1$ norm increases linearly as the elements of $x$ move away from 0, which makes it useful in machine learning when we need to distinguish between exactly zero and slightly nonzero values.
The Max Norm
The max norm, or the $L^\infty$ norm, is also frequently used in machine learning.
The max norm simplifies to the largest absolute value among the elements of the vector:
\[\|x\|_\infty = \max_i |x_i|\]
The Frobenius Norm
So far we've been measuring vectors, but we can also measure the size of a matrix using the Frobenius norm, which is analogous to the $L^2$ norm of a vector:
\[\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}\]
The Frobenius norm is used in machine learning frequently since it deals with matrices.
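All of these norms are available in NumPy through np.linalg.norm; here is a quick sketch:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l1 = np.linalg.norm(x, ord=1)         # |3| + |-4| + |0| = 7.0
l2 = np.linalg.norm(x)                # sqrt(9 + 16) = 5.0 (L2 is the default)
linf = np.linalg.norm(x, ord=np.inf)  # largest absolute value = 4.0

A = np.array([[1.0, 2.0], [3.0, 4.0]])
fro = np.linalg.norm(A, ord='fro')    # sqrt(1 + 4 + 9 + 16) = sqrt(30)
```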
Later on we'll use these norms to normalize a vector and produce what are called "unit vectors".
3. Python Implementation of Vectors, Matrices, and Tensors
Let's now use Google Colab to implement vectors, matrices, and tensors in Python.
To get started we first want to import Numpy, the fundamental package for scientific computing with Python:
import numpy as np

Scalar
Now let's define a scalar, which is very easy to do:
# define a scalar
x = 5

Vector
To define a vector we will use a Numpy array:
# define a vector
x = np.array([1,2,3,4])

Let's print out the vector dimensions with x.shape and the vector size with x.size:
print('Vector Dimensions: {}'.format(x.shape))
print('Vector Size: {}'.format(x.size))


Matrix
Next let's define a matrix with np.matrix (note that np.matrix is discouraged in recent versions of Numpy, and np.array is generally recommended instead):
# define a matrix
x = np.matrix([[1,2,3],[4,5,6],[7,8,9]])

To define a matrix of a given dimension we can use either np.ones, which creates a matrix filled with 1's, or np.zeros for a matrix of 0's, passing in whichever dimensions we want. For example:
# define a matrix of a given dimension
np.zeros((10,10))


Tensors
If we want to define a 3-dimensional tensor we can simply add a dimension:
# define a 3D tensor
np.ones((3,3,3))


One thing to keep in mind is that as you add dimensions, the number of elements grows multiplicatively, and when working with deep neural networks the number of parameters can reach hundreds of millions.
Indexing
Let's first define a matrix A of integers:
A = np.ones((5,5), dtype=int)  # np.int is deprecated; use the built-in int

Indexing in Numpy starts at 0, so let's edit the first value:
A[0,0] = 2


Numpy uses the rows-columns convention, and with the : slicing syntax we can assign an entire column (here, all rows of column 0):
A[:, 0] = 3


Let's now add higher dimensions, which we do by adding an index:
A = np.ones((5,5,5))
# assign a new value along the first axis
A[:,0,0] = 6


Matrix Operations
Let's now look at matrix operations. First we'll set A to a 2x2 matrix and B to a 2x2 matrix of ones:
A = np.matrix([[1,2], [3,4]])
B = np.ones((2,2))

We can now do element-wise addition and subtraction of these matrices, as well as matrix multiplication:
# Element-wise addition
C = A + B
# Element-wise subtraction
C = A - B
# Matrix multiplication (A is an np.matrix, so * is the matrix product)
C = A*B

Matrix Transpose
Let's create a 3x3 matrix. In Numpy we can use reshape, so if we have a linear array we can reshape it to 3x3:
A = np.array(range(9))
A = A.reshape(3,3)

To calculate the transpose of A we just use .T:
# Matrix transpose
B = A.T

4. Special Matrices & Vectors
The matrices and vectors we're going to discuss occur more commonly than others and are particularly useful in machine learning.
The reason most of them are particularly useful in machine learning is that they're computationally efficient.
The matrices & vectors we're going to cover include:
 Diagonal matrices
 Symmetric matrices
 Unit vectors
 Normalization
 Orthogonal vectors
Diagonal Matrices
A matrix is diagonal if the following condition is true:
\[D_{i,j} = 0 \ \text{for all} \ i \neq j\]
Here's an example of a diagonal matrix, where all the entries are 0 except along the main diagonal:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3\\ \end{bmatrix}$$
Diagonal matrices are useful in machine learning because multiplying by a diagonal matrix is computationally efficient.
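To illustrate the efficiency point, multiplying a vector by a diagonal matrix only requires an element-wise product; here is a small sketch in NumPy:

```python
import numpy as np

d = np.array([1.0, 2.0, 3.0])
D = np.diag(d)  # the 3x3 diagonal matrix from the example above

x = np.array([4.0, 5.0, 6.0])

# full matrix-vector product: O(n^2) operations
full = D @ x
# exploiting the diagonal structure: O(n) element-wise product
fast = d * x
# both give the same result: [4.0, 10.0, 18.0]
```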
Symmetric Matrices
A symmetric matrix is any matrix that is equal to its transpose:
\[A = A^T\]
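Checking symmetry is a one-liner in NumPy (the matrix here is just an illustrative example):

```python
import numpy as np

A = np.array([[1, 7, 3],
              [7, 4, 5],
              [3, 5, 9]])

# a matrix is symmetric when it equals its transpose
is_symmetric = np.array_equal(A, A.T)  # True
```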
The Unit Vector
A unit vector is a vector with unit norm:
\[\|x\|_2 = 1\]
Vector Normalization
Normalization is the process of dividing a vector by its magnitude, which produces a unit vector:
\[\dfrac{x}{\|x\|_2} = \text{unit vector}\]
Normalization is a very common step in data preprocessing and has been shown to dramatically improve the performance of machine learning algorithms in many cases.
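As a small example, dividing a vector by its $L^2$ norm in NumPy:

```python
import numpy as np

x = np.array([3.0, 4.0])

# divide by the L2 norm (the magnitude) to get a unit vector
unit = x / np.linalg.norm(x)  # [0.6, 0.8]

# the result has unit norm
print(np.isclose(np.linalg.norm(unit), 1.0))  # True
```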
Orthogonal Vectors
A vector $x$ and a vector $y$ are orthogonal to each other if $x^Ty = 0$.
If two vectors are orthogonal and both vectors have a nonzero magnitude, they will be at a 90 degree angle to each other.
If two vectors are orthogonal and are also unit vectors, they are called orthonormal.
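These definitions translate directly into code; for example, the standard basis vectors are orthonormal:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# orthogonal: the dot product x^T y is zero
print(np.dot(x, y) == 0)  # True

# orthonormal: orthogonal and both have unit norm
print(np.linalg.norm(x) == 1.0 and np.linalg.norm(y) == 1.0)  # True
```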
5. Eigenvalues, Eigenvectors, & Eigendecomposition
Now let's move on to the concept of eigendecomposition by breaking it down into eigenvalues and eigenvectors.
Eigendecomposition
Eigendecomposition is simply breaking mathematical objects into their constituent parts.
For example, integers could be decomposed into prime factors.
Similarly, we can decompose matrices in ways that reveal information about their functional properties that is not immediately obvious.
So in the process we take a matrix and decompose it into eigenvectors and eigenvalues.
Eigenvectors & Eigenvalues
An eigenvector of a square matrix $A$ is a nonzero vector $v$ such that multiplication by $A$ alters only the scale of $v$:
\[Av = \lambda v\]
Where:
 $v$ is the eigenvector
 $\lambda$ is a scalar, the eigenvalue corresponding to $v$
Let's come back to eigendecomposition.
Eigendecomposition
If a matrix $A$ has $n$ linearly independent eigenvectors, we can form a matrix $V$ with one eigenvector per column, and a vector $\lambda$ of all the eigenvalues.
The eigendecomposition of $A$ is then given by:
\[A = V \, \mathrm{diag}(\lambda) \, V^{-1}\]
One important caveat is that not every matrix can be decomposed into eigenvalues and eigenvectors.
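A minimal sketch of this decomposition with NumPy's np.linalg.eig (the matrix here is just an illustrative example with distinct eigenvalues, so the decomposition exists):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# eigenvalues, and a matrix V with one eigenvector per column
eigenvalues, V = np.linalg.eig(A)

# verify A v = lambda v for the first eigenvector
v = V[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))  # True

# reconstruct A = V diag(lambda) V^{-1}
A_reconstructed = V @ np.diag(eigenvalues) @ np.linalg.inv(V)
print(np.allclose(A, A_reconstructed))  # True
```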
The main motivation for understanding eigendecomposition in the context of machine learning is that it is used in principal component analysis (PCA).
6. Summary: Mathematics of Machine Learning  Linear Algebra
In this article we reviewed the mathematical foundation of machine learning: linear algebra.
We first defined scalars, vectors, matrices, and tensors. As discussed, the input of a neural network is typically in the form of a vector or matrix, and the output is either a scalar, vector, matrix, or tensor.
After that we looked at vector and matrix norms, and as mentioned:
The magnitude of a vector can be measured using a function called a norm.
We then looked at how to implement these concepts in Python and how to use indexing, matrix operations, and the matrix transpose.
Next we looked at a few special matrices and vectors that occur more commonly and are particularly useful in machine learning. These include:
 Diagonal matrices
 Symmetric matrices
 Unit vectors
 Normalization
 Orthogonal vectors
Finally we looked at eigenvalues, eigenvectors, and eigendecomposition.
As discussed:
Eigendecomposition is simply breaking mathematical objects into their constituent parts.
Although each one of these topics could be expanded on significantly, the goal of the article is to provide an introduction to concepts that frequently come up in machine learning.
If you want to learn more about any of these subjects, check out the resources below.