**(C) 2018-2019 by Damir Cavar**

**Version:** 1.1, September 2019

**Download:** This and various other Jupyter notebooks are available from my GitHub repo.

This notebook is based on *Linear Algebra Review and Reference* by Zico Kolter (updated by Chuong Do), September 30, 2015. Many passages are copied literally, many are rewritten, and I do not mark sections that are added or different; consider this notebook an extended annotation of Kolter's (and Do's) notes. See also James E. Gentle (2017) *Matrix Algebra: Theory, Computations and Applications in Statistics*, 2nd edition, Springer. Another good resource is Philip N. Klein (2013) *Coding the Matrix: Linear Algebra through Applications to Computer Science*, Newtonian Press.

Consider the following system of equations:

$\begin{align} 4 x_1 - 5 x_2 &= -13 \\ -2 x_1 + 3 x_2 &= 9 \end{align}$

In matrix notation this system can be written compactly as:

$Ax = b$

with:

$A = \begin{bmatrix} 4 & -5 \\[0.3em] -2 & 3 \end{bmatrix}, \quad b = \begin{bmatrix} -13 \\[0.3em] 9 \end{bmatrix}$

A **scalar** is an element in a vector, containing a real number **value**. In a vector space model or a vector mapping of (symbolic, qualitative, or quantitative) properties, the scalar holds the concrete value or property of a variable.

A **vector** is an array, tuple, or ordered list of scalars (or elements) of size $n$, with $n$ a positive integer. The **length** of the vector, that is the number of scalars in the vector, is also called the **order** of the vector.

**Vectorization** is the process of creating a vector from some data using some process.

A **matrix** is a list of vectors that are all of the same length. $A$ is a matrix with $m$ rows and $n$ columns; the entries of $A$ are real numbers:

$A \in \mathbb{R}^{m \times n}$

By convention, an $n$-dimensional vector is usually thought of as a matrix with $n$ rows and $1$ column, a **column vector**:

$x = \begin{bmatrix} x_1 \\[0.3em] x_2 \\[0.3em] \vdots \\[0.3em] x_n \end{bmatrix}$

If we want to explicitly represent a **row vector**, that is a matrix with $1$ row and $n$ columns, we write $x^T$ (this denotes the transpose of $x$, explained below).

$x^T = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$

We denote the $j$th column of $A$ by $a_j$ or $A_{:,j}$, and the $i$th row of $A$ by $a_i^T$ or $A_{i,:}$.

An $n \times m$ matrix is a two-dimensional array with $n$ rows and $m$ columns. The product of two matrices $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times p}$ is the matrix:

$C = AB \in \mathbb{R}^{m \times p}$

That is, we are multiplying the rows of $A$ with the columns of $B$:

$C_{ij}=\sum_{k=1}^n{A_{ik}B_{kj}}$

The number of columns in $A$ must be equal to the number of rows in $B$.

For two vectors $x, y \in \mathbb{R}^n$, the **inner product** or **dot product** $x^T y$ is a real number:

$x^T y = \sum_{i=1}^n{x_i y_i}$

The **inner product** is a special case of matrix multiplication.

It is always the case that $x^T y = y^T x$.

```python
x = (1, 2, 3, 4)
y = (5, 6, 7, 8)
n = len(x)
if n == len(y):
    result = 0
    for i in range(n):
        result += x[i] * y[i]
    print(result)
```

Since scalar multiplication is commutative, we could swap the operands in the loop body and compute `result += y[i] * x[i]` without affecting the result.

We can use the *numpy* module to apply the same operation, that is to calculate the **inner product**. We import the *numpy* module and assign it the name *np* for the following code:

```python
import numpy as np
```

We define the vectors $x$ and $y$ using *numpy*:

```python
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
print("x:", x)
print("y:", y)
```

We can now calculate the **dot product** or **inner product** using the *dot* function of *numpy*:

```python
np.dot(x, y)
```

The order of the arguments is irrelevant:

```python
np.dot(y, x)
```

The vectors $x$ and $y$ are **row vectors** in the above code. We can turn them into column vectors by setting the *shape* property:

```python
print("x:", x)
x.shape = (4, 1)
print("xT:", x)
print("y:", y)
y.shape = (4, 1)
print("yT:", y)
```

The vectors are now column vectors, no longer **row vectors**; *numpy* treats the two differently.

We can also transpose vectors and matrices in *numpy* by using the *T* attribute on vector or matrix objects:

```python
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
print("x:", x)
print("y:", y)
print("xT:", x.T)
print("yT:", y.T)
```

Note that the *T* attribute has no effect on a one-dimensional array. If we define the vectors as $1 \times n$ matrices, using nested brackets, transposition behaves as expected:

```python
x = np.array([[1, 2, 3, 4]])
y = np.array([[5, 6, 7, 8]])
print("x:", x)
print("y:", y)
print("xT:", x.T)
print("yT:", y.T)
```

The *numpy* function *outer* flattens its inputs and is not affected by this distinction; for *dot*, the shapes have to align as in ordinary matrix multiplication. We can compute the dot product using the mathematical equation above in *numpy* with the new $x$ and $y$ row vectors:

```python
print("x:", x)
print("y:", y.T)
np.dot(x, y.T)
```

Or, reversing the roles of the two vectors:

```python
print("x:", x.T)
print("y:", y)
np.dot(y, x.T)
```

To read the result from this array of arrays, we would need to access the value this way:

```python
np.dot(y, x.T)[0][0]
```

For two vectors $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$, the **outer product** of $x$ and $y$ is:

$xy^T \in \mathbb{R}^{m\times n}$

The **outer product** results in a matrix with $m$ rows and $n$ columns, given by $(xy^T)_{ij} = x_i y_j$.

We can transpose a row vector into a column vector using the *reshape* function in *numpy*, the *T* attribute, or the *transpose()* function. The *reshape* function takes a tuple that describes the number of rows and columns of the resulting matrix:

```python
x = np.array([[1, 2, 3, 4]])
print("x:", x)
print("xT:", np.reshape(x, (4, 1)))
print("xT:", x.T)
print("xT:", x.transpose())
```

We can now compute the **outer product** by multiplying the column vector $x$ with the row vector $y$:

```python
x = np.array([[1, 2, 3, 4]])
y = np.array([[5, 6, 7, 8]])
x.T * y
```

*Numpy* provides an *outer* function that does all that:

```python
np.outer(x, y)
```

The *outer* function also accepts plain one-dimensional vectors:

```python
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
np.outer(x, y)
```

$A = \begin{bmatrix} 1 & 2 \\[0.3em] 3 & 4 \end{bmatrix}$

We can compute the product of $A$ with a scalar $n = 2$ as:

$nA = \begin{bmatrix} 2 & 4 \\[0.3em] 6 & 8 \end{bmatrix}$

Using *numpy* this elementwise scaling can be achieved (here for a different example matrix) by:

```python
import numpy as np
A = np.array([[4, 5, 6],
              [7, 8, 9]])
A * 2
```

Assume that we have a column vector $x$:

$x = \begin{bmatrix} 1 \\[0.3em] 2 \\[0.3em] 3 \end{bmatrix}$

$A = \begin{bmatrix} 4 & 5 & 6\\[0.3em] 7 & 8 & 9 \end{bmatrix}$

To compute $Ax$, we take the dot product of row $1$ of $A$ and the (single) column of $x$: $4 \cdot 1 + 5 \cdot 2 + 6 \cdot 3 = 32$.

We then compute the dot product of row $2$ of $A$ and the column of $x$: $7 \cdot 1 + 8 \cdot 2 + 9 \cdot 3 = 50$.

The resulting column vector $Ax$ is:

$Ax = \begin{bmatrix} 32 \\[0.3em] 50 \end{bmatrix}$

Using *numpy* we can compute $Ax$:

```python
A = np.array([[4, 5, 6],
              [7, 8, 9]])
x = np.array([1, 2, 3])
A.dot(x)
```

We can thus describe the product writing $A$ by rows as:

$y = Ax = \begin{bmatrix} a_1^T x \\[0.3em] a_2^T x \\[0.3em] \vdots \\[0.3em] a_m^T x \end{bmatrix}$

If we write $A$ in column form, then:

$y = Ax = a_1 x_1 + a_2 x_2 + \dots{} + a_n x_n$

In this case $y$ is a **linear combination** of the *columns* of $A$, with the coefficients taken from $x$.

If we represent $A$ by columns and $B$ by rows, then $AB$ is the sum of the outer products:

$AB = \sum_{i=1}^n{a_i b_i^T}$

**Matrix multiplication is associative:** $(AB)C = A(BC)$

**Matrix multiplication is distributive:** $A(B + C) = AB + AC$

**Matrix multiplication is, in general, not commutative;** It can be the case that $AB \neq BA$. (For example, if $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times q}$, the matrix product $BA$ does not even exist if $m$ and $q$ are not equal!)
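We can see the non-commutativity numerically with *numpy*; the two square matrices below are chosen freely for illustration:

```python
import numpy as np

# Two square matrices for which AB != BA (chosen for illustration)
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

AB = A.dot(B)
BA = B.dot(A)
print("AB:", AB)  # [[2 1] [4 3]]
print("BA:", BA)  # [[3 4] [1 2]]
```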

The **identity matrix** $I \in \mathbb{R}^{n\times n}$ is a square matrix with the value $1$ on the diagonal and $0$ everywhere else:

$I_{ij} = \left\{ \begin{array}{lr} 1 & i = j\\ 0 & i \neq j \end{array} \right. $

For all $A \in \mathbb{R}^{m\times n}$:

$AI = A = IA$

To demonstrate identity matrices in *numpy*, we first define a matrix $A$:

```python
import numpy as np
A = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8],
              [9, 10, 11]])
print("A:", A)
```

We can ask for the shape of $A$:

```python
A.shape
```

The *shape* property of a matrix contains $m$ (the number of rows) and $n$ (the number of columns) in a tuple, in that particular order. We can create an identity matrix for use in $AI$ by using the $n$ value:

```python
np.identity(A.shape[1], dtype="int")
```

We pass the *dtype* parameter to *identity* as *int*, since the default would return a matrix of *float* values.

To generate an identity matrix for the use in $IA$ we would use the $m$ value:

```python
np.identity(A.shape[0], dtype="int")
```

We can compute the dot product of $A$ and its identity matrix $I$:

```python
n = A.shape[1]
I = np.array(np.identity(n, dtype="int"))
np.dot(A, I)
```

The same is true for the other direction:

```python
m = A.shape[0]
I = np.array(np.identity(m, dtype="int"))
np.dot(I, A)
```

In a **diagonal matrix** all non-diagonal elements are $0$, that is $D = diag(d_1, d_2, \dots{}, d_n)$, with:

$D_{ij} = \left\{ \begin{array}{lr} d_i & i = j\\ 0 & i \neq j \end{array} \right. $

The identity matrix is a special case of a diagonal matrix: $I = diag(1, 1, \dots{}, 1)$.
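When *numpy*'s *diag* function is given a one-dimensional array, it builds the corresponding diagonal matrix (the values below are arbitrary):

```python
import numpy as np

# diag() builds a diagonal matrix from a one-dimensional array
d = np.array([1, 2, 3])
D = np.diag(d)
print(D)
```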

In *numpy* the *diag* function extracts the diagonal of a given matrix as a vector:

```python
import numpy as np
A = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11],
              [12, 13, 14, 15]])
np.diag(A)
```

Passing the optional parameter *k* to the *diag* function allows us to extract the diagonal above the main diagonal with a positive *k*, and below the main diagonal with a negative *k*:

```python
np.diag(A, k=1)
```

```python
np.diag(A, k=-1)
```

**Transposing** a matrix is achieved by *flipping* the rows and columns. For a matrix $A \in \mathbb{R}^{m\times n}$ the transpose $A^T \in \mathbb{R}^{n\times m}$ is the $n\times m$ matrix given by:

$(A^T)_{ij} = A_{ji}$

Properties of transposes:

- $(A^T)^T = A$
- $(AB)^T = B^T A^T$
- $(A+B)^T = A^T + B^T$

Square matrices $A \in \mathbb{R}^{n\times n}$ are **symmetric**, if $A = A^T$.

$A$ is **anti-symmetric**, if $A = -A^T$.

For any matrix $A \in \mathbb{R}^{n\times n}$, the matrix $A + A^T$ is **symmetric**.

For any matrix $A \in \mathbb{R}^{n\times n}$, the matrix $A - A^T$ is **anti-symmetric**.

$A = \frac{1}{2} (A + A^T) + \frac{1}{2} (A - A^T)$
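We can verify this decomposition numerically with *numpy*; the square example matrix below is chosen freely:

```python
import numpy as np

# an arbitrary square example matrix
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 10]])

S = (A + A.T) / 2  # the symmetric part of A
K = (A - A.T) / 2  # the anti-symmetric part of A

print(np.array_equal(S, S.T))    # S is symmetric
print(np.array_equal(K, -K.T))   # K is anti-symmetric
print(np.array_equal(S + K, A))  # the two parts sum to A
```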

$\mathbb{S}^n$ is the set of all symmetric matrices of size $n$.

$A \in \mathbb{S}^n$ means that $A$ is symmetric and of the size $n\times n$.

The **trace** of a square matrix $A \in \mathbb{R}^{n\times n}$, written $\mathrm{tr}(A)$ (or $\mathrm{tr}A$), is the sum of the diagonal elements of the matrix:

$trA = \sum_{i=1}^n A_{ii}$

Properties of the **trace**:

- For $A \in \mathbb{R}^{n\times n}$, $\mathrm{tr}A = \mathrm{tr}A^T$
- For $A,B \in \mathbb{R}^{n\times n}$, $\mathrm{tr}(A + B) = \mathrm{tr}A + \mathrm{tr}B$
- For $A \in \mathbb{R}^{n\times n}$, $t \in \mathbb{R}$, $\mathrm{tr}(tA) = t \mathrm{tr}A$
- For $A,B$ such that $AB$ is square, $\mathrm{tr}AB = \mathrm{tr}BA$
- For $A,B,C$ such that $ABC$ is square, $\mathrm{tr}ABC = \mathrm{tr}BCA = \mathrm{tr}CAB$, and so on for the product of more matrices.
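The identity $\mathrm{tr}AB = \mathrm{tr}BA$ can be checked with *numpy*'s *trace* function; the example matrices are chosen freely, with $AB$ a $3\times 3$ and $BA$ a $2\times 2$ matrix:

```python
import numpy as np

# arbitrary example matrices; AB is 3x3, BA is 2x2
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
B = np.array([[1, 0, 2],
              [0, 3, 1]])

print(np.trace(A.dot(B)))  # 29
print(np.trace(B.dot(A)))  # 29
```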

See the original notes for the proofs.

The **norm** of a vector $x$, written $\| x\|$, is informally the length of the vector.

Example: the Euclidean or $\mathscr{l}_2$ norm:

$\|x\|_2 = \sqrt{\sum_{i=1}^n{x_i^2}}$

Note: $\|x\|_2^2 = x^T x$

Formally, a **norm** is any function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ that satisfies the following properties:

- For all $x \in \mathbb{R}^n$, $f(x) \geq 0$ (non-negativity)
- $f(x) = 0$ if and only if $x = 0$ (definiteness)
- For all $x \in \mathbb{R}^n$, $t \in \mathbb{R}$, $f(tx) = |t|\ f(x)$ (homogeneity)
- For all $x, y \in \mathbb{R}^n$, $f(x + y) \leq f(x) + f(y)$ (triangle inequality)

Norm $\mathscr{l}_1$:

$\|x\|_1 = \sum_{i=1}^n{|x_i|}$

Norm $\mathscr{l}_\infty$:

$\|x\|_\infty = \max_i|x_i|$

Both of these are special cases of the family of $\mathscr{l}_p$ norms, parameterized by a real number $p \geq 1$:

$\|x\|_p = \left(\sum_{i=1}^n{|x_i|^p}\right)^{\frac{1}{p}}$

Norms can also be defined for matrices, for example the *Frobenius norm*:

$\|A\|_F = \sqrt{\sum_{i=1}^m\sum_{j=1}^n A_{ij}^2} = \sqrt{\mathrm{tr}(A^T A)}$

And many more.
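All of the above norms are available through *numpy*'s *linalg.norm* function via its *ord* parameter; the vectors below are arbitrary examples:

```python
import numpy as np

x = np.array([3, -4])

print(np.linalg.norm(x))              # l2 norm: sqrt(9 + 16) = 5.0
print(np.linalg.norm(x, ord=1))       # l1 norm: |3| + |-4| = 7.0
print(np.linalg.norm(x, ord=np.inf))  # l-infinity norm: max(3, 4) = 4.0

A = np.array([[1, 2],
              [3, 4]])
print(np.linalg.norm(A, ord="fro"))   # Frobenius norm: sqrt(1 + 4 + 9 + 16)
```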

A set of vectors $\{ x_1, x_2, \dots{}, x_n \} \subset \mathbb{R}^m$ is **(linearly) independent** if no vector can be represented as a linear combination of the remaining vectors.

Conversely, the set is **(linearly) dependent** if one vector from this set can be represented as a linear combination of the remaining vectors, i.e. for some $x_n$ and scalars $\alpha_1, \dots{}, \alpha_{n-1}$:

$\begin{equation} x_n = \sum_{i=1}^{n-1}{\alpha_i x_i} \end{equation}$

Example: The following vectors are linearly dependent, because $x_3 = -2 x_1 + x_2$:

$x_1 = \begin{bmatrix} 1 \\[0.3em] 2 \\[0.3em] 3 \end{bmatrix} \quad x_2 = \begin{bmatrix} 4 \\[0.3em] 1 \\[0.3em] 5 \end{bmatrix} \quad x_3 = \begin{bmatrix} 2 \\[0.3em] -3 \\[0.3em] -1 \end{bmatrix}$

The **column rank** of a matrix $A \in \mathbb{R}^{m\times n}$ is the size of the largest subset of columns of $A$ that constitute a linearly independent set. Informally, this is the number of linearly independent columns of $A$.

The **row rank** of a matrix $A \in \mathbb{R}^{m\times n}$ is the largest number of rows of $A$ that constitute a linearly independent set.

For any matrix the column rank equals the row rank, so both are collectively referred to as the **rank** of $A$.

- For $A \in \mathbb{R}^{m\times n}$, $rank(A) \leq \min(m, n)$. If $rank(A) = \min(m, n)$, then $A$ is said to be **full rank**.
- For $A \in \mathbb{R}^{m\times n}$, $rank(A) = rank(A^T)$
- For $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{n\times p}$, $rank(AB) \leq \min(rank(A), rank(B))$
- For $A,B \in \mathbb{R}^{m\times n}$, $rank(A + B) \leq rank(A) + rank(B)$
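*numpy* computes the rank of a matrix with *linalg.matrix_rank*. The example matrix below is constructed for illustration so that its third column is the sum of the first two:

```python
import numpy as np

# the third column is the sum of the first two (constructed for illustration)
A = np.array([[1, 2, 3],
              [4, 5, 9],
              [7, 8, 15]])

print(np.linalg.matrix_rank(A))    # 2: only two linearly independent columns
print(np.linalg.matrix_rank(A.T))  # 2: rank(A) equals rank(A^T)
```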

Matrices of the same shape are added by adding their corresponding entries; the same applies to subtraction. In Python using *numpy* this can be demonstrated with the following code:

```python
import numpy as np
print("np.arange(9):", np.arange(9))
print("np.arange(9, 18):", np.arange(9, 18))
A = np.arange(9, 18).reshape((3, 3))
B = np.arange(9).reshape((3, 3))
print("A:", A)
print("B:", B)
```

The *numpy* function *arange* is similar to the standard Python function *range*. In the one-parameter version it returns an array with $n$ elements, from $0$ to $n - 1$. If we provide two parameters to *arange*, it generates an array starting from the value of the first parameter and ending with a value one less than the second parameter. The function *reshape* returns a matrix with the given number of rows and columns.

We can now add and subtract the two matrices $A$ and $B$:

```python
A + B
```

```python
A - B
```

The **inverse** of a square matrix $A \in \mathbb{R}^{n\times n}$ is the unique matrix $A^{-1}$ such that:

$A^{-1} A = I = A A^{-1}$

$A$ is **invertible** or **non-singular** if $A^{-1}$ exists.

$A$ is **non-invertible** or **singular** if $A^{-1}$ does not exist.

Note: **non-singular** means the opposite of **non-invertible**!

For $A$ to have an inverse $A^{-1}$, $A$ must be **full rank**.

Assuming that $A,B \in \mathbb{R}^{n\times n}$ are non-singular, then:

- $(A^{-1})^{-1} = A$
- $(AB)^{-1} = B^{-1} A^{-1}$
- $(A^{-1})^T = (A^T)^{-1}$ (often simply $A^{-T}$)
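*numpy* provides *linalg.inv* to compute the inverse; we can check $A^{-1}A = I = AA^{-1}$ numerically for an arbitrary non-singular example matrix:

```python
import numpy as np

# an arbitrary non-singular example matrix
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

Ainv = np.linalg.inv(A)
print(Ainv)  # [[-2.   1. ] [ 1.5 -0.5]]

# A^-1 A and A A^-1 equal the identity, up to floating-point error
print(np.allclose(Ainv.dot(A), np.identity(2)))
print(np.allclose(A.dot(Ainv), np.identity(2)))
```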

Two vectors $x, y \in \mathbb{R}^n$ are **orthogonal** if $x^T y = 0$.

A vector $x \in \mathbb{R}^n$ is **normalized** if $\|x\|_2 = 1$.

A square matrix $U \in \mathbb{R}^{n\times n}$ is **orthogonal** if all its columns are orthogonal to each other and are **normalized**. The columns are then referred to as being **orthonormal**.

It follows immediately from the definition of orthogonality and normality that:

$U^T U = I = U U^T$

This means that the inverse of an orthogonal matrix is its transpose.

We generally only use the term orthogonal to describe the case, where $U$ is square.

Another nice property: multiplying a vector by an orthogonal matrix does not change its Euclidean norm, i.e. for any $x \in \mathbb{R}^n$ and orthogonal $U \in \mathbb{R}^{n\times n}$:

$\|Ux\|_2 = \|x\|_2$
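A rotation matrix is a simple example of an orthogonal matrix; we can verify both $U^TU = I$ and the norm preservation in *numpy* (the angle $\theta$ is chosen arbitrarily):

```python
import numpy as np

theta = 0.7  # an arbitrary rotation angle
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, 4.0])

print(np.allclose(U.T.dot(U), np.identity(2)))  # U^T U = I
print(np.linalg.norm(U.dot(x)))                 # approx. 5.0: the norm is preserved
print(np.linalg.norm(x))                        # 5.0
```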

The **span** of a set of vectors $\{ x_1, x_2, \dots{}, x_n\}$ is the set of all vectors that can be expressed as a linear combination of $\{ x_1, \dots{}, x_n \}$:

$\mathrm{span}(\{ x_1, \dots{}, x_n \}) = \left\{ v : v = \sum_{i=1}^n{\alpha_i x_i}, \ \alpha_i \in \mathbb{R} \right\}$

The **range** (sometimes also called the columnspace) of a matrix $A \in \mathbb{R}^{m\times n}$, denoted $\mathcal{R}(A)$, is the span of the columns of $A$. In other words,

$\mathcal{R}(A) = \{ v \in \mathbb{R}^m : v = A x, x \in \mathbb{R}^n\}$

The **projection** of a vector $y \in \mathbb{R}^m$ onto $\mathcal{R}(A)$ (assuming $A$ is full rank) is:

$\mathrm{Proj}(y; A) = \mathrm{argmin}_{v\in \mathcal{R}(A)}\|v - y\|_2 = A(A^T A)^{-1} A^T y$

See the original notes, page 13, for more details.
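The projection formula can be translated directly into *numpy*; in the example below (chosen for illustration) the columns of $A$ span the x-y plane in $\mathbb{R}^3$, so projecting $y$ simply zeroes its third component:

```python
import numpy as np

# the columns of A span the x-y plane in R^3 (example chosen for illustration)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
y = np.array([2.0, 3.0, 4.0])

# Proj(y; A) = A (A^T A)^-1 A^T y
proj = A.dot(np.linalg.inv(A.T.dot(A))).dot(A.T).dot(y)
print(proj)  # [2. 3. 0.]
```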

The **nullspace** of a matrix $A \in \mathbb{R}^{m\times n}$, denoted $\mathcal{N}(A)$, is the set of all vectors that equal $0$ when multiplied by $A$, i.e.,

$\mathcal{N}(A) = \{ x \in \mathbb{R}^n : A x = 0 \}$

The row space $\mathcal{R}(A^T)$ and the nullspace $\mathcal{N}(A)$ are disjoint (apart from the zero vector) and together span all of $\mathbb{R}^n$; sets of this type are called **orthogonal complements**, and we denote this $\mathcal{R}(A^T) = \mathcal{N}(A)^\perp$.

The **determinant** of a square matrix $A \in \mathbb{R}^{n\times n}$, denoted $|A|$ or $\det A$, can be interpreted geometrically: its absolute value is the *volume* of the set $S$ of points reachable by linear combinations of the rows of $A$ with coefficients between $0$ and $1$. For $n = 2$ this volume is the area of $S$ in the Cartesian plane, and for $n = 3$ it is the common understanding of *volume* for 3-dimensional objects.

Example:

$A = \begin{bmatrix} 1 & 3\\[0.3em] 3 & 2 \end{bmatrix}$

The rows of the matrix are:

$a_1^T = \begin{bmatrix} 1 & 3 \end{bmatrix} \quad a_2^T = \begin{bmatrix} 3 & 2 \end{bmatrix}$

The set $S$ corresponding to these rows is the parallelogram spanned by $a_1$ and $a_2$ (shown as a figure in the original notes).

- The determinant of the identity is $1$, $|I| = 1$. (Geometrically, the volume of a unit hypercube is $1$).
- Given a matrix $A \in \mathbb{R}^{n\times n}$, if we multiply a single row in $A$ by a scalar $t \in \mathbb{R}$, then the determinant of the new matrix is $t|A|$,

$\left| \begin{bmatrix} -- & t a_1^T & -- \\[0.3em] -- & a_2^T & -- \\[0.3em] & \vdots & \\[0.3em] -- & a_m^T & -- \end{bmatrix}\right| = t|A|$

(Geometrically, multiplying one of the sides of the set $S$ by a factor $t$ causes the volume to increase by a factor $t$.)
- If we exchange any two rows $a^T_i$ and $a^T_j$ of $A$, then the determinant of the new matrix is $-|A|$, for example

$\left| \begin{bmatrix} -- & a_2^T & -- \\[0.3em] -- & a_1^T & -- \\[0.3em] & \vdots & \\[0.3em] -- & a_m^T & -- \end{bmatrix}\right| = -|A|$

Several properties that follow from the three properties above include:

- For $A \in \mathbb{R}^{n\times n}$, $|A| = |A^T|$
- For $A,B \in \mathbb{R}^{n\times n}$, $|AB| = |A||B|$
- For $A \in \mathbb{R}^{n\times n}$, $|A| = 0$ if and only if $A$ is singular (i.e., non-invertible). (If $A$ is singular then it does not have full rank, and hence its columns are linearly dependent. In this case, the set $S$ corresponds to a "flat sheet" within the $n$-dimensional space and hence has zero volume.)
- For $A \in \mathbb{R}^{n\times n}$ and $A$ non-singular, $|A^{-1}| = 1/|A|$
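These properties can be checked with *numpy*'s *linalg.det*; $A$ is the example matrix from above, and $B$ is chosen freely:

```python
import numpy as np

# A is the example matrix from above; B is chosen freely
A = np.array([[1.0, 3.0],
              [3.0, 2.0]])
B = np.array([[2.0, 0.0],
              [1.0, 1.0]])

print(np.linalg.det(A))         # 1*2 - 3*3 = -7 (up to floating-point error)
print(np.linalg.det(A.T))       # equals det(A)
print(np.linalg.det(A.dot(B)))  # det(A) * det(B) = -14
```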

The original notes continue with further properties on page 16.

A **tensor** can be thought of as an organized multidimensional array of numerical values, with a vector as a special case. Rows of tensors extend along the y-axis, columns along the x-axis. The **rank** of a scalar is 0, the rank of a **vector** is 1, the rank of a **matrix** is 2, and the rank of a **tensor** is 3 or higher. (Note that this notion of rank, the number of axes, is different from the matrix rank discussed above.)

A **hyperplane** is a subspace of the ambient space with one dimension less. In a two-dimensional space a hyperplane is a line, in a three-dimensional space it is a two-dimensional plane, and so on.

The dot product of two vectors is also called the *inner product*. It is a function that returns a number computed from two vectors of the same length by summing up the products of the corresponding dimensions.

The dot product is related to the *cosine similarity*, which can be used as a metric for the similarity of vectors. Independent of the absolute lengths we look at the angle between the vectors, i.e. the length is neutralized via normalization:

$\mathbf{a} \cdot \mathbf{b} = \left\|\mathbf{a}\right\| \left\|\mathbf{b}\right\| \cos\theta$
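Solving the equation above for $\cos\theta$ gives the cosine similarity, which is easy to compute with *numpy* (the example vectors are chosen to be parallel, so the similarity is $1$):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # parallel to a, but twice as long

# cos(theta) = (a . b) / (||a|| ||b||)
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_theta)  # approx. 1.0 for parallel vectors
```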

The **Hadamard product** is also called the **entrywise product**. For two matrices $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{m\times n}$ the Hadamard product $A\circ B$ is:

$(A\circ B)_{i,j} = (A)_{i,j} (B)_{i,j}$

For example:

$\begin{bmatrix} 1 & 2 \\[0.3em] 3 & 4 \end{bmatrix} \circ \begin{bmatrix} 5 & 6 \\[0.3em] 7 & 8 \end{bmatrix} = \begin{bmatrix} 5 & 12 \\[0.3em] 21 & 32 \end{bmatrix}$
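In *numpy*, the `*` operator on two arrays of the same shape computes exactly this entrywise product:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# the * operator on same-shaped arrays is the entrywise (Hadamard) product
print(A * B)  # [[ 5 12] [21 32]]
```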

The **outer product** discussed above is also called the **tensor product** of two vectors. We compute the resulting matrix by multiplying each element of a column vector with all elements of a row vector.

...
