Linear algebra part 8 - Matrices

If a vector represents multiple dimensions, a matrix is a set of many vectors. Let us start by defining three vectors, $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$:

\mathbf{u} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{w} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}

We can represent these three vectors as a single matrix, $\mathbf{A}$, where each vector becomes a column of the matrix:

\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix}
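If you want to try this on a computer, here is a minimal sketch in plain Python. It stores a matrix as a list of rows, which is just one possible convention, and the helper name `matrix_from_columns` is my own choice rather than a standard function.

```python
def matrix_from_columns(*columns):
    """Build a matrix (stored as a list of rows) whose columns are the given vectors."""
    # zip(*columns) groups the first components together, then the second
    # components, and so on, which turns a list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]


u = [1, 0, -1]
v = [0, 0, 1]
w = [0, 1, 0]

A = matrix_from_columns(u, v, w)
for row in A:
    print(row)
# [1, 0, 0]
# [0, 0, 1]
# [-1, 1, 0]
```

Reading down the first position of every row gives back $\mathbf{u}$, the second position gives $\mathbf{v}$ and the third gives $\mathbf{w}$.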

Matrix notation

Matrices are denoted using an upper case bold letter, $\mathbf{A}$. Matrices are two dimensional objects: they have a height and a width. The first matrix we saw in this post, $\mathbf{A}$, is a 3 x 3 matrix. It is three rows high and three columns wide. Matrices can have any width and height and do not need to be square. For example, $\mathbf{B}$ is a 2 x 4 matrix and $\mathbf{C}$ is a 5 x 2 matrix. Saying a matrix is 2 x 4 means it has a height of 2 (two rows) and a width of 4 (four columns).

\mathbf{B} = \begin{bmatrix} 1 & 0 & 2 & 7 \\ 0 & 0 & 14 & 3 \end{bmatrix}, \quad \mathbf{C} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 13 & 5 \\ 9 & 4 \\ 45 & 9 \end{bmatrix}
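As a quick check of the heights and widths, here is a small Python sketch, again assuming a matrix is stored as a list of rows. The helper name `shape` is hypothetical.

```python
def shape(matrix):
    """Return (height, width), i.e. (number of rows, number of columns)."""
    return len(matrix), len(matrix[0])


B = [[1, 0, 2, 7],
     [0, 0, 14, 3]]

C = [[1, 0],
     [0, 0],
     [13, 5],
     [9, 4],
     [45, 9]]

print(shape(B))  # (2, 4)
print(shape(C))  # (5, 2)
```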

The position of a component in a matrix is described by two subscript numbers: the row first, then the column. The component in the second row and third column of matrix $\mathbf{A}$ would be denoted as $a_{2,3}$. The number of rows in a matrix is typically called $m$ and the number of columns is called $n$.

\mathbf{A} = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\ a_{3,1} & a_{3,2} & a_{3,3} & \cdots & a_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & a_{m,3} & \cdots & a_{m,n} \end{bmatrix}

Mathematics normally uses 1-based indexing, meaning the first row is row 1, the second row is row 2, etc. Many programming languages use 0-based indexing, where the first row is row 0, the second row is row 1, etc. In this blog I will use 1-based indexing because I believe it is more intuitive for people meeting linear algebra for the first time. Be aware that you are likely to encounter 0-based indexing if you use linear algebra in programming.
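To make the difference concrete, here is a short Python sketch. Python uses 0-based indexing, so the component written $a_{2,3}$ in this blog lives at `A[1][2]`. The helper `component` is a hypothetical convenience that converts from the 1-based notation.

```python
A = [[1, 0, 0],
     [0, 0, 1],
     [-1, 1, 0]]


def component(matrix, i, j):
    """Return a_{i,j} using the 1-based row and column numbers from the maths notation."""
    # Subtract 1 from each index to convert to Python's 0-based indexing.
    return matrix[i - 1][j - 1]


print(component(A, 2, 3))  # 1, the component in row 2, column 3
print(A[1][2])             # 1, the same component with 0-based indices
```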

Multiplying a matrix and a vector

You have already learnt how to multiply a matrix and a vector without realising it, when you learnt about linear combinations of vectors. In the introduction to this post we built up matrix $\mathbf{A}$ from vectors $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$. Let us say we want to multiply matrix $\mathbf{A}$ by vector $\mathbf{x}$, where $\mathbf{x}$ has three components.

\mathbf{Ax} = \begin{bmatrix} & & \\ \mathbf{u} & \mathbf{v} & \mathbf{w} \\ & & \end{bmatrix} \begin{bmatrix} c \\ d \\ e \end{bmatrix} = c \mathbf{u} + d \mathbf{v} + e \mathbf{w}

You can think of $\mathbf{A}$ as a matrix that acts on the vector $\mathbf{x}$ to produce an output vector, $\mathbf{b}$. A matrix whose rows take differences between components of $\mathbf{x}$ is often called a difference matrix; in this example the third row of $\mathbf{A}$ does so, producing $x_2 - x_1$.

\mathbf{Ax} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \left[ \begin{array}{ccc} x_1 & & \\ x_3 & & \\ -x_1 & + & x_2 \\ \end{array} \right] = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \mathbf{b}

You can see that $\mathbf{b}$ was calculated by multiplying $x_1$ by each component in the first column of $\mathbf{A}$, then adding $x_2$ multiplied by the second column of $\mathbf{A}$, then adding $x_3$ multiplied by the third column of $\mathbf{A}$.
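The by column method can be sketched in a few lines of Python. This is an illustration with plain lists rather than a recommended implementation, and the function name `multiply_by_columns` and the sample values in `x` are my own choices.

```python
def multiply_by_columns(matrix, x):
    """Compute the product of a matrix and a vector as a linear combination of columns."""
    rows, cols = len(matrix), len(matrix[0])
    b = [0] * rows
    for j in range(cols):
        # Add x[j] times column j of the matrix to the running total.
        for i in range(rows):
            b[i] += x[j] * matrix[i][j]
    return b


A = [[1, 0, 0],
     [0, 0, 1],
     [-1, 1, 0]]
x = [2, 3, 4]  # sample values for x_1, x_2, x_3

print(multiply_by_columns(A, x))  # [2, 4, 1]
```

With these sample values the result matches the formula above: $b_1 = x_1 = 2$, $b_2 = x_3 = 4$ and $b_3 = -x_1 + x_2 = 1$.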

Multiplication by row

The method of multiplication we have just learnt is multiplication by column. You can also multiply a matrix and a vector by row using the dot product. I would generally recommend using the by column method as it is simpler to understand.

To multiply matrix $\mathbf{A}$ by vector $\mathbf{x}$ using the by row method we take each row of matrix $\mathbf{A}$ in turn and calculate its dot product with vector $\mathbf{x}$.

\mathbf{Ax} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \left[ \begin{array}{ccc} (1, 0, 0) & \cdot & (x_1, x_2, x_3) \\ (0, 0, 1) & \cdot & (x_1, x_2, x_3) \\ (-1, 1, 0) & \cdot & (x_1, x_2, x_3) \end{array} \right]

If you keep going you will see that it multiplies out to the same result as the by column method. If you need a refresher on calculating the dot product of vectors, revisit this post.
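Here is a minimal Python sketch of the by row method, with a check that it agrees with the by column method. The function names and the sample values in `x` are hypothetical; both functions are written with plain lists so the block runs on its own.

```python
def dot(a, b):
    """Dot product of two vectors of the same length."""
    return sum(p * q for p, q in zip(a, b))


def multiply_by_rows(matrix, x):
    """Each component of the result is the dot product of one row with x."""
    return [dot(row, x) for row in matrix]


def multiply_by_columns(matrix, x):
    """The result is x[0] times column 0, plus x[1] times column 1, and so on."""
    b = [0] * len(matrix)
    for j in range(len(matrix[0])):
        for i in range(len(matrix)):
            b[i] += x[j] * matrix[i][j]
    return b


A = [[1, 0, 0],
     [0, 0, 1],
     [-1, 1, 0]]
x = [2, 3, 4]  # sample values for x_1, x_2, x_3

print(multiply_by_rows(A, x))     # [2, 4, 1]
print(multiply_by_columns(A, x))  # [2, 4, 1]
```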

Practical uses for matrix multiplication

The idea that you can transform one vector into another vector by multiplying it by a matrix is central to large language models. The matrices in this context hold the weights learned by the model during training. As a simplified picture, the input vector represents the prompt you give the model and the output vector represents the response the model gives you back; a real model chains many such multiplications together.

Problems

If you can solve this problem you have understood how to multiply matrices and vectors.

Calculate vector $\mathbf{b}$, where $\mathbf{b}$ is the product of matrix $\mathbf{A}$ and vector $\mathbf{x}$.

\mathbf{A} = \begin{bmatrix} 7 & 0 & 0 \\ 1 & 5 & -3 \\ 0 & 19 & 0 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 8 \\ 1 \\ 4 \end{bmatrix}

Solutions

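Using the by column method, we multiply the first component of $\mathbf{x}$ by the first column of $\mathbf{A}$, the second component by the second column and the third component by the third column, then add the results: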

\mathbf{Ax} = \begin{bmatrix} 7 & 0 & 0 \\ 1 & 5 & -3 \\ 0 & 19 & 0 \end{bmatrix} \begin{bmatrix} 8 \\ 1 \\ 4 \end{bmatrix} = \left[ \begin{array}{ccccc} 56 & + & 0 & + & 0 \\ 8 & + & 5 & + & -12 \\ 0 & + & 19 & + & 0 \end{array} \right] = \begin{bmatrix} 56 \\ 1 \\ 19 \end{bmatrix} = \mathbf{b}
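If you want to check your answer on a computer, the following Python sketch reproduces the working above with plain lists. The variable names are my own.

```python
A = [[7, 0, 0],
     [1, 5, -3],
     [0, 19, 0]]
x = [8, 1, 4]

# Scale each column of A by the matching component of x.
scaled_columns = [[x[j] * A[i][j] for i in range(3)] for j in range(3)]
print(scaled_columns)  # [[56, 8, 0], [0, 5, 19], [0, -12, 0]]

# Add the scaled columns together, component by component.
b = [sum(parts) for parts in zip(*scaled_columns)]
print(b)  # [56, 1, 19]
```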
