Matrices and vectors play a significant part in mathematics, since they appear in almost all of its branches. Therefore, it is no surprise that they also pop up in machine learning. Due to their importance, vectors are part of the Wiskunde B program (for example, see chapter 10 of Getal & Ruimte VWO B Deel 3, 11th edition, 2015)[1]. Unfortunately, matrices are only included in the Wiskunde D program, even though they are just as essential. In this tutorial, you will learn the basics of matrices, and we will extend your knowledge of vectors. Afterward, in the following tutorial, you will see how these concepts relate to machine learning and how they can be applied in the Flappy Bird game. Furthermore, some exercises are provided, which we encourage you to do, since they will help you grasp the concepts of matrices.
In a sense, a matrix is a means to store information efficiently. A matrix is a rectangular block of numbers, arranged in rows and columns. Typically, we denote a matrix with a capital letter. An example of such a matrix would be:
As we can easily see, this matrix has two rows and three columns. This might seem a bit vague and out of the blue, but in fact, you have already (more or less) used this idea in mathematics when solving a system of equations. We will elaborate on this more later on. First, we are going to discuss the basic operations with matrices.
Typically, we denote the entries of a matrix $A$ with $a_{ij}$. In this notation, $i$ represents the row of the entry, $j$ denotes the column of the entry, and $a_{ij}$ denotes the value of this entry. For example, $a_{12}$ represents the entry in row 1, column 2, which in our example has the value 1. In general, if $A$ has $m$ rows and $n$ columns, we say that $A$ is an $m \times n$ matrix.
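If you like, you can already experiment with this notation on a computer. The snippet below is a small illustration in Python using the NumPy library; the matrix values are made up and are not the example from the text. Note that NumPy uses 0-based indexing, whereas the mathematical notation $a_{ij}$ is 1-based.

```python
import numpy as np

# A made-up 2 x 3 matrix, purely for illustration.
A = np.array([[2, 1, 5],
              [4, 0, 3]])

print(A.shape)   # (2, 3): two rows, three columns, so a 2 x 3 matrix
# Mathematical notation is 1-based (a_12 is the entry in row 1, column 2),
# while NumPy arrays are 0-based, so a_12 corresponds to A[0, 1].
print(A[0, 1])   # 1 in this made-up matrix
```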
Since a matrix is just a block of numbers, it makes sense that we can add matrices. However, there is one slight catch: the matrices need to be identical in size; we cannot add matrices when they differ in size. The addition of two matrices is calculated entrywise, as shown in the following example:
However, do note that we cannot compute the addition in the next example:
We cannot compute this sum since the two matrices differ in size. Of course, subtraction works similarly to addition, with the same restriction on the sizes of the matrices. Moreover, you can quickly check that for any two matrices $A$ and $B$ of equal size, we have that $A + B = B + A$, similar to addition with regular numbers!
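As a small aside, you can check these rules on a computer as well. The sketch below uses Python with NumPy and made-up matrices (none of these values come from the examples above):

```python
import numpy as np

# Two made-up matrices of the same size (2 x 3).
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[6, 5, 4],
              [3, 2, 1]])

print(A + B)                          # entrywise sum
print(np.array_equal(A + B, B + A))   # True: matrix addition is commutative

# A matrix of a different size (2 x 2):
C = np.array([[1, 2],
              [3, 4]])
# A + C raises a ValueError, because the sizes (2, 3) and (2, 2) do not match.
```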
Another matrix operation is so-called scalar multiplication. In other words, we multiply the entire matrix with a scalar (a real-valued number). Like addition/subtraction, this is done entrywise, meaning that every entry has to be multiplied by the scalar. For example:
This is simply multiplying every entry in the original matrix by the scalar in front (3 in this example).
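In code, scalar multiplication looks like this (again a made-up matrix, using Python with NumPy):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # made-up values

# Scalar multiplication is entrywise: every entry is multiplied by 3.
print(3 * A)
# [[ 3  6  9]
#  [12 15 18]]
```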
Lastly, we will discuss matrix multiplication, where we multiply a matrix $A$ with a matrix $B$, yielding $A \cdot B$. Similar to addition/subtraction, matrix multiplication also has a size restriction. For matrix multiplication, it is required that the number of columns in the left matrix equals the number of rows in the right matrix. Otherwise, the product is not defined. This is necessary because, in matrix multiplication, we multiply the rows of the left matrix with the columns of the right matrix. Do note that this ‘multiplication’ is simply the inner product for vectors.
Doing this for all combinations of rows and columns, we get a new matrix that is the product of the two matrices. Now one can wonder: what does this new matrix look like? For the multiplication of an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$, we obtain an $m \times p$ matrix $C$. This comes from the following rule: multiply row $i$ from the left matrix with column $j$ from the right matrix to get the entry $c_{ij}$ of the new matrix. This rule might sound vague, but an example will make it concrete. Therefore, consider the following matrix multiplication between the left matrix $A$ and right matrix $B$:
Let us call the matrix resulting from this multiplication $C$. Then, for example, $c_{11}$ is obtained by taking the inner product of row 1 of $A$ and column 1 of $B$. For calculating $c_{22}$, we need to multiply the second row of $A$ with the second column of $B$. Try to compute the other values of $C$ for yourself! The matrix $C$ should then have the following values:
We want to make an additional remark: matrices do not comply with the so-called commutative property. When calculating with (real) numbers, we can swap the factors in a multiplication without changing the answer, for example $2 \cdot 3 = 3 \cdot 2$, and so on. Unfortunately, matrix multiplication does not possess this property. This has to do with the restriction on matrix multiplication we posed earlier: the number of columns in the left matrix should equal the number of rows in the right matrix. As an exercise, check the previous example; if we calculate $B \cdot A$, we do not get matrix $C$. In fact, we get nothing, as the product is not defined! This example illustrates that many calculation rules are similar for real numbers and matrices, but some are not, so you should be cautious when working with matrices.
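To see the multiplication rule and the lack of commutativity in action, here is a short Python/NumPy sketch with made-up matrices (these are not the matrices from the example above):

```python
import numpy as np

# A made-up 2 x 3 matrix and a made-up 3 x 2 matrix, so the product is a 2 x 2 matrix.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])

C = A @ B        # the matrix product
print(C.shape)   # (2, 2)
# Entry c_11 (C[0, 0] in 0-based NumPy) is the inner product of row 1 of A and column 1 of B:
print(np.dot(A[0, :], B[:, 0]) == C[0, 0])   # True

# Commutativity fails: two square matrices where the two products differ.
A2 = np.array([[1, 2],
               [3, 4]])
B2 = np.array([[0, 1],
               [1, 0]])
print(np.array_equal(A2 @ B2, B2 @ A2))   # False
```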
Now that you have seen the basic rules of calculating with matrices, try to do the exercises below. If a specific exercise is not possible, explain why it is not possible!
We subtract the two matrices entrywise, subtracting each entry of the right matrix from the corresponding entry of the left matrix. This yields the following matrix:
We cannot add these matrices. The reason for this is that the matrices do not have the same size: the left and the right matrix have different dimensions, so the sum is not defined.
We multiply the matrix by a real value, meaning that we multiply every entry by 5. This yields the following matrix:
Let
and
Calculate the product of these two matrices, or argue why it is not possible.
In this particular case, we cannot multiply the two matrices. For matrix multiplication, it is paramount that the number of columns of the left matrix equals the number of rows of the right matrix. Since that is not the case in this exercise, we cannot compute the product.
Let
and
Calculate the product of these two matrices, hence:
In this case, we are multiplying two matrices whose sizes match, so the product is defined. We will provide only the calculations of the first column, since the rest is calculated similarly. Let $c_{ij}$ denote the entries of the product matrix $C$. Then the matrix multiplication yields:
We multiplied the first, second, and third row of the left matrix with the first column of the right matrix to get the first column of $C$. The rest can be calculated similarly, yielding:
The last matrix operation we are going to discuss is the transpose of a matrix. This operation changes the shape of the matrix without changing its values. This could, for example, be useful when we want to multiply two matrices whose sizes do not match yet. The procedure is actually quite simple: all rows of the original matrix become columns in the transposed matrix. Furthermore, we write the original matrix with a superscript $T$ (as in $A^T$) to indicate we want to calculate its transpose. Below, we provide some examples:
As you can see, the procedure is quite simple: the rows become the columns, which can change the size of the matrix. In a later tutorial, you will see the practical importance of the transpose, as some programming languages return an output matrix that is the transpose of the matrix we actually need, so we have to transpose it ourselves.
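In Python with NumPy, the transpose of a (made-up) matrix is obtained like this:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # a made-up 2 x 3 matrix

# The transpose turns rows into columns: the 2 x 3 matrix becomes a 3 x 2 matrix.
print(A.T)
# [[1 4]
#  [2 5]
#  [3 6]]
print(A.shape, A.T.shape)   # (2, 3) (3, 2)
```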
Now that you know how to calculate with matrices, it would be nice to see how they can be applied. As a matter of fact, you have already worked implicitly with matrices in high school: matrices are often used to solve systems of equations. In high school, you have presumably already learned a procedure to solve such systems. For example, consider the following (simple) system of equations:
Typically, this system of equations would be solved by, for example, adding equation 1 four times to equation 2, resulting in:
However, we can also use matrices to solve such a system! The linear system above is chosen deliberately, since it is represented by the matrix we defined at the beginning of this tutorial:
As mentioned before, a matrix is used to store information efficiently. As such, a system of equations can be transformed into a matrix. The coefficients in front of the $x$ correspond with the first column of the matrix, the coefficients in front of the $y$ correspond to the second column of the matrix, and lastly, the values to the right of the $=$ sign correspond (always) to the last column. Note that in this example we have an $x$ and a $y$, but in the general case, the first column consists of the coefficients of the first letter in the alphabet, and the remaining columns follow alphabetical order.
We have seen how one can solve such a system without matrices, but we will now solve this linear system of equations with a procedure called Gaussian Elimination. In Gaussian Elimination, three different actions are allowed. Try to argue why they are allowed (hint: think of each row as a linear equation and what you are allowed to do with it).
In Gaussian Elimination, the following actions are allowed:
Swapping rows
Solving the system in its original order is entirely equivalent to solving the system with its equations swapped, since the order in which we write down the equations does not matter.
Hence in matrix notation, swapping rows is allowed!
Multiplying a row by a non-zero value
For example, take one of the equalities of the system. Multiplying both the left-hand and the right-hand side by 2 clearly does not change which values satisfy the equality. This holds for every non-zero value and does not change the system. In fact, the method you most probably used to solve the system thrives on this by adding row 1 four times to row 2. This immediately also provides the argument for why the last action is allowed:
Adding/subtracting a multiple of a row to another row
The goal of applying these actions is to obtain a matrix with as many zeroes as possible. Why? If we get such a matrix, most coefficients of the system are equal to zero, from which we can quickly determine the solution. Let us clear this up with an example.
Since the matrix (see above) represents the system of equations, we can also add the entire row 1 four times to row 2, which is equivalent to the operation we just discussed. Doing this operation means that we do not change the first row of the matrix but only alter the second row. This results in:
Therefore, from the last row, we can conclude that the coefficient in front of $x$ equals 6, whereas the coefficient in front of $y$ equals 0. This results in the same equation we have seen before:
However, we can continue this process by dividing the second row of the new matrix by 6 (dividing by 6 is simply multiplying by the non-zero value 1/6). This yields:
Now we can perform the last step: subtracting row 2 from row 1, which yields:
From this, we can easily read off the solution. In Gaussian Elimination, this is always the goal in our context [2]: make sure that every column (except the last) contains a single 1, while the rest of its entries equal 0. This is a crucial concept of Gaussian Elimination. You can then easily see what the solution to the system is, since each row has only one non-zero coefficient together with a value in the last column, from which we can immediately determine the value of the corresponding variable. Typically, we want to have the 1-entries on the diagonal. Hence, for a matrix $A$, we want that after Gaussian Elimination $a_{ii} = 1$ and all other entries equal 0, apart from the last column.
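To connect this to programming, here is a minimal sketch of the elimination procedure in Python with NumPy. It assumes the system has exactly one solution and that no row swaps are needed (so it ignores the exceptional cases from footnote [2]); the system at the bottom is made up and is not the one from the text.

```python
import numpy as np

def gauss_jordan(M):
    """Reduce an augmented matrix to the form described above:
    a 1 on each diagonal entry and zeroes elsewhere in the coefficient columns.
    Minimal sketch: assumes a unique solution and non-zero pivots (no row swaps)."""
    M = M.astype(float)
    rows = M.shape[0]
    for i in range(rows):
        M[i] = M[i] / M[i, i]                  # scale row i so the diagonal entry becomes 1
        for j in range(rows):
            if j != i:
                M[j] = M[j] - M[j, i] * M[i]   # make the other entries in column i zero
    return M

# A made-up system (x + y = 3 and 2x - 4y = 0) written as an augmented matrix.
M = np.array([[1, 1, 3],
              [2, -4, 0]])
print(gauss_jordan(M))   # [[1. 0. 2.], [0. 1. 1.]] -> x = 2, y = 1
```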
You might think: why go through all this fuss of constructing matrices instead of solving the system directly? That might be true in this case, but what happens if we need to solve the following system:
This system is much more complicated since it involves three variables. In this case, the procedure without matrices is much more challenging. And what about a system with more than three variables? That would be a nightmare to solve with the original procedure. Matrices are the better tool in this case. For example, we can translate the system above into a matrix. First, try this yourself!
Write
as a matrix.
The coefficients of $x$ form the first column, the coefficients of $y$ the second column, the coefficients of $z$ the third column, and the values to the right of the $=$ sign form the last column. This yields the matrix:
Of course, the following step would be to solve the system with Gaussian Elimination, which should result in the values of $x$, $y$, and $z$ for which the system is solved. Try to solve this system!
Solve the following system with Gaussian Elimination:
The first step is to get all zeroes in the first column, apart from the first entry. The second row is already fine, but the third row has a value of 3 in the first column, which needs to disappear. Therefore, subtract row 1 three times from row 3. This yields:
Now we can move on to the next step. We want the diagonal entry $a_{22}$ to equal 1 and the rest of column 2 to equal 0. Therefore, we subtract row 2 twice from row 1:
Furthermore, we need to add row 2 five times to row 3:
Lastly, we need entry $a_{33}$ to equal 1, while the rest of column 3 should be 0. Therefore, divide row 3 by 2:
Now we can add row 3 to row 1:
And lastly, we subtract row 3 twice from row 2:
And we are finished! From the Gaussian Elimination process, we can read off the values of $x$, $y$, and $z$ from the last column. You can verify that this is the right solution by substituting it into all equations of the system.
The main advantage of Gaussian Elimination is that the procedure is relatively compact, because we no longer need to write out the variables. Furthermore, matrices are less clumsy to work with than your previous method when dealing with larger linear systems. Though the procedure might sometimes be cumbersome to carry out by hand, computers can nowadays solve gigantic systems of equations with many variables.
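As an illustration of that last point, a computer solves such a system in a single call. The snippet below uses Python with NumPy on a made-up 3-variable system (not the one from the exercise above):

```python
import numpy as np

# A made-up system:
#   2x +  y -  z =   8
#  -3x -  y + 2z = -11
#  -2x +  y + 2z =  -3
A = np.array([[ 2,  1, -1],
              [-3, -1,  2],
              [-2,  1,  2]])
b = np.array([8, -11, -3])

# np.linalg.solve performs the elimination for us.
print(np.linalg.solve(A, b))   # [ 2.  3. -1.]  ->  x = 2, y = 3, z = -1
```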
In this tutorial, you have learned the basics of matrix calculation, including the famous Gaussian Elimination. Matrices can be used to store massive amounts of information and pop up in all fields of science. The following tutorial is an example of this: there, you will use matrices to determine the weights of a neural network. The weights indicate how much influence a particular piece of input data has on the output prediction. Therefore, they are vital for our AI/neural network. See you in the next tutorial!
These are mathematics courses and books often used in Dutch high schools. ↩︎
This is only in our context. Gaussian Elimination does not always work this well. Sometimes, the system of equations does not have a solution. Therefore, we do not get one 1 and the rest 0 in a column. However, we will refrain from these exceptions since this tutorial is meant to give you the basics of matrices. If you are interested in more properties of matrices, including when Gaussian Elimination is possible, we would like to refer to https://www.pbte.edu.pk/text books/dae/math_113/Chapter_09.pdf, which explains a lot of extra properties. ↩︎