Vector Spaces
Vector spaces are the fundamental building blocks of linear algebra. They are collections of objects that can be added together and scaled (made larger or smaller) by multiplication. These objects can represent many different things. Every vector space is defined over a field, which serves as its mathematical backbone: the field provides the rules for arithmetic operations like addition, subtraction, multiplication, and division, ensuring that they behave in a consistent and predictable way. In machine learning, the most important vector space is \(\mathbb{R}^n\). We will start with the fundamentals and build up from there.
Arithmetic
In everyday arithmetic, we learn four operations: addition, subtraction, multiplication, and division. However, subtraction and division are not actually new operations. Subtraction is just addition with an additive inverse. Example:
\(2-3 = 2 + (-3)\)
The additive inverse of \(3\) is \(-3\), because \(3 + (-3) = 0\). The number \(0\) is known as the neutral element or identity element because adding it to any number leaves the result unchanged. Division is multiplication with a multiplicative inverse. Example:
\(5 \div 3 = 5 \cdot \frac{1}{3} \)
The multiplicative inverse of \(3\) is \(\frac{1}{3}\), because \(3 \cdot \frac{1}{3} = 1\). Here, the number \(1\) is the neutral element with respect to multiplication, since multiplying by \(1\) leaves any number unchanged. This viewpoint is helpful because it lets us focus on addition and multiplication as the core operations.
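To make this concrete, here is a minimal Python check of these identities. It uses exact rational arithmetic from the standard library (`fractions.Fraction`) so that floating-point rounding cannot get in the way; this is an illustration, not part of the mathematics.

```python
# Subtraction is addition of the additive inverse; division is
# multiplication by the multiplicative inverse. We use exact rational
# arithmetic (fractions.Fraction) so floating-point rounding cannot
# make 5 * (1/3) and 5 / 3 differ.
from fractions import Fraction

# 2 - 3 is the same as 2 + (-3): the additive inverse of 3 is -3.
assert 2 - 3 == 2 + (-3)

# 5 / 3 is the same as 5 * (1/3): the multiplicative inverse of 3 is 1/3.
assert Fraction(5, 3) == 5 * Fraction(1, 3)

# Neutral elements: adding 0 or multiplying by 1 changes nothing.
assert 7 + 0 == 7
assert 7 * 1 == 7
```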
Groups
Now we want to capture how objects interact given some operation we define. These objects could be anything, but for simplicity, think of them as numbers for now. We place these objects into something called a set. A set is simply a collection of objects or elements, written using curly braces. For example:
\(\{1,2,3\}\)
is the set containing the elements \(1\), \(2\), and \(3\). Two key properties of sets:
- No duplicates:
\(\hspace{1cm}\)Writing an element more than once does not change the set.
\(\hspace{1cm}\)\(\{2,2\}\) is the same set as \(\{2\}\).
- Order does not matter:
\(\hspace{1cm}\)\(\{2,3\}\) and \(\{3,2\}\) represent the same set.
There is also a special set that contains no elements at all, called the empty set, written as:
\(\{\}\) or \(\emptyset\)
With a set, we have the objects we want to work with. The next step is to describe how these objects interact with each other. This is done by defining an operation on the set. An operation is simply a rule that takes two elements from the set and assigns another element to them. The operation can be defined in any way we choose. For example, we could invent a special kind of addition where:
\(1 \oplus 0 = 1\) and \(1 \oplus 1 = 0\)
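One way to realize this invented \(\oplus\) on the two-element set \(\{0,1\}\) is addition modulo 2. The sketch below uses a made-up helper name (`circle_plus`) purely for illustration:

```python
# A made-up helper realizing the invented operation ⊕ above:
# on the set {0, 1} it is just addition modulo 2.
def circle_plus(a, b):
    return (a + b) % 2

# Print the full operation table for {0, 1}.
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} ⊕ {b} = {circle_plus(a, b)}")
# In particular: 1 ⊕ 0 = 1 and 1 ⊕ 1 = 0, matching the definition above.
```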
Or we could define something entirely different if we wanted. To keep it simple, just think about the classical addition or multiplication for now. We usually write addition with the symbol \(+\), and multiplication with the symbol \(\cdot\). When speaking about a general operation (without specifying which one), we will use the symbol \(\circ\). Now we are ready to define what a group is:
A group \((G, \circ)\) is a set \(G\) together with one operation \(\circ\) that follows certain rules:
Closure
\(\hspace{1cm}\)For all \(a,b \in G\), we have \(a \circ b \in G\).
Associativity
\(\hspace{1cm} (a \circ b) \circ c = a \circ (b \circ c)\) for all \(a,b,c \in G\).
Identity
\(\hspace{1cm}\)There exists an element \(e \in G\) such that \(a \circ e = e \circ a = a\) for all \(a \in G\).
Inverse
\(\hspace{1cm}\)For every \(a \in G\), there exists an element \(a^{-1} \in G\) such that \(a \circ a^{-1} = a^{-1} \circ a = e\).
- Closure means that when an operation is applied to elements of a set, the result is still an element of that same set. It is closed within the set and cannot escape it. For example, the set of natural numbers \(\mathbb{N} = \{1,2,3, \dots\}\) is closed under addition, since adding any two natural numbers always produces another natural number. In contrast, the set \(\{1,2,3\}\) is not closed under normal addition, because \(2 + 3 = 5\), and \(5\) is not part of the set.
- Associativity means that the way elements are grouped under an operation does not affect the result.
- The identity element is one that doesn’t change other elements when the operation is applied. For example, adding \(0\) to a number leaves it unchanged, so \(0\) is the identity for addition.
- The inverse property means that every element has a partner, or inverse, that cancels it out to the identity when the operation is applied. For instance, with addition on integers, \(3\) and \(-3\) are inverses because \(3 + (-3) = 0\).
To clarify the definition, look at the following example: Consider the group \((\mathbb{Z}, +)\), the integers under addition. You can check that all the group properties hold: the set of integers \(\mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}\) is closed under addition; grouping doesn’t matter (associativity); \(0\) serves as the identity element; and every integer \(a\) has an inverse \(-a\).
As a counterexample, the integers under multiplication, \((\mathbb{Z}, \cdot)\), do not form a group. While the set is closed and multiplication is associative, most integers do not have multiplicative inverses that are also integers, for example, the inverse of \(2\) would be \(\frac{1}{2}\), which is not an integer.
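As a quick sanity check (not a proof, since \(\mathbb{Z}\) is infinite), here is a small Python sketch that spot-checks the group axioms for \((\mathbb{Z}, +)\) on a handful of integers and illustrates why \((\mathbb{Z}, \cdot)\) fails:

```python
# Spot-check the group axioms for (Z, +) on a small sample of integers.
# This illustrates the rules; it is not a proof, since Z is infinite.
sample = [-3, -1, 0, 2, 5]

for a in sample:
    assert a + 0 == a                              # identity: 0 is neutral
    assert a + (-a) == 0                           # inverse: -a cancels a
    for b in sample:
        assert isinstance(a + b, int)              # closure: result stays in Z
        for c in sample:
            assert (a + b) + c == a + (b + c)      # associativity

# Counterexample for (Z, *): 2 has no multiplicative inverse in Z,
# because no integer x satisfies 2 * x == 1 (2 * x is always even).
assert all(2 * x != 1 for x in range(-1000, 1001))
```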
Why this definition of a group? To answer that, consider the following analogy: Think of a group like the basic framework of a car. To be considered a car, you need a few essential components: a chassis, wheels that roll, a steering mechanism, an engine, and brakes. With these in place, you know it will behave predictably, you can drive it, turn it, and stop safely. Similarly, we now know how a group will behave, since we know that the set and its associated operation follow the rules stated above.
Once you have this basic car (the group), you can build on it. You might add features like racing seats, a subwoofer, or a massive spoiler. In the same way, once we know we have a group, we can construct more complex structures like rings, fields, and vector spaces. With a group, we can predict how things behave, prove properties about them, and relate them to other “cars” (groups) that follow the same rules.
One simple upgrade to the group structure is commutativity:
\(a \circ b = b \circ a \) for all \(a,b \in G\),
which means that the order in which we apply the operation does not matter. If this rule is satisfied along with the others, the resulting structure is called an abelian group, or simply a commutative group. Now, let’s upgrade our structure further by introducing another operation.
Rings
Consider a set \(R\) equipped with two operations: addition \((+)\) and multiplication \((\cdot)\). Together, they form a ring if the following conditions hold:
- \((R, +)\) forms an abelian group.
- Under multiplication, \(R\) is closed and associative, but not necessarily a group; only these two properties are required. (This structure is often referred to as a semigroup. Note that a semigroup does not require the existence of an identity element.)
- The two operations are connected by the distributive law, which states that:
\(\hspace{1cm} a \cdot (b +c) = a \cdot b + a \cdot c\) and \((a + b) \cdot c = a \cdot c+ b \cdot c\) for all \(a,b,c \in R\).
It is important to note that rings do not require multiplication to be commutative, nor do they require multiplicative inverses for all elements.
(If a ring contains a multiplicative identity element, usually denoted \(1\), it is called a ring with unity, a unital ring, or a ring with identity.)
In essence, a ring is a specific extension of the group structure that adds a second operation, multiplication, alongside addition. The constraints on multiplication are less strict than those on addition. If multiplication is also commutative (\(a \cdot b = b \cdot a\)) then the ring is a commutative ring. Still no multiplicative inverses needed. If every nonzero element has a multiplicative inverse, then the ring is a division ring. Multiplication does not need to be commutative in a division ring.
We’re getting closer to the definition of a vector space; we just need one more thing, one final upgrade.
Fields
If a ring is commutative and every nonzero element has a multiplicative inverse (meaning a commutative division ring), we call it a field. In other words, a set \(F\) with addition \((+)\) and multiplication \((\cdot)\) forms a field if:
- \((F, +)\) is an abelian group.
- \((F \setminus \{0\}, \cdot)\) is also an abelian group.
- The two operations are connected by the distributive law.
Below is the same definition, written without using group terminology, for those who want to see all the rules explicitly:
A set \(F\) is called a field if it contains at least two distinct elements, \(0\) and \(1\), with \(0 \neq 1\), and is equipped with two operations called addition \((+)\) and multiplication \((\cdot)\), such that:
Closure
\(\hspace{1cm}\)For all \(a,b \in F\), both \(a + b \in F\) and \(a \cdot b \in F\).
Commutativity
\(\hspace{1cm} a + b = b + a \) and \(a \cdot b = b \cdot a\) for all \(a,b \in F\).
Associativity
\(\hspace{1cm} (a + b) + c = a + (b + c)\) and \((a \cdot b) \cdot c = a \cdot (b \cdot c)\) for all \(a,b,c \in F\).
Identities
\(\hspace{1cm} a + 0 = a\) and \(a \cdot 1 = a\) for all \(a \in F\), where \(0\) and \(1\) are distinct.
Additive Inverse
\(\hspace{1cm}\)For every \(a \in F\), there exists a unique \(-a \in F\) such that \(a + (-a) = 0\).
Multiplicative Inverse
\(\hspace{1cm}\)For every \(a \in F\) with \(a \neq 0\), there exists a unique \(a^{-1} \in F\) such that \(a \cdot a^{-1} = 1\).
Distributive Property
\(\hspace{1cm} a \cdot (b + c) = a \cdot b + a \cdot c\) and \((a + b) \cdot c = a \cdot c+ b \cdot c\) for all \(a,b,c \in F\).
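These rules can be spot-checked numerically. The sketch below uses exact rational numbers (`fractions.Fraction`), which also satisfy the field axioms; ordinary floats only approximate \(\mathbb{R}\), so exact equality checks like these can fail with them. It is an illustration on a small sample, not a proof.

```python
# Spot-check the field axioms on a few exact rational numbers.
from fractions import Fraction as F

sample = [F(-3), F(0), F(1), F(1, 2), F(5, 3)]

for a in sample:
    assert a + 0 == a and a * 1 == a                   # identities
    assert a + (-a) == 0                               # additive inverse
    if a != 0:
        assert a * (1 / a) == 1                        # multiplicative inverse
    for b in sample:
        assert a + b == b + a and a * b == b * a       # commutativity
        for c in sample:
            assert (a + b) + c == a + (b + c)          # associativity of +
            assert (a * b) * c == a * (b * c)          # associativity of *
            assert a * (b + c) == a * b + a * c        # distributivity
```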
A field provides the perfect setting for all the arithmetic we take for granted, one where operations behave safely, predictably, and symmetrically. In simpler terms, a field is a mathematical playground where the four basic arithmetic operations, addition, subtraction, multiplication, and division (except by zero), all work together in harmony. This balance is what makes algebra and especially linear algebra possible.
Fields form the foundation of vector spaces. Whenever you work with vectors, matrices, or linear transformations, you’re implicitly relying on a field. It ensures that you can scale, combine, and manipulate numbers freely, without ever breaking the underlying arithmetic rules. In short:
- Groups give us basic structure (like addition and subtraction).
- Rings let us add, subtract, and multiply.
- Fields let us add, subtract, multiply, and divide—consistently and predictably.
You can think of a field as the fully equipped car in our analogy. Groups gave us the basic framework, wheels, an engine, steering and braking. Rings added more systems, like comfort, safety features and of course the massive spoiler. But a field is the complete, road-ready vehicle: it can drive smoothly in any direction, handle any maneuver, and take you wherever you need to go in mathematics.
Common examples of fields in linear algebra are the real numbers \(\mathbb{R}\) and the complex numbers \(\mathbb{C}\). In these tutorials, we will focus on the real numbers \(\mathbb{R}\), since they are the ones most commonly used in machine learning. A real number is any number that can be represented on the number line, including positive numbers, negative numbers, zero, rational numbers, and irrational numbers.
We are finally ready to define a vector space.
Vector Spaces
A vector space over a field \(F\) is a set \(V\) along with an addition on \(V\) and a scalar multiplication (by elements of \(F\)) on \(V\) such that the following properties hold:
Closure
\(\hspace{1cm}\)For all \(u,v \in V\) and \(a \in F\), both \(u + v \in V\) and \(a \cdot u \in V\).
Commutativity
\(\hspace{1cm} u + v = v + u \) for all \(u,v \in V\).
Associativity
\(\hspace{1cm} (u + v) + w = u + (v + w)\) and \((a \cdot b) \cdot v = a \cdot (b \cdot v)\) for all \(u,v,w \in V\) and \(a, b \in F\).
Additive Identity
\(\hspace{1cm}\)There exists an element \(\mathbf{0} \in V\) such that \(v + \mathbf{0} = v\) for all \(v \in V\).
Additive Inverse
\(\hspace{1cm}\)For every \(v \in V\), there exists a unique \(-v \in V\) such that \(v + (-v) = \mathbf{0}\).
Multiplicative Identity
\(\hspace{1cm}1 \cdot v = v\) for all \(v \in V\).
Distributive Property
\(\hspace{1cm} a \cdot (u + v) = a \cdot u + a \cdot v\) and \((a + b) \cdot v = a \cdot v + b \cdot v\) for all \(a,b \in F\) and for all \(u,v \in V\).
In essence, a vector space is an abelian group under addition that is also equipped with a scalar multiplication operation by elements of a field \(F\). Simply put, a vector space is a set of objects that you can add together and scale while following some rules. I stated the definition in its general form, meaning it works for any field \(F\). However, in our setting we will work specifically with the field \(\mathbb{R}\). So whenever we refer to “the field” of a vector space in what follows, we mean \(\mathbb{R}\).
The elements of a vector space, the objects in the set \(V\), are called vectors. Now, you may wonder: in the definition of a vector space, the vectors come from the set \(V\), and only the scalars come from the field \(F\). So why did we spend so much time defining what a field is, only to end up using ordinary numbers? Consider solving a simple equation (this is algebra, after all):
\(a\mathbf{x} = \mathbf{b},\)
where \(a\) is a scalar. If we want to solve for \(\mathbf{x}\), we need to “move” \(a\) to the other side, that is, we need to divide by \(a\). This requires a multiplicative inverse. Some mathematical structures like rings don’t guarantee that; fields do. Without inverses, even a simple exercise like this could break down.
These rules exist precisely so we can perform the operations we typically take for granted: solving equations, manipulating expressions, and computing without running into avoidable complications. And the scalars can’t come from just a group either, since both addition and multiplication are required for the distributive laws.
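Here is a small illustrative sketch of this point, with made-up numbers: over \(\mathbb{R}\) we can solve \(a\mathbf{x} = \mathbf{b}\) by multiplying by \(1/a\), while over the integers even the scalar equation \(2x = 5\) has no solution.

```python
# Over the field R we can solve a * x = b by multiplying by 1 / a (a != 0).
# The numbers below are made up for illustration and chosen so that the
# floating-point arithmetic is exact.
a = 4.0
b = (2.0, 6.0, 10.0)

x = tuple((1 / a) * b_k for b_k in b)            # x = (0.5, 1.5, 2.5)
assert tuple(a * x_k for x_k in x) == b          # check: a * x really equals b

# Over the integers (a ring, not a field) the same move can fail:
# 2 * x = 5 has no integer solution, because 1/2 is not an integer.
assert all(2 * n != 5 for n in range(-1000, 1001))
```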
To make this definition more concrete, we look at the vector space \(\mathbb{R}^n\) we will primarily work with; however, we first need to define what lists are.
Lists
Suppose \(n\) is a nonnegative integer. A list of length \(n\) is an ordered collection of \(n\) elements.
By “ordered,” we mean that the sequence of elements matters, so \((2,3)\) is not the same as \((3,2)\). Two lists are considered equal if and only if they have the same length and their corresponding elements are identical and in the same order. A typical list of numbers looks like this:
\((x_1, x_2, \dots, x_n)\)
A list of length \(0\) looks like this:
\(()\)
There is an important difference between lists and sets. In a list, the order of elements matters and repetitions are significant. In a set, however, order does not matter and repetitions are ignored. For example:
- The lists \((19, 8)\) and \((8, 19)\) are not equal, because the order of elements is different. But the sets \(\{19, 8\}\) and \(\{8, 19\}\) are equal, since order doesn’t matter in sets.
- Similarly, the lists \((6, 6)\) and \((6, 6, 6)\) are not equal, because they have different lengths. However, the sets \(\{6, 6\}\) and \(\{6, 6, 6\}\) are equal, since repetitions in a set are ignored.
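These distinctions are easy to see in Python, where tuples behave like our lists (ordered, repetitions count) and `set` behaves like a mathematical set:

```python
# Tuples model our "lists": order matters and repetitions count.
# Python sets model mathematical sets: order and repetition are ignored.
assert (19, 8) != (8, 19)       # different order  -> different lists
assert {19, 8} == {8, 19}       # same set

assert (6, 6) != (6, 6, 6)      # different lengths -> different lists
assert {6, 6} == {6, 6, 6}      # repetitions collapse: both are {6}
```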
\(\mathbb{R}^n\)
\(\mathbb{R}^n\) is the set of all lists of length \(n\) of elements of \(\mathbb{R}\):
\(\mathbb{R}^n = \{(x_1, x_2, \dots, x_n) : x_k \in \mathbb{R} \; \text{for} \; k = 1, \dots, n\}\)
For \( (x_1, \dots, x_n) \in \mathbb{R}^n\) and \(k \in \{1,\dots,n\}\), we say that \(x_k\) is the \(k^{\text{th}}\) coordinate of \( (x_1, \dots, x_n)\).
Intuitively, this vector space consists of lists of numbers. For example:
- \((2,2,3)\) is a vector in \(\mathbb{R}^3\),
- \((0,7)\) is a vector in \(\mathbb{R}^2\),
- \((1,9,0,8)\) is a vector in \(\mathbb{R}^4\).
We call the numbers in these lists coordinates. Reading from left to right, the first entry is the first coordinate, the second entry is the second coordinate, and so on.
Now pay close attention, this is an important point. Throughout this course, I want to give you strong visual intuition so that you always have a mental image to refer to when thinking about different concepts and algorithms. For this reason, I will often use geometric vectors. However, geometric vectors and lists in \(\mathbb{R}^n\) are not the same thing. They are different kinds of objects.
What we can do is associate each geometric vector with a list of numbers in \(\mathbb{R}^n\). This allows us to use geometry as a visual aid. Think of a cat and a painting of that cat: the animal and the painting are two different objects. There is a clear relationship, you can recognize the cat in the painting, but the painting is not the cat.
Many textbooks and courses introduce geometric vectors and immediately attach coordinates to them, which can be misleading. This often causes readers to believe that geometric vectors and vectors in \(\mathbb{R}^n\) are the same object. As you move forward in this course, you will see that \(\mathbb{R}^n\) is a general-purpose mathematical tool that applies to many different contexts: geometry, image processing, functions, and more. In each case, we translate the objects we care about into lists of numbers. This translation is extremely useful, but it does not mean that the objects themselves are the same.
Since a vector space involves addition and scalar multiplication, let’s now see how these operations work in \(\mathbb{R}^n\).
Addition and Scalar Multiplication in \(\mathbb{R}^n\)
Addition in \(\mathbb{R}^n\) is defined by adding corresponding coordinates:
\( (x_1, x_2, \dots, x_n) + (y_1, y_2, \dots, y_n) = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n) \)
We can introduce a placeholder for the lists so that we don’t have to write out every element each time:
\(\mathbf{x} + \mathbf{y}\)
here \(\mathbf{x} = (x_1, \dots, x_n)\) and \(\mathbf{y} = (y_1, \dots, y_n)\). Observe that this operation is commutative; in other words, \( \mathbf{x} + \mathbf{y} = \mathbf{y} + \mathbf{x}\).
Example
Consider the vectors \(\mathbf{x}, \mathbf{y} \in \mathbb{R}^3\):
\(\mathbf{x} = (1,2,3), \hspace{0.5cm} \mathbf{y} = (5,6,7)\)
Vector addition is performed componentwise, so we have
\(\mathbf{x} + \mathbf{y} = (1,2,3) + (5,6,7) = (6,8,10)\)
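The same computation in Python, modeling vectors in \(\mathbb{R}^3\) as tuples (one convenient modeling choice for illustration, not the only one):

```python
# Componentwise addition in R^3, matching the example above.
x = (1, 2, 3)
y = (5, 6, 7)

x_plus_y = tuple(x_k + y_k for x_k, y_k in zip(x, y))
assert x_plus_y == (6, 8, 10)

# Commutativity: x + y equals y + x.
y_plus_x = tuple(y_k + x_k for x_k, y_k in zip(x, y))
assert x_plus_y == y_plus_x
```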
Next, we’ll look at the neutral element, denoted by \( \mathbf{0} \), which has length \(n\):
\( \mathbf{0} = (0,0,\dots,0) \)
This is useful because it allows us to define the additive inverse in \(\mathbb{R}^n\):
For \(\mathbf{x} \in \mathbb{R}^n\), the additive inverse of \(\mathbf{x}\), denoted by \(-\mathbf{x}\), is the element \(-\mathbf{x} \in \mathbb{R}^n \) such that:
\(\mathbf{x} + (-\mathbf{x}) = \mathbf{0}\)
So if \(\mathbf{x} = (x_1, \dots, x_n)\) then \(-\mathbf{x} = (-x_1, \dots, -x_n)\).
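A quick check of the zero vector and the additive inverse in \(\mathbb{R}^3\), again with tuples:

```python
# The zero vector and the additive inverse in R^3.
x = (1, 2, 3)
zero = (0, 0, 0)
neg_x = tuple(-x_k for x_k in x)                              # (-1, -2, -3)

assert tuple(x_k + z_k for x_k, z_k in zip(x, zero)) == x     # x + 0 = x
assert tuple(x_k + n_k for x_k, n_k in zip(x, neg_x)) == zero # x + (-x) = 0
```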
Let’s move on to scalar multiplication:
The product of a scalar \(\lambda\) and an element in \(\mathbb{R}^n\) is computed by multiplying each coordinate of the element by \(\lambda\):
\(\lambda \mathbf{x} = \lambda (x_1, x_2, \dots, x_n) = (\lambda x_1, \lambda x_2, \dots, \lambda x_n)\)
here \(\lambda \in \mathbb{R}\) and \(\mathbf{x} \in \mathbb{R}^n\).
“Scalar” is really just another word for “number”; we’ll see soon why that name makes sense. Also, note that both vector addition and scalar multiplication in \(\mathbb{R}^n\) are closed operations. Finally, notice that the scalar comes from our field \(\mathbb{R}\), whereas the elements we add (the lists) come from \(\mathbb{R}^n\).
Example
Suppose we want to scale the vector \(\mathbf{x} \in \mathbb{R}^3\) by the scalar \(\lambda = 2\). Let
\(\mathbf{x} = (1,1,7)\)
Scalar multiplication is performed componentwise, so we obtain
\(\lambda \cdot \mathbf{x} = 2 \cdot (1,1,7) = (2 \cdot 1, 2 \cdot 1, 2 \cdot 7) = (2,2,14)\)
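And the corresponding Python sketch for scalar multiplication:

```python
# Componentwise scalar multiplication in R^3, matching the example above.
lam = 2
x = (1, 1, 7)

lam_x = tuple(lam * x_k for x_k in x)
assert lam_x == (2, 2, 14)

# Closure: the result is again a length-3 list of real numbers,
# i.e. another element of R^3.
assert len(lam_x) == 3
```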
Examples of Vector Spaces (Bonus)
We’ve already seen the example of \(\mathbb{R}^n\), which will be our main “workspace” in machine learning. This is the example you really need to know by heart and understand deeply. But remember, the definition of a vector space is abstract, so many things can form a vector space. Here are a few examples for fun; we won’t be working with these right away, but it’s nice to see the bigger picture.
Polynomials of degree at most 3
These are polynomials that look like:
\(ax^3 + bx^2 + cx + d\)
Now, imagine a whole collection of these polynomials. This collection will be our set \(V\), and each polynomial is a vector.
- We can add any two polynomials in the set, and the result is still a polynomial of degree 3 or less.
- We can multiply a polynomial by a scalar, and it still stays in the set.
- All the other vector space rules we discussed also hold.
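A common way to make this concrete (one convention among several) is to store such a polynomial as its coefficient list \((a, b, c, d)\); addition and scalar multiplication then become componentwise operations, just as in \(\mathbb{R}^4\):

```python
# Store a*x^3 + b*x^2 + c*x + d as its coefficient tuple (a, b, c, d).
p = (1, 0, -2, 5)     # x^3 - 2x + 5
q = (0, 3, 1, 1)      # 3x^2 + x + 1

# Adding polynomials adds coefficients; the degree never exceeds 3.
p_plus_q = tuple(p_k + q_k for p_k, q_k in zip(p, q))
assert p_plus_q == (1, 3, -1, 6)          # x^3 + 3x^2 - x + 6

# Scaling a polynomial scales every coefficient.
two_p = tuple(2 * p_k for p_k in p)
assert two_p == (2, 0, -4, 10)            # 2x^3 - 4x + 10
```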
Real-valued functions on a domain
Next, consider all real-valued functions defined on some interval, like all continuous functions on \([0,1]\). Denote them by \(f(x), g(x), \dots\). These are our vectors.
- Adding two functions gives another function in the same set.
- Multiplying a function by a scalar still gives a function in the set.
- And again, all the other vector space rules apply.
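A minimal sketch of the same idea in Python, defining addition and scaling of functions pointwise (the helper names `add` and `scale` are made up for illustration):

```python
# Real-valued functions on [0, 1] as vectors: addition and scalar
# multiplication are defined pointwise.
def add(f, g):
    return lambda t: f(t) + g(t)          # (f + g)(t) = f(t) + g(t)

def scale(lam, f):
    return lambda t: lam * f(t)           # (lam * f)(t) = lam * f(t)

f = lambda t: t ** 2                      # f(t) = t^2
g = lambda t: 3 * t                       # g(t) = 3t

h = add(f, g)          # h(t) = t^2 + 3t, still a function on [0, 1]
k = scale(2.0, f)      # k(t) = 2 * t^2

assert h(0.5) == 1.75 and k(0.5) == 0.5
```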
The key takeaway: vector spaces aren’t limited to lists of numbers. Polynomials, functions, sequences, and so on can all form vector spaces, as long as addition and scalar multiplication follow the rules.
Summary
- A vector space is a mathematical structure consisting of a set \(V\) where we can add elements and multiply them by scalars, following certain rules.
- The elements of \(V\) are called vectors.
- The scalars come from a field \(F\).
- We need the field \(F\) and the vector space rules so we can perform familiar operations (solving equations, manipulating expressions, and computing) without running into unnecessary complications.
- The main vector space used in machine learning is \(\mathbb{R}^n\), where each vector is a list of real numbers.
- Many objects can be written as these lists, but the lists are not the objects themselves; they simply represent them.