# Vector spaces

Why the hell is matrix multiplication so weird? And as we’ve seen already—transformations of the plane, row operation through elementary matrices, representing linear systems—why is it so got-darn effective?

One might look at the list of matrix multiplication’s successes, throw up their hands, say multiplication is defined ingeniously, and make no attempt to question the situation further. In a good math class, though (and I’m not claiming to be teaching one of those), investigation is the name of the game, so further down the rabbit hole we go.

Note well. In the interest of time, these notes form the bare bones of my lecture content. You are expected to come to class to see examples and light proofs.

### An introductory example: Three-dimensional space

You are already familiar with one example of a vector space, (three-dimensional) Euclidean space ${\mathbf R}^3$. Recall that it is the set of all triples $[x_1, x_2, x_3]^t$ whose components $x_1$, $x_2$, and $x_3$ are real numbers. Remember,

$[x_1, x_2, x_3]^t = \left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right].$

Don’t worry so much about the transpose: so that matrix multiplication works out correctly, it is standard to write vectors in ${\mathbf R}^3$ and later, ${\mathbf R}^n$ as columns. However, it is much easier to type row vectors than column vectors, so now that you know about transpose, I am transposing all of them.

Remember that you learned

$[x_1, x_2, x_3]^t + [y_1, y_2, y_3]^t = [x_1 +y_1, x_2 + y_2, x_3 + y_3]^t$

and if λ is a real number, then

$\lambda [x_1, x_2, x_3]^t = [\lambda x_1, \lambda x_2, \lambda x_3]^t.$

These are the rules for addition and scalar multiplication from calculus, but they line up exactly with the rules for matrix addition and scalar multiplication. You also learned that, letting

${\mathbf e}_1 = [1, 0, 0]^t,\qquad {\mathbf e}_2 = [0, 1, 0]^t,\qquad {\mathbf e}_3 = [0,0,1]^t$

(called ij, and k in calculus), through adding and scaling any vector may be written as a linear combination of these. For example,

$[3, 2, -1]^t = 3{\mathbf e}_1 + 2{\mathbf e}_2 - {\mathbf e}_3$.

In general, a linear combination vectors is a sum of scalar multiples of those vectors.

A set of vectors is linearly independent if none can be written as a linear combination of the others.

In ${\mathbf R}^3$, for ${\mathbf e}_1, {\mathbf e}_2,$ and ${\mathbf e}_3$ to be linearly independent means that there are no $\lambda$ and $\mu$ that make any of the equations

${\mathbf e}_1 = \lambda {\mathbf e}_2 + \mu {\mathbf e}_3 \\{\mathbf e}_2 = \lambda {\mathbf e}_1 + \mu {\mathbf e}_3 \\{\mathbf e}_3 = \lambda {\mathbf e}_1 + \mu {\mathbf e}_2$

true. Or, combining all the equations together, we say that there are no nonzero real numbers $\alpha_1, \alpha_2,$ and $\alpha_3$ such that

$\alpha_1 {\mathbf e}_1 + \alpha_2 {\mathbf e}_2 + \alpha_3 {\mathbf e}_3 = {\mathbf 0} = [0, 0, 0]^t.$

When $\alpha_1 = \alpha_2 = \alpha_3 = 0$, we say the linear combination is trivial. Another way to express linear independence is that no nontrivial combination of the vectors makes zero.

How can we show this? Well, it’s a linear system! What else have we been doing all month? The equation

$\alpha_1 \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right] + \alpha_2 \left[ \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right] + \alpha_3 \left[ \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right]$

reduces to

$\left[ \begin{array}{c} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right]$

which is true if and only if $\alpha_1 = \alpha_2 = \alpha_3 = 0$. The facts that ${\mathbf e}_1, {\mathbf e}_2$, and ${\mathbf e}_3$ generate every vector in ${\mathbf R}^3$ through linear combination and are themselves linearly independent, in other words they each represent a different piece of information, mean that they form a basis for ${\mathbf R}^3$.

Let’s slow up there, because this is important. Every vector in ${\mathbf R}^3$ is made up of some of ${\mathbf e}_1$, some of ${\mathbf e}_2$, and some of ${\mathbf e}_3$. Particularly, there is no redundancy: you could not get the ${\mathbf e}_3$ part out of the other two, for example. The basis of a vector space, in this case ${\mathbf R}^3$ but we’ll be talking about all vector spaces later, tells you all and only what you need to know about the space.

How? Well, let’s consider what the matrix

$A = \left[ \begin{array}{ccc} 0 & 0 & 3 \\ 1 & 0 & -1 \\ 0 & 2 & 0 \end{array} \right].$

If we let ${\mathbf x} = [x_1, x_2, x_3]^t$ be any vector in ${\mathbf R}^3$, then

$A{\mathbf x} = \left[ \begin{array}{ccc} 0 & 0 & 3 \\ 1 & 0 & -1 \\ 0 & 2 & 0 \end{array} \right] \left[ \begin{array} {c} x_1 \\ x_2 \\ x_3 \end{array} \right] = \left[ \begin{array}{c} 3x_3 \\ x_1 - x_3 \\ 2x_2 \end{array} \right].$

In this way, left-multiplication by $A$ gives a function

$A(x_1, x_2, x_3) =\left[ \begin{array}{c} 3x_3 \\ x_1 - x_3 \\ 2x_2 \end{array} \right]$

like the ones we recall from vector calculus. There’s something special about this function, though. Since matrix multiplication distributes over addition and scalars pass through matrices, if and are 3-vectors and $\lambda$ and $\mu$ are scalars then

$A(\lambda {\mathbf x} + \mu {\mathbf y}) = \lambda A{\mathbf x} + \mu A{\mathbf y}.$

The function $A$ is a linear function, meaning it passes through addition of vectors and through scalar multiplication.

Finally, we get to see why bases are so handy. For any 3-vector ${\mathbf x} = [x_1, x_2, x_3]^t$,

$A{\mathbf x} = A(x_1{\mathbf e}_1 + x_2{\mathbf e}_2 + x_3{\mathbf e}_3) = x_1A{\mathbf e}_1 + x_2A{\mathbf e}_2 + x_3A{\mathbf e}_3.$

In other words, the fact that $A$ is linear means that we can know everything there is to know about $A$ from what it does the basis. If only there were an easy way to know what $A$ does to the basis…

$A{\mathbf e}_1 = \left[ \begin{array}{ccc} 0 & 0 & 3 \\ 1 & 0 & -1 \\ 0 & 2 & 0 \end{array} \right] \left[ \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right] = [0, 1, 0]^t$

So, $A{\mathbf e}_1$ is just the first column of $A$! Likewise,

$A{\mathbf e}_2 = [0, 0, 2]^t \\ A{\mathbf e}_3 = [3, -1, 0]^t.$

With all this machinery built, it becomes very easy to compute $A[4, 1, -1]$:

$A\left[ \begin{array}{c} 4 \\ 1 \\ -1 \end{array}\right] = 4A{\mathbf e}_1 + A{\mathbf e}_2 - A{\mathbf e}_3 = 4\left[ \begin{array}{c} 0 \\ 1 \\ 0 \end{array}\right] + \left[ \begin{array}{c} 0 \\ 0 \\ 2 \end{array}\right] - \left[ \begin{array}{c} 3 \\ -1 \\ 0 \end{array}\right] = \left[ \begin{array}{c} -3 \\ 5 \\ 2 \end{array}\right].$

Linear algebra is all about linear transformations between vector spaces. As this example illustrates, linear transformations can be known entirely by what they do to the vector space’s basis. Going forward, we’ll look at these ideas in their abstract form, and then by the end of the class, bring everything back to the study of matrices.

### Vector spaces

The remainder of the section will formalize a lot of the ideas we saw in the ${\mathbf R}^3$ example. Let be a set that is closed under addition ${\mathbf v} + {\mathbf w}$ where ${\mathbf v}, {\mathbf w} \in V$ and scalar multiplication $\lambda {\mathbf v}$ where $\lambda \in {\mathbf R}$. We call vector space if addition:

1. Associates: ${\mathbf u} + ({\mathbf v} + {\mathbf w} = ({\mathbf u} + {\mathbf v}) + {\mathbf w}$
2. Commutes: ${\mathbf v} + {\mathbf w} = {\mathbf w} + {\mathbf v}$
3. Has an identity, a vector 0 with the property ${\mathbf v} + {\mathbf 0} = {\mathbf v}$
4. Has inverses, a vector $(-{\mathbf v})$ for every ${\mathbf v}$ such that ${\mathbf v} + (-{\mathbf v}) = {\mathbf 0}.$

and if scalar multiplication:

1. Associates: $(\lambda \mu){\mathbf v} = \lambda (\mu {\mathbf v})$
2. Identity: $1 \cdot {\mathbf v} = {\mathbf v}$
3. Distributes over vectors: $\lambda ({\mathbf v} + {\mathbf w}) = \lambda {\mathbf v} + \lambda {\mathbf w}$
4. Distributes over scalars: $(\lambda + \mu) {\mathbf v} = \lambda {\mathbf v} + \mu {\mathbf v}$

In other words, if addition and scalar multiplication follow the nice properties we expect from real numbers and the example 3-vectors.

It’s important not to confuse consequences of the vector space definition with the definition itself. For example, it is not in the definition that $0{\mathbf v} = {\mathbf 0}$ or that $(-1){\mathbf v} = -{\mathbf v}$, but both of these facts have “stupid” proofs that follow from the rules. Mathematics is a game in which you must make do with exactly what you have and nothing more, but often you find that you can build what you need.

e.g. Examples of vector spaces included ${\mathbf R}^3$, the set

${\mathbf R}^n = \left\{ [x_1, x_2, \ldots, x_n]^t \; : \; x_i \in {\mathbf R} \right\}$

of all real $n$-column vectors, the set $P_n(x)$ of all polynomials with real coefficients and degree at most $n$, the set $C([0,1])$ of all continuous real-valued functions $f:[0,1]\to {\mathbf R}$, and the set $M_{m\times n}({\mathbf R})$ of all $m \times n$ matrices with real entries. Perhaps more usefully, non-examples include the set $\text{GL}(n,{\mathbf R})$ of all invertible $n \times n$ matrices (the zero matrix is missing) and the set of all polynomials of even degree (it is possible to subtract them to get a polynomial of odd degree, so the set is not closed).

As with ${\mathbf R}^3$, a linear combination of a set of vectors $\{ {\mathbf v}_1, {\mathbf v}_2, \ldots, {\mathbf v}_n \}$ is a sum of scalar multiples of those vectors,

$\displaystyle \sum_{i=1}^n \alpha_i {\mathbf v}_i, \qquad \alpha_i \in {\mathbf R}.$

The same set of vectors is linearly independent if it is impossible to write one as a linear combination of the others, or equivalently if the linear system

$\displaystyle \sum_{i=1}^n \alpha_i {\mathbf v}_i = {\mathbf 0}$

only holds when all $\alpha_i = 0$, and linearly dependent otherwise.

If every vector in $V$ can be written as a linear combination of the vectors in $S \subset V$, then we say $S$ spans or is a spanning set for $V$. If additionally $S$ is linearly independent, or morally speaking has no redundant elements, we say $S$ forms a basis for $V$. A basis is the alphabet of a vector space: it contains all and only that information you need to build the whole space. Since there is no redundancy, each vector in a vector space can be written uniquely in the language of its basis. This turns out to be hugely important.

e.g. We saw earlier that the set $\{ {\mathbf e}_1, {\mathbf e}_2, {\mathbf e}_3\}$ is a basis for ${\mathbf R}^3$.

All bases for $V$—there are many, and which one you choose to use depends on which one makes easiest the problem you want to solve—have the same number of elements. This number is called the dimension of $V$, denoted $\text{dim }V$.

e.g. Since a basis for ${\mathbf R}^3$ contains three elements, unsurprisingly ${\mathbf R}^3$ is three-dimensional.

We say that $W \subseteq V$ is a subspace of $V$ if $W$ itself is a vector space. It turns out that one only needs to verify that ${\mathbf 0} \in W$, and that $W$ is closed under addition and scalar multiplication. Though it is not always the case that a basis for $V$ can be cut down into a basis for $W$, it is true that a basis for $W$ can always be expanded into a basis for $V$. This tells us that if $W \subseteq V$ is a subspace of $V$ such that $\text{dim }V = \text{dim }W$, then $V = W$.

e.g. Pick a nonzero vector ${\mathbf v} \in {\mathbf R}^3$. Then any line

$L({\mathbf v}) = \{ \lambda {\mathbf v} \; : \; \lambda \in {\mathbf R}\}$

is a subspace of ${\mathbf R}^3$. Likewise, for any two nonzero independent vectors the plane

$P({\mathbf v}, {\mathbf w}) = \{ \lambda {\mathbf v} + \mu {\mathbf w} \; : \; \lambda, \mu \in {\mathbf R}\}$

is a subspace of ${\mathbf R}^3$.

These are the basic notions of vector spaces, but to really get a feel for them one must do examples (the textbook and your class notes contain many). This includes painstakingly verifying that so-and-so is a vector space and such-and-such is true given the rules of addition and scalar multiplication—this is how mathematics is done.

Next, we’ll relate vector spaces to one another using linear transformations.