Introduction to Sets and Categories for LAB

Here is my first draft for the introduction to the first chapter of the linear algebra book. Its a chapter introducing sets and categories. I tried to take a different direction for it, and I am genuinely curious what people think. There is also a button on top which changes things into reading focus mode in which all links are removed and to get out of it you must wait for 5 seconds after clicking it again. I am trying out things to make a web-based textbook better. If you have any thought do email me at leo [dot] c [dot] alcock@gmail [dot] com

Introduction

Ah, the classic “remember sets?” chapter that infects the early chapters of many mathematics textbooks. Often dry and unmotivated, it usually serves just to make the textbook self-contained more than anything. As a student, I found myself fully skipping it more often than not—and if you are familiar with all the material in this chapter that would make sense to do. However, oftentimes there would be little pieces of set theory that I didn’t know that were necessary for later chapters (equivalence relations comes to mind). I would miss these by skipping the entire chapter outright. There’d be a critical 15% of the chapter I was missing. So, my poor reader, are you simply doomed to read another dry set theory chapter in which you know 85% of the material to get the 15% novel material sprinkled throughout it?

Well, let’s see what I can do about the dryness. I’ll try to quickly go through the general thing I would expect most of you to have seen before. I’ll also try to have some unique and perhaps more involved examples with sets. Less typically, I will be casting things in the language of categories throughout the chapter to give a new perspective on the set theory concepts. Have you seen the intersection of sets defined via a fibered product in the category of sets?

And how about motivations? Why do we care about sets and categories in the first place? Well, let me address this question now. If you give me a few pages, I can explain what role sets and categories play not just in linear algebra but in mathematics as a whole. I will also correct a few common misconceptions about their role along the way.

Set theory was both the savior and unifier of mathematics around the turn of the twentieth century. While we had been doing mathematics for thousands of years at that point, and with great success, we did not have a common grounding for the mathematics we did. Eventually, issues from this arose. While Aristotelian logic (largely the logic we still use today) and Euclid’s elements held up well, Newton and Leibniz’s calculus with infinitesimals posed real problems and the Italian school of algebraic geometry even “proved” false results.[1]

All of this culminated in a need for a steady foundation for mathematics. This need would find its hero in the set theory of Cantor and Dedekind. These two would develop a theory of infinite collections that proved to be powerful enough to encapsulate the rest of mathematics within it. The axiomatic grounding we use today would be provided by Russell, Frege, Zermelo (the Z of ZFC fame) and others.

Now, in mainstream mathematics, we conduct mathematics in the first-order theory of sets with ZFC axioms. That is, every object is a set and all of our arguments about them use $\forall, \exists$ and things like and, or, not and the relation $A\in B$. Well, even this is not really what we do. In truth, we write mathematics in English. The implicit standard for a full proof is that it has enough detail that, in theory, a working mathematician could translate it to first order statements in ZFC without having to come up with a new idea. Yet this is not an exercise done in practice. So, how do sets come up in practice for the working mathematician?

Well, the detail that matters most is that every object in mathematics is a set. In practice, we have a concept which we want to work with. Maybe it is a vector space, ring, the rational or real numbers, and to work with it, we first need to prove it exists. How do we do so? We build it out of sets of course! That is, sets generally serve as the material of modern mathematics. Though because every object is made out of sets, this can sometimes give people the wrong impression about the concepts being built out of them.

Consider the standard construction of the natural numbers. We take $0$ to be the empty set $\lbrace \rbrace $. Then we let $1$ be the set containing $\lbrace 0\rbrace $, $2$ the set containing $\lbrace 0,1\rbrace $ and so on. Writing it out in a verbose way we have:

\[0 = \lbrace \rbrace , 1 = \lbrace \lbrace \rbrace \rbrace , 2 = \lbrace \lbrace \rbrace , \lbrace \lbrace \rbrace \rbrace \rbrace , 3 = \lbrace \lbrace \rbrace , \lbrace \lbrace \rbrace \rbrace , \lbrace \lbrace \rbrace , \lbrace \lbrace \rbrace \rbrace \rbrace \rbrace ,4=\cdots\]

This can give people the mis-perception that these are what numbers are. That is, they are sets of sets all the way down—boxes of boxes where the box corresponding to $n$ contains precisely the boxes $n-1, n-2,\ldots$. This is not the right way to think about the naturals. Rather, you should think of this as a construction of natural numbers out of sets. For example, you could also construct them as follows:

\[0 = \lbrace \rbrace , 1 = \lbrace \lbrace \rbrace \rbrace , 2 = \lbrace \lbrace \lbrace \rbrace \rbrace \rbrace , 3 = \lbrace \lbrace \lbrace \lbrace \rbrace \rbrace \rbrace \rbrace , 4 = \cdots\]

While this construction may be less convenient than the previous one—we can’t use the $\subset$ relation for $\le$ which also extends to the common construction of all ordinal numbers well—it is still a valid way to construct the natural numbers. The successor function is easy to define in this context. Just as a programmer can implement some website with myriad different source codes, we can often construct our objects in myriad different ways too. In mathematics, our programming language of choice is set theory.

So, where does category theory fit in all this business? Well, in some ways category theory resolves the problem that arises from constructions being arbitrary. If we’re working with various different objects, it is a hassle if we have all of these different constructions of them. When I refer to the natural numbers… which natural numbers do I mean? Well, really it shouldn’t matter which natural numbers I mean. For whatever incarnation of the natural numbers I should get the same results. So, how do we specify that?

Let’s try to do this same exercise for definitions of a product of two sets. The standard definition of a product of two sets is as follows:

Definition: The product of two sets $A$, $B$ is the set of ordered pairs:

\[\lbrace (a,b)\mid a\in A, b\in B\rbrace\]

Wait, what is $(a,b)$? Every object is a set so this must be a set of some sort. However, if $a\neq b$, then $(a,b)\neq (b,a)$. So, it certainly isn’t the set $\lbrace a,b\rbrace $. In general, we construct $(a,b)$ as $\lbrace \lbrace a\rbrace , \lbrace a,b\rbrace \rbrace $. We can find the first element of the ordered pair by looking for the set containing one element and the second element is the element of the other set not equal to the first element. However, we could have also construction it as $\lbrace \lbrace a,b\rbrace ,\lbrace b\rbrace \rbrace $. The first construction proposed was the ugly $(a,b) = \lbrace \lbrace \lbrace a\rbrace , \lbrace \rbrace \rbrace , \lbrace \lbrace b\rbrace \rbrace \rbrace $.See here. There are quite a few options. Yet, category theory allows us to cut through this arbitrariness. So, how would category theory tackle the definition of a product of sets?

Definition: The product of two sets $A$, $B$ is a set $S$ along with two maps $\pi_A: S\rightarrow A, \pi_B: S\rightarrow B$ such that for any other set $T$ with maps $f:T\rightarrow A, g:T\rightarrow B$, there is a unique map $T\xrightarrow{\iota_{f,g}} S$ such that $\pi_A\circ \iota_{f,g} = f$ and $\pi_B\circ \iota_{f,g} = g$.

Wow, what a terrible jumble of words. If this is what is offered by category theory, do we really want it? The answer, to the dismay of some and jubillance for others, is yes. And if you give me a moment, maybe I can untangle this mess a little bit.

Let’s start with the product $(S,\pi_A, \pi_B)$. We’ll draw it as follows:

This seems rather different from the set theory definition of product. Instead of a single set, $A\times B$, we have a set $S$ and two functions. However, these two definitions are perfectly compatible. In fact, we can use the construction $A\times B$ to understand $\pi_A$ and $\pi_B$ better. If we let $S = A\times B$, then we can define $\pi_A$ and $\pi_B$ as follows

\[\pi_A((a,b)) = a\] \[\pi_B((a,b)) = b\]

That is, the maps $\pi_A, \pi_B$ exactly encode the relationship between the set of pairs $S = A\times B$ and $A$ and $B$. We can even define $(a,b)\in S$ as the unique element mapping to $a\in A$ under $\pi_A$ and to $b\in B$ under $\pi_B$. Now then, what about all that other business with another set $T$ with maps to $A$ and $B$. Drawing that out we have the following diagram of maps:

For any set $T$, and maps $f,g$ to $A$ and $B$ respectively we get a unique map $\iota$ making the all compositions in this diagram agree: $\pi_A\circ \iota = f, \pi_B\circ \iota = g$. Well if this property holds for any $T$, lets start with the simplest $T$ we can think of: The one point set $T = \ast = \lbrace \pt\rbrace $. In this case, giving a map from $f:\ast\rightarrow A$ is simply picking an element $a\in A$: the element $f$ maps $\pt$ to. Similarly, maps $g:\ast\rightarrow B$ encode elements of $B$. What does it mean to have a unique map $\iota_{f,g}$ such that the above compositions hold? Well maps $\ast\rightarrow S$ encode elements $s\in S$ and unrolling the definitions we find that this translates into the statement

“For any two elements $a\in A$, $b\in B$, there is a unique element $s\in S$ such that $\pi_A(s) = a$, $\pi_B(s) = b$.”

We call that unique element $(a,b)$. We’re starting to see how this characterizes what we want out of a set product: the set of pairs of elements. We also see how it is encoded using mapping with $\pi_A$, $\pi_B$. One question remains that we must answer that we haven’t gotten to yet: Is it well-defined?

First, lets recall what “well-defined” even means. Oftentimes in mathematics we define something in terms of properties that it holds. We might define $\sqrt{2}$ as the positive real number $r$ such that $r^2 = 2$. This works fine. However, if I tried to define $\sqrt{-1}$ as the positive real number $r$ such that $r^2 = -1$ I would have a terrible problem. There is no positive real number that squares to $-1$: the square of a positive real number is a real number. I could try again to define $\sqrt{-1}$ as the complex number which squares to $-1$ but this leads us to a second problem that is often swept under the rug. There are two complex numbers which square to $-1$. If we call one of them $i$, then $-i$ also squares to $-1$. If you learn nothing else from this chapter, you can come away with the fun fact that the imaginary unit $i$ is not well-defined because its defining property is not unique!

In these examples we see the two key ingredients for checking that something is well-defined: it must exist and be unique. Transitioning back to the categorical set product, does it follow these rules? Well, for existence we can use the set construction for $A\times B$ to furnish an example of a categorical product. The role set theory plays for teh working mathematician is to show existence. Category theory steps in for uniqueness. However, the categorical set product is not unique. At least, not in the usual sense. Though it is unique in a more useful way.

Claim: The categorical set product is unique up to unique bijection. That is, for any two set products $(S,\pi_A, \pi_B)$, $(S’, \pi_A’, \pi_B’)$, there is a unique bijection $\iota:S’\rightarrow S$ such that $\pi_A’ = \pi_A\circ \iota$, $\pi_B’ = \pi_B\circ \iota$.

Well, lets prove this

Proof: As $S$ is a categorical product, we can induce a map $\iota:S\rightarrow S’$ such that the compositions above hold. We also know that it is unique by the defining property of the categorical set product. Similarly, we can induce a map $\iota’:S’\rightarrow S$ that is the unique map such that $\pi_A\circ \iota’ = \pi_A’, \pi_B\circ \iota’ = \pi_B’$. We now have these two uniquely defined maps with the composition properties we want, what remains to show is that they are mutually inverse: $\iota\circ \iota’ = \id_{S’}$ and $\iota’\circ \iota = \id_{S}$. You can do this with the following slick trick: stacking the diagrams on top of each other:

We also have the tautological diagram:

If you follow the composition properties, you’ll notice that $\pi_A\circ (\iota’\circ\iota) = (\pi_A\circ \iota’)\circ \iota = \pi_A’\circ \iota = \pi_A$. A similar calculation yields $\pi_B\circ (\iota’\circ \iota) = \pi_B$. However, tautologically, we also know that $\id_S$ also has these properties: $\pi_A\circ \id_S = \pi_A, \pi_B\circ \id_S$. Combining this with the fact that $S$ is a categorical set product we find that $\id_S$ and $\iota’\circ \iota$ are both the unique maps with this property and must be equal. A symmetric argument shows that $\iota\circ \iota’ = \id_{S’}$ and thus we have uniquely defined bijections. QED

So, to this point we have now fully characterized the categorical set product. We also showed that it is unique—at least, in a categorical way. Instead of exactly one set which we call the set product, we allow for a whole family of sets to be the set product. However, each such set has maps attached to it describing how it is the set product and for any two such set products there is a unique bijection identifying them.

We refer to definitions of this kind colloquially as “universal properties.” Universal properties will characterize many categorical notions such as products, coproducts, limits, colimits, and even many functors and adjunctions. They all come with a similar flavor. We’ll have some properties of an object expressed with maps and you will get an induced unique map to (or from) the universal object to (or from) other test objects. For set-product, we said the product was unique up to unique bijection. In general, we’ll say things are unique up to unique isomorphism—we’ll define what isomorphism means later in the chapter. For sets, it suffices to say that isomorphism is the same as bijection.

Category theory plays this clarifying role well. It shifts one’s perspective from that of defining unique objects to defining objects which are unique up to a kind of equivalence. Though, this is not the only role category theory will play. It will also shift one’s perspectice from the object-oriented to the functional and it will be a key piece of language useful for higher mathematics.

Teaching category theory, unfortunately, is notoriously difficult. It often attracts many young students. It has the allure of being modern and “universal” while also requiring nearly no prerequisites. However, this often yields a class of zealots with little understanding of what category theory is for. They have learned a lot of language without learning about any of the subjects the language talks about.

However, the typical mathematical education in category theory might not be much better. Usually, one will pick up categorical notions by osmosis. They will be sprinkled like bread crumbs into different higher level classes. How many crumbs you get is dependent upon the taste of your intructor.

In this book, I hope to give a unified account of all the basic notions of category theory that are useful to a working mathematician while having some context for all of them. Within linear algebra, you can find examples of all the categorical notions you may care about. Limits, colimits, functors, adjunctions, and even some discussion of abelian categories will all pop up through out the text. With any luck, it will be clarifying to anyone who dares to read on.