Imagine a boy standing at one end of a football field on a calm day. He’d like to throw a stone as far as he can.

This is on par with an exercise in a physics class. A judicious solution comes from a combination of Newtonian physics and elementary calculus. Say the boy can throw equally as hard as he can in any direction, the best he can then do comes from angling the stone at \(45^{\circ}\).

In my physics class, we neglected to discuss why this approach is correct. It appears that this process can predict the behavior of a stone from launch to land. I realized later that this assumed the stone is thrown in vacuum. The solution presumes that the earth doesn’t rotate, the moon doesn’t exist. It also turns out to be slightly unreasonable to assume that one can throw equally hard in all directions.

There are two avenues that open up when one lists these assumptions. One is to take into account all aspects of the system as we can enumerate, then build a solution that takes into account these aspects. Another is to attempt to figure out what answer we’d like to come up with, then make as many simplifying assumptions as we can come up with to solve the problem. In this case, it turns out that a small hard stone enables us to neglect air resistance. The variation in gravitational force due to rotation of the earth is very small over the maximum distance that can be traveled by the stone.

A subtle point to be emphasized is the distinction between the real world, and the idealized mathematical reality, the *model* that we design. Our analysis, deduction and inference happens on the mathematical model. The relevance of the outcome to solving real world problems depends on how close the model approximates reality, something that may not even be feasible. This isn’t a particularly controversial idea, people have a vision of an ideal partner in their mind, the perfect boyfriend or girlfriend. A distillation of our preferences (Kline 2012Kline, Morris. 2012. *Mathematics and the Physical World*. Courier Corporation.).

Say, I’d like to roll a dice and place a bet based on what face comes up. One avenue would be to try to predict the face that will come up. We could conceivably start along the lines of what we did for the stone throwing boy. Soon, we find that we have to take into account the material of the dice, the air resistance involved, perhaps even decide to use a machine to ensure that it is launched precisely. Quickly this model becomes intractable. Furthermore, it isn’t even clear if these approximations that I just outlined are even sufficient to predict the throw accurately.

So instead in reality, we seek to answer a different question: assuming a fair dice, what is the *likelihood* of rolling a certain face after many theoretical throws? Thus, we use phrases like \(1:5\) odds of rolling a one. This doesn’t tell us whether we will roll a one next toss, but tells us how many ones may appear in several tosses.

A consequence of deductions happening on the model is always ensuring that these assumptions continually make sense for the real world problem we solve. We employ a different set of assumptions when we try to find the distance between the earth and the moon. Sure the boy made an assumption that the football field was flat, now we may assume that the earth is a point mass in space. Clearly both assumptions are idealizations specific to those problems. Despite the model that the boy used ignoring air resistance, you’d probably bet that a golf ball would reach the ground faster than a feather when both are dropped simultaneously. Air resistance matters now.

The mathematical operation of \(2 \times \displaystyle\frac{1}{2} = 1\) is completely valid. While someone might agree to take two half pies in place of a full pie, the same person might refuse to accept two half dresses in place of a full dress. Addition helps quantify the total number of wild animals in a zoo. Yet, despite the model involving finding the cardinality of the set of all wild animals, it doesn’t really make sense to *actually* put the animals in the same room.

One of the first models that one encounters is the mathematical structure of counting numbers, also known as the positive integers, denoted by \(\mathbb{Z}^{+} = \{1,2,3,\ldots\}\). One use is to say identify some characteristic by which we’d like to group a set of objects, make an assumption that each element in this group is *whole* or indivisible. For example, counting beans. No two beans are alike, but we ignore those dissimilarities and consider them to be equal under the *is a bean* property. Also, beans can be crushed.

It used to puzzle me to see the presentation of the addition operator on positive integers, i.e. \((\mathbb{Z}^{+},+)\) with properties such as:

*Commutativity*:\(\forall a,b \in \mathbb{Z}^{+}, a+b = b+a\)*Associativity*:\(\forall a,b,c \in \mathbb{Z}^{+}, (a+b)+c=a+(b+c)\)

The textbook presentation is sparse, elegant.Surely there was some innate deeper meaning to a number. It didn’t seem that a number had been *designed* but perhaps was discovered. It is not obvious to me that early man was mucking about with a few pebbles and came up with these properties.

Later I’d learn that the structures are the product of centuries of improvement. This process of going from informal heuristics to clean abstractions is replicated in most of mathematics. Aleksandrov, Lavrent’ev, and others (1999Aleksandrov, Aleksandr Danilovich, Mikhail Alekseevich Lavrent’ev, and others. 1999. *Mathematics: Its Content, Methods and Meaning*. Courier Corporation.) in their excellent soviet math book describe this process for addition and counting numbers by comparing it with how understanding of colors evolves. Perhaps initially individual groups were identified to have a property of size. As many wolves as the fingers on one hand. Similarly a cat can be described as being as black as the night. We then move onto talking about five wolves and black cats, crows. Then we divorce ourselves from the underlying objects and talk about blackness as part of a color structure; similarly numbers in relation to each other.

One problem with the physical metaphor as inspiration is its limitation to act as a source of intuition as we go up the ladder of abstraction. Sure counting numbers is a model that can be applied to determine the size of groups of discrete objects, but what exactly is \(0\) then? Is it nothing? Not precisely. The grade a person got in a test they never took is nothing. Someone else may have taken a test and gotten a grade of \(0\). Is infinity just a very large number?

An approach or an attitude one can adopt is the abstract method (Gowers and Kantor 2006Gowers, Timothy, and Jean-Michel Kantor. 2006. “Mathematics: A Very Short Introduction.” *The Mathematical Intelligencer* 28 (1). Springer: 63–63.). In essence, one is mindful that a mathematical object *is* what it *does* or what its properties are. Zero is simply a number that can be added to any number to produce that number. The *additive identity*. This isn’t a very odd attitude. E.g. what is the black king in Chess? Sure one can describe a particular chessboard. However the black king can be talked about irrespective of any one chessboard. It is described by its properties, how putting it in check happens etc.

A number, any number that property that is common to several groups of arbitrary unique objects of the same size. Given two groups of unique elements, addition is merely that size that the union of the group becomes (Stewart 1995Stewart, Ian. 1995. *Concepts of Modern Mathematics*. Courier Corporation.). Investigating the properties of the numbers, one also notes that each number is also unique. A unique label. E.g. when we number people running in a race, surely we don’t imply that runner \(10\) is ten times runner \(1\). We might as well have used a color to label a runner.

This is not to say that intuition is not useful. It provides us with vivid imagery that is powerful to think about very fast. You could intuitively think of the addition operation of \(i + 3\) as adding a distance of \(3\) steps to a *point* \(i\) and asking what lies there. A convenient mental picture for programming.

A formal rigorous description of the counting numbers can be built up axiomatically. We say that \(\mathbb{Z}^{+}\) is any set \(N\) that satisfies the following axioms:

- There is a function \(S:N \mapsto N\) called a successor. In other words, if \(m \in N\), there exists only one unique successor \(n \in N\) such that \(S(m)=n\).
- There is an element symbolized as \(1 \in N\). Also, \(1\) is no element’s successor, i.e \(\forall m \in N\), \(S(m) \neq 1\).

- S is one-to-one, i.e. if \(S(m) = S(m')\), then \(m=m'\).
*Axiom of Induction*: Given a set \(V\)1 \(V\) is called an inductive set. such that- \(1 \in V\).
- If \(x \in V\) then \(S(x) \in V\). \(N \subseteq V\).

Intuitively, this is intended to be a recipe to build what we think of as the whole or counting numbers. Every element has a unique successor. No two elements have the same successor. There are no cycles in this sequence. Note that we haven’t said anything about what the successor of an element should be.

Let us pretend for a second that we are from the future and have forgotten how the counting number system was ever represented. Let us see how to use these axioms to construct something that acts just like \(\mathbb{Z}^{+}\).

We know for sure that \(\{1\} \subseteq N\). There is some element that is the successor of \(1\). Say we make it up and say that \(S(1) = \alpha\). Unfortunately, we are not yet done. \(\alpha\) needs a successor. It also can’t be \(1\) by Axiom 2. This forces us to make up another symbol and add an element to our potential set, which now looks like \(\{1, \alpha, \beta \}\). Since \(\alpha\) is the successor of \(1\) and \(1\) can’t have a successor, we will need a new unique successor for \(\beta\). Say we repeatedly generate a bunch of these symbols and their unique successors like so:

- \(S(\beta) = \gamma\)
- \(S(\gamma) = \delta\)

Now we know for sure that \(\{1, \alpha, \beta, \gamma, \delta, \ldots \} \subseteq N\) where \(\ldots\) represents the successor of \(\delta\), its successor and so on.

What stops someone else from building a set \(\{1, \alpha, \beta, \gamma, \delta, \ldots \} \cup \{a, b\}\) where \(S(a) = b\) and \(S(b) = a\) and arguing that this is \(N\)?

The axiom of induction provides us a way out of this conundrum. Note that by definition, \(\{1, \alpha, \beta, \gamma, \delta, \ldots \}\) is an inductive set. We also know that given a successor function, say we produce an inductive set, \(V\), then \(N \subseteq V\). In our construction, we know that \(\{1, \alpha, \beta, \gamma, \delta, \ldots \} \subseteq N\). This combined with the axiom of induction tells us that \(\{1, \alpha, \beta, \gamma, \delta, \ldots \} = \mathbb{Z}^{+}\). Notice how this definition doesn’t rely on addition at all. All we say is that each number transforms into some unique number. Intuitively, this is what we do when we recite \(1\), \(1+1=2\), \(2+1=3\), \(\ldots\)2 What if we had an adversary who secretly decided that \(S(x)=x+2\). Would we still end up with our set of counting numbers as \(\{1,2,3,\ldots\}\)?.

Interestingly, this setup of a successor also provides us with a way to describe infinity as a property without getting into the meta discussion of what it infinity *is*.

- A set \(W\) is said to be infinite if there exists a one-to-one function from \(W\) to a proper subset of \(W\). Since the co-domain of \(S(x) = x+1\) is \(\{2,3 \ldots\} \subset \mathbb{Z}^{+}\), this makes \(Z^{+}\) infinite.

Having defined constructed a set of numbers such that each number has a unique successor, we can define addition as a process of repeated applications of successions.

\(\forall\) \(a,b \in \mathbb{Z}^{+}\), we define a binary function, \(f: \mathbb{Z}^{+}\times \mathbb{Z}^{+} \mapsto \mathbb{Z}^{+}\) as follows:

- \(f(a,1) = S(a)\)
- \(f(a,S(b)) = S(f(a,b))\)

Adding \(1\) to a number is defined from the successor function. Say we do this for all \(a\). Now given any \(k > 1\), finding \(f(a,k)\) is essentially doing \(k-1\) compositions of the successor function starting from \(f(a,1)\) and seeing where we land. Clearly, this function is recursive.

The closure of addition over \(\mathbb{Z}^{+}\) makes \((\mathbb{Z}^{+},+)\) a magma.

Let us try to prove the two properties (associativity and commutativity) of \(f\). The trick for both is to repeatedly apply induction.

\[ \forall a,b,c \in \mathbb{Z}^{+} \text{ , } f(f(a,b), c) = f(a,f(b,c)) \] Let \(c=1\), we find that: \[ \begin{align*} f(f(a,b), 1) &= S(f(a,b)) \text{ By definition of addition} \\ &= f(a,S(b)) \\ &= f(a,f(b,1)) \end{align*} \] Some minor symbolic manipulation later we find that the property holds. Working this out for \(c=2\) and \(c=3\), we notice a pattern where the result for \(c=1\) is getting employed to prove for \(c=2\)3 An exercise for the reader is to prove this for \(c=2\), \(c=3\).

Now, we try to see if this pattern holds more generally. The last chain in the inductive proof is to assume that for some \(k\), \(f(f(a,b), k) = f(a,f(b,k))\) and then attempt to prove this for \(f(f(a,b), S(k)) = f(a,f(b,S(k)))\).

\[ \begin{align*} f(f(a,b), S(k)) &= S(f(f(a,b),k)) \\ &= S(f(a,f(b,k))) \\ &= f(a, S(f(b,k))) \\ &= f(a, f(b,S(k))) \text{ Repeated applications of definitions } \\ \end{align*} \] Now that we have a set of elements with an operator that is associative, the label this structure a semigroup.

\[ \forall a,b \in \mathbb{Z}^{+} \text{ , } f(a,b) = f(b,a) \] This is trivially true for \(a=1,b=1\). The proof has two parts. First, we prove for \(a=1\) and any \(b\). Then we prove for any \(a\) and \(b\).

Assume this holds for \(a=1\) and \(b=k\), i.e. \(f(1,k) = f(k,1)\), let us prove this will hold for \(b=S(k)\) \[ \begin{align*} f(1,S(k)) &= S(f(1,k)) \\ &= S(f(k,1)) \\ &= S(S(k)) \\ &= f(S(k),1) \end{align*} \]

Now, assume this is true for \(a=k\), i.e. forSo the physical process might involve starting at \(a\) and requiring what happens after you repetitively add \(b\) \(1\)s or it might involve starting at \(b\) and asking what happens when you repetitively add \(a\), both are equivalent mathematically. While this is fairly obvious for addition of positive integers, there are other non commutative operations, e.g. the operation of putting on your underpants first and then your pants versus pants first.