A short primer on type theory
Type theory is a mathematical theory that uses terms and types, instead of elements and sets, to formalize mathematics. All of mathematics can be built upon it, thereby replacing set theory in this regard; we say that type theory can act as a foundation of mathematics.
Operations in type theory are formalized using another very interesting theory known as category theory, and they have deep connections to logic via the Curry-Howard isomorphism, but those are topics for another day. Let's now give an overview of the basics of the theory.
How does it work?
Type theory builds upon the untyped lambda calculus by adding types as an extra feature. Interestingly, this apparently simple addition makes the lambda calculus a lot more interesting and powerful. Because of that, understanding basic lambda calculus is a prerequisite for understanding type theory. Don't worry though, I will not go into much detail here.
Untyped lambda calculus
Lambda calculus is a rewriting system for terms. This means that a term t is successively converted - according to some specified rules of conversion - to other terms t1, t2, t3, etc., until conversion is no longer possible using the given rules (we then say that we've reached the normal form.) Note that the reduction may not always halt (see the Y combinator to learn about recursion in the lambda calculus.)
Terms
Now what exactly are terms, you may ask? A term is any syntactically correct expression. The idea is that numbers, booleans, functions, etc. are terms, and certain combinations of such terms are also terms. Here are some examples of terms from common mathematics (not all directly applicable to lambda calculus, as you will see later): a + 1, f(x, y), 3, f, etc.
Formally, we say that this is a recursive definition. These kinds of definitions are very common within computer science, so you may like to read more about them.
In general, we consider atomic terms which are things like numbers, booleans and such, and compound terms, including lambda abstractions and function applications.
Lambda abstractions are actually just functions. For those with previous experience in imperative programming, lambda abstractions are sometimes called anonymous functions, since they exist without needing to be named; in that sense, they are different from the function-like procedures of imperative programming languages. If you don't have any exposure to imperative programming, then you don't have to worry about that. Note that in the lambda calculus, functions are considered values in and of themselves and can be manipulated like any other term.
As a notation, we write abstractions as such:
λ input . output
For instance, a function that increments a number may be written as λ n . n + 1, and the identity function, which returns its input as is, can be written as λ x . x.
Functions can be applied by simply writing their arguments to their right, therefore producing an application term. For instance, (λ x . x + 1) 2 is an application term that will reduce to 3.
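These terms have direct analogues in Haskell, which we will use for examples later in this post. Here is a quick sketch (the names incr, identity and three are mine):

incr :: Int -> Int
incr = \n -> n + 1        -- λ n . n + 1

identity :: a -> a
identity = \x -> x        -- λ x . x

three :: Int
three = (\x -> x + 1) 2   -- the application term (λ x . x + 1) 2; evaluates to 3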
Notice that if f is a function, then f a is an application term. This is written in a style different from the one usual in mathematics and programming, where it would be written as f(a).
Reduction rules
The reduction rules of lambda calculus are really simple. If your term is an abstraction or atomic, then it's normal and you can't do any reductions to it. If instead your term is an application, then you must do β-reduction (pronounced 'beta reduction') to it. This simply means that you take your argument (input) and plug it into every position it should be in within the body of the lambda expression.
- Atomic --> Done
- Abstraction --> Done
- Application --> β-reduction
For example, (λ x . x + 1) 2 will reduce - via β-reduction - to 2 + 1: you untangle the lambda, revealing the body, where every occurrence of the argument variable (known as a bound variable) is replaced by its value.
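To make the substitution step concrete, here is a minimal sketch of one-step β-reduction in Haskell; the Term datatype and the function names are mine, and variable capture is deliberately ignored to keep things short:

data Term = Var String | Lam String Term | App Term Term
  deriving Show

-- subst x s t replaces every free occurrence of x in t with s
-- (a real implementation must also avoid variable capture).
subst :: String -> Term -> Term -> Term
subst x s (Var y)   | x == y    = s
                    | otherwise = Var y
subst x s (Lam y b) | x == y    = Lam y b              -- y shadows x: stop
                    | otherwise = Lam y (subst x s b)
subst x s (App f a) = App (subst x s f) (subst x s a)

-- One step of reduction: β-reduce when an abstraction meets its argument.
step :: Term -> Term
step (App (Lam x b) a) = subst x a b                   -- the β-rule itself
step (App f a)         = App (step f) a                -- reduce the head first
step t                 = t                             -- Var or Lam: normal

-- step (App (Lam "x" (Var "x")) (Var "y")) gives Var "y"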
There are two main strategies for beta reduction:
- Call-by-value: you evaluate (reduce) the argument to the function and then you apply beta reduction.
- Call-by-name: you do beta reduction without reducing the argument, then you reduce what you get. This leads to lazy evaluation, which has many interesting properties (see the sketch below.)
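Haskell follows the lazy route (strictly speaking, call-by-need, a memoized variant of call-by-name). A tiny sketch of why that matters; constOne is my own name:

constOne :: a -> Int
constOne _ = 1

-- constOne undefined evaluates to 1: the argument is never needed, so it
-- is never reduced. Under call-by-value, reducing 'undefined' first would
-- crash before the function body was ever reached.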
Example - encoding of boolean algebra:
True = (λ x . (λ y . x))
False = (λ x . (λ y . y))
And = (λ b1 . (λ b2 . ((b1 b2) False)))
(And False False) reduces to (λ x . (λ y . y)), which is False
(And False True ) reduces to False
(And True False) reduces to False
(And True True ) reduces to True
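Here is a sketch of the same encoding in Haskell; the RankNTypes extension lets us give Church booleans a name as a type, and all the identifiers are mine:

{-# LANGUAGE RankNTypes #-}

type ChurchBool = forall a. a -> a -> a

true :: ChurchBool             -- (λ x . (λ y . x))
true = \x _ -> x

false :: ChurchBool            -- (λ x . (λ y . y))
false = \_ y -> y

and' :: ChurchBool -> ChurchBool -> ChurchBool
and' b1 b2 = b1 b2 false       -- (λ b1 . (λ b2 . ((b1 b2) False)))

toBool :: ChurchBool -> Bool   -- convert back to Haskell's Bool to inspect
toBool b = b True False

main :: IO ()
main = do
  print (toBool (and' true true))    -- True
  print (toBool (and' true false))   -- False
  print (toBool (and' false false))  -- False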
Type theory
In type theory, each term has a type, and this is expressed by the following judgement:
x : A
read as "x is of type A." Notice that this is an intrinsic property of x and not an extrinsic property (elements belonging to sets.) Although it might look syntactically nearly identical, it does imply a different, constructive view of mathematics. One never considers a term without its type. This is a very subtle difference and it may take some time for one to really see it.
Given this view, we define some common operations as primitives on types that allow us to construct new types out of other types.
Product
The product of two types is very similar to the Cartesian product of two sets in set theory, and it is defined as follows: if x : A and y : B, then the pair (x, y) : A x B, where A x B (generally read as 'A times B') is the product type of A and B, and (x, y) is a single term made up of two sub-terms.
This means that given two terms of types A and B, you can always construct a term of type A x B. Reciprocally, given a term of type A x B, you can always produce a term of type A and a term of type B. This is done using the two projections: π1, which extracts the first element of a pair, and π2, which extracts the second. Note: this π has nothing to do with the number 3.14159...; it stands for projection.
For (x, y) : A x B we have
π1 (x, y) = x : A
π2 (x, y) = y : B
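In Haskell, the product type A x B is the pair type (a, b), and the projections π1 and π2 are the standard functions fst and snd. A small sketch (the names pair, firstOf and secondOf are mine):

pair :: (Int, Bool)   -- (x, y) : A x B, with A = Int and B = Bool
pair = (3, True)

firstOf :: Int        -- π1 (x, y) = x : A
firstOf = fst pair    -- 3

secondOf :: Bool      -- π2 (x, y) = y : B
secondOf = snd pair   -- True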
Functions
As we saw in the part on the untyped lambda calculus, functions are also terms. A function with argument of type A and output of type B is of type A -> B (which may be read as 'A to B'.) As such, we have the following:
(λ (x : A) . (y : B)) : A -> B
There is a way to get a term of type B out of terms of type A and of type A -> B, and that's done by function application.
For x : A
f : A -> B
we always have
f x : B
As you can see, unlike the product, it is necessary to have two terms of the correct types to get a term of type B.
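The same rule in Haskell, with A = Int and B = Bool (the names are mine):

isPositive :: Int -> Bool   -- f : A -> B
isPositive n = n > 0

arg :: Int                  -- x : A
arg = 3

result :: Bool              -- f x : B
result = isPositive arg     -- True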
Sum
The sum type (also known as the coproduct, a nomenclature reminiscent of the fact that the sum type is the dual of the product type in category theory) is the type of terms that belong to either one of two types. This may sound confusing as stated; the sum type is often called the disjoint union, or better yet, the tagged union of the types. Again, this is reminiscent of set theory. Basically, if x : A and y : B, then x (tagged with its origin A) is of type A + B (the sum type of A and B), and so is y (tagged with its origin B.)
What exactly is meant by tags, you may ask? Well, here is an example of how it's done in Haskell.
data Either a b = Left a | Right b
In Haskell, Either a b is how you write A + B. You may not know Haskell, so here's a simple explanation of what's happening. We are defining the data type (for our intents, simply a type) Either a b. Terms of this type look like Left x where x : a, or (that's the pipe |) Right y where y : b. Here Left and Right are simply wrappers around the value. They could have been named Alice and Bob or Eeny and Meeny, or just anything really, but Left and Right express the fact that a is written on the left side of Either and b on its right, so as not to get confused writing the type. (Although A + B and B + A are technically isomorphic, they're not identical, so it's essential to tell a programming language which of the two is meant.)
Left and Right are the tags in 'tagged union.' You can now also see why it's called a disjoint union: A and B stay separate thanks to the tags.
The sum type is a very interesting and surprisingly useful type that has only recently received significant attention. Notice that there is no way to extract, from an arbitrary t : A + B, a value x : A or y : B, but case analysis allows for the extraction of a term of type A or of type B depending on the value of t. In Haskell, this is done with pattern matching as follows:
extractA :: Either a b -> Maybe a
extractA (Left x)  = Just x    -- the Left tag tells us an 'a' is inside
extractA (Right _) = Nothing   -- there is no 'a' to extract

extractB :: Either a b -> Maybe b
extractB (Left _)  = Nothing
extractB (Right y) = Just y
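If you load these definitions into GHCi, you should see, for instance:

extractA (Left 3)      -- Just 3
extractA (Right True)  -- Nothing
extractB (Right True)  -- Just True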
Notice the use of the Maybe type here. This is because extraction can fail. The Maybe type is a wrapper around a type a that allows for no value to be returned, meaning failure (Nothing).
data Maybe a = Just a | Nothing
As mentioned in Bartosz Milewski's article on that, the Maybe type is actually a sum type itself: Maybe A = A + Unit, where Unit is the type with exactly one term, which is why Maybe is sometimes known as +1. But we will not talk about that today.
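A sketch of that correspondence in Haskell, where the unit type is written () (Maybe' is my own name, not a standard one):

type Maybe' a = Either a ()   -- Maybe A = A + Unit

just' :: a -> Maybe' a        -- plays the role of Just
just' = Left

nothing' :: Maybe' a          -- plays the role of Nothing
nothing' = Right ()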
An interesting thing to notice: it takes one term of type A x B to get a term of type A and a term of type B; two terms, of types A and A -> B, are needed to get a term of type B; and no number of terms of type A + B alone is enough to get a term of type A or a term of type B at will (you are not free to choose whether you retrieve an A or a B.)
That's all for this post. Keep an eye out, because more interesting ones will follow!