On pattern matching, computation and (geometric) dependent type theory
Abstract. Pattern matching is a ubiquitous tool in theoretical computer science and type-theoretical mathematics, as well as in the formalization of formal languages themselves. We briefly review its role in these subjects, giving examples along the way. We then motivate a deeper connection between pattern matching and higher-dimensional geometry, and suggest how geometric type theory, while not yet a fully-fledged type theory, tries to provide a general formal framework in which mathematics can be built from *higher-dimensional patterns.*
Basic concepts
Universal computation
Definitions by pattern matching are the secret sauce both for generating and for evaluating program code. The basic idea can be informally dissected into two parts as follows:
- Pattern generation. ‘Patterns’ (or, say, ‘members’) of some ‘type’ $T$ are generated in ways $w_i(…)$ (called ‘constructors’ of $T$), which may depend on members of other types $(…)$ (including $T$ itself).
- Pattern computation. Having defined members of $T$, we may define something dependent on them by case distinction matching against the constructors $w_i(…)$, in each case using $(…)$ for our desired definition.
Many definitions of datatypes are given in terms of constructors (making them what are also called inductive datatypes): e.g. natural numbers $\mathbb{N}$ (a number is either $0$ or the successor $\mathsf{succ}(n)$ of an existing number), or strings (a string is either empty, or an existing string with one character added), or booleans (a boolean is either true or false). The $\Sigma$ type (say, in Lean) is another example of an inductive type. But importantly, even program code syntax often falls into that scheme: in the simplest case, one could take a program to be either empty or an existing program with an added line of valid instructions; but even in more complicated cases, e.g. terms of types (as programs) in a type theory, we can think of the inference rules of that type theory as the constructors for its syntax.
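To make this concrete, here is a minimal Haskell sketch of the first two examples as inductive datatypes (the constructor names `Zero`, `Succ`, `Empty`, `Cons` are our own choices):

```haskell
-- Natural numbers: a number is either 0 or the successor of an
-- existing number.
data Nat = Zero | Succ Nat
  deriving Show

-- Strings: a string is either empty, or an existing string with one
-- character added.
data Str = Empty | Cons Char Str
  deriving Show
```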
The above are examples of pattern generation (with our choice of constructors being the ‘generators’). An important characteristic of pattern generation is that, at each step, there are multiple ways to use existing patterns to create new ones.
In contrast, for the pattern computation part, we usually want exactly one matching case per pattern, yielding something akin to a functional mapping of types $T \to S$. For instance, we could define the deterministic evaluation of program code in this way (and we will see an example of this later!). Importantly, however, pattern matching could equally be used non-deterministically, with multiple matching cases for a given pattern.
As a deterministic example, consider the following: all primitive recursive functions $\mathbb{N}^k \to \mathbb{N}$ can be inductively constructed via pattern matching on members of $\mathbb{N}$ (with cases $0$ or $\mathsf{succ}(n)$) and on members of $k$-fold products (with cases $(x_1, …, x_k)$). As a deterministic example of ‘program evaluation’, we will later implement a specific evaluation strategy for lambda calculus using pattern matching. In fact, other forms of deterministic Turing-complete computation (like partial recursive functions, or Turing machine computation itself) may also be implemented via pattern matching. As a non-deterministic example, consider the usual notion of evaluation in lambda calculus, or, say, in SK calculus, which can be easily cast into a pattern matching form.
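To illustrate the deterministic case, here is a hedged Haskell sketch (continuing with `Nat` from above; `add` and `mul` are names we pick) of two primitive recursive functions, each with exactly one matching case per constructor:

```haskell
-- Addition, by pattern matching on the first argument
-- (one case for Zero, one case for Succ n):
add :: Nat -> Nat -> Nat
add Zero     m = m
add (Succ n) m = Succ (add n m)

-- Multiplication, built on top of add in the same style:
mul :: Nat -> Nat -> Nat
mul Zero     _ = Zero
mul (Succ n) m = add m (mul n m)
```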
With pattern matching being a fundamental concept in both program generation and program computation, what is maybe surprising is the following: it appears that certain choices of patterns $T$, and certain choices of deterministic computation with them, are universal in the sense that any other kind of computation with patterns $S$ can be computationally reproduced in one such universal kind by translating $S$ into $T$ (in fact, it seems that ‘most’ non-trivial/non-predictable choices of computation turn out to be universal in this sense; note also that ‘deterministic or not’ doesn’t matter for this claim). However, this claim of universality is not a formal theorem at all: it’s a (guiding) intuition going under the name of the Church-Turing thesis. We will give one example of universal computation-by-pattern-matching here: that of lambda calculus.
(Computation as generation. As an aside, pattern computation itself ‘generates’ something: namely, a relation between its domain and codomain (and in particular, if those are the same type $T$, the process can be iterated!). In this way, pattern generation and pattern computation are not so different after all: for example, the step-by-step evaluation of a program generates an ‘evaluation sequence’ (or ‘computation history’). The difference, of course, is that the domain in which evaluation sequences live is pre-determined (namely, the domain of all possible sequences of program states), whereas adding constructors to our inductive types provides a way to extend our domain of discourse. But once we regard all inductive types as generated, i.e. their domains as fixed, then building patterns with constructors is quite like non-deterministic computation!)
(Backwards search on constructors. In many cases, inductive types themselves are subtypes of a larger domain: for instance, valid terms in a type theory might include into the domain of raw grammar, which may contain grammatically valid-looking but ‘type-theoretically invalid’ (i.e. non-inferrable) terms. Given a subtype $S \subset T$ with $S$ being inductive, determining whether $x : T$ lies in $S$ is decidable if any $x$ arises from at most finitely many constructors: simply do a backward search along the constructor tree (assuming this search never loops). This makes type checking decidable in intensional type theory, but undecidable in extensional type theory, since the latter has constructors parametrized by a potentially infinite (or at least non-computable) set: namely, the set of proofs of identity.)
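As a toy illustration of such a backward search (our own example, not taken from the discussion above): consider the subtype of even numbers inside `Nat`, inductively generated by ‘$0$ is even’ and ‘if $n$ is even then so is $\mathsf{succ}(\mathsf{succ}(n))$’. Membership is decided by searching backwards along these constructors:

```haskell
-- Backward search along the constructor tree: every Nat arises from
-- at most one of the two 'even' constructors, and each step strictly
-- shrinks the input, so the search never loops.
isEven :: Nat -> Bool
isEven Zero            = True       -- arises from the base constructor
isEven (Succ Zero)     = False      -- arises from no constructor
isEven (Succ (Succ n)) = isEven n   -- could only arise from 'add two'
```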
Universal logic
Working with a formal logical system is, in some ways, not much different from non-deterministic computation: starting from ‘formula A’, through a sequence of (non-pre-determined, but pre-defined) manipulations, the mathematician tries to reach ‘formula B’. Crucially, the real ambition of logic actually reverses the order-of-action in the previous sentence. Namely, logic is meant to, in particular, provide a general-purpose, formal description of pattern matching! In particular, we’d expect to be able to give formal definitions of inductive datatypes (e.g. natural numbers), and of functions on datatypes (e.g. models of computation), but then also study those functions formally (e.g. by having a representation of ‘functions of functions’), etc.
We should highlight that the ambition of being a formal framework in particular stands in contrast to the previous section, where the idea of pattern matching was used in a more intuitive sense. Nonetheless, and maybe confusingly to some extent, in the inception of any logical framework we must precisely rely on this latter version of pattern matching to define the basic datatypes of the framework. Exemplifying this, let us briefly, at least once, give an overview of the datatypes commonly found in a logical system.
- A datatype of statements, whose members are to be thought of as those datatypes internally (and thus formally!) represented in the logic (side note: I take “type” to be nothing more than an abbreviation for “datatype”, but some might object to this conflation, since “type” is maybe meant to be more abstract than “datatype” … here, everything is abstract :-) ).
- A datatype of judgements, whose members are to be thought of as functional relations from a collection of types (‘assumptions’) to another type or collection thereof (‘conclusions’).
- The constructors of judgements (and thereby statements): inference rules which describe how judgements (and thus statements) may be inferred from one another. Note, as remarked above, ‘computation with constructors’ itself generates a datatype (of ‘inference sequences’ if you want).
A central hypothesis (analogous to our earlier claim of universality for certain models of computation) is the following: not only are there many logical frameworks which fulfil the above purpose of formalizing (or defining, or evaluating, …) notions of computation, but they are, in fact, in an appropriate sense, all equivalent in that they can internally model one another; and in this sense they are ‘universal’ logical frameworks.
That being said, there is now a subtlety that we didn’t care about in our earlier discussion of universal computation: the ‘internal models’ in question may be based on encodings that don’t resemble their intended meaning at all; and if we try to translate between systems in a way that ‘preserves meaning’, then differences will start to appear, and people may start to argue about which operations to make available and which not to make available! So ‘semantically’, logical frameworks may well differ substantially, in that the ‘first-class internal operations’ of one framework may not be available in another.
Pattern matching for computation
After so much talk, let’s actually put pattern matching to use and define a model of Turing computation. We’ll do this for lambda calculus. I should remark that, say, SK calculus would have been even easier to implement (but often ‘even easier’ means that program code becomes ‘even harder’ to read).
Recollection on de Bruijn indices
The section title is a trap: if you don’t know de Bruijn indices, go and read this (or anything else written on the topic).
Lambda term evaluation by pattern-matching
Here it goes. First, a terminological note: we use ‘introduce’ to mean the introduction of a new type or constructor, while we use ‘define’ to mean the abbreviation, by a new symbol, of a computation given by pattern matching.
- As in the linked examples, we introduce $\lN : \mathcal{U}_i$ using $0$ and $\mathsf{succ}$ as constructors. We’ll make our lives easier by writing $k$ for the $k$-fold application of $\mathsf{succ}$ to $0$; in particular, we write $\mathsf{succ}(k) = k + 1$.
- We introduce a new type $\mathsf{Term} : \mathcal{U}_i$ in our type universe, and use $\mathsf{var} : \lN \to \mathsf{Term}$ and $\lambda : \mathsf{Term} \to \mathsf{Term}$ as well as $\cdot : \mathsf{Term}, \mathsf{Term} \to \mathsf{Term}$ as its constructors (here, you can think of “$,$” as really being a product). The ‘$\cdot$’ constructor will be written in infix style: $(- \cdot -)$.
- We now define $\beta = (\beta_1, \beta_2) : \mathsf{Term}, \lN \to \mathsf{Term}, \lN$ by pattern matching as follows (a Haskell transcription of these cases can be found after the list):
- Case-1-$\lambda$: On $(\lambda(M), k)$ map to $(\lambda(\beta_1(M,0)), 0)$.
- Case-2-$\lambda$: On $(\lambda(M) \cdot N, k)$ map to $(M \cdot N, k + 1)$.
- Case-1-$\cdot$: On $((M \cdot M’) \cdot N, 0)$ map to $(\beta_1(M \cdot M’,0) \cdot \beta_1(N,0),0)$.
- Case-2-$\cdot$: On $((M \cdot M’) \cdot N, k)$ with $k > 0$, map to $(\beta_1(M \cdot N, k) \cdot \beta_1(M’ \cdot N, k), 0)$.
- Case-1-$\mathsf{var}$: On $(\mathsf{var}(k) \cdot N, k)$ map to $(N,0)$.
- Case-2-$\mathsf{var}$: On $(\mathsf{var}(l) \cdot N, k)$ with $l \neq k$, map to $(\mathsf{var}(l),0)$.
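For concreteness, here is a hedged transcription of the six cases into Haskell, a sketch under the reading that overlapping cases are disambiguated by trying them in order (the names `Term`, `beta`, `beta1` are our own):

```haskell
-- Lambda terms with de Bruijn indices, as introduced above.
data Term = Var Integer | Lam Term | App Term Term
  deriving (Eq, Show)

-- The six cases, tried top to bottom.
beta :: (Term, Integer) -> (Term, Integer)
beta (Lam m, _)            = (Lam (beta1 (m, 0)), 0)               -- Case-1-λ
beta (App (Lam m) n, k)    = (App m n, k + 1)                      -- Case-2-λ
beta (App (App m m') n, 0) = (App (beta1 (App m m', 0))
                                  (beta1 (n, 0)), 0)               -- Case-1-·
beta (App (App m m') n, k) = (App (beta1 (App m n, k))
                                  (beta1 (App m' n, k)), 0)        -- Case-2-·
beta (App (Var l) n, k)
  | l == k                 = (n, 0)                                -- Case-1-var
  | otherwise              = (Var l, 0)                            -- Case-2-var
-- The rules above leave a bare variable unmatched; this catch-all is
-- our own addition (an assumption) so that the function is total.
beta (v@(Var _), _)        = (v, 0)

beta1 :: (Term, Integer) -> Term
beta1 = fst . beta
```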
The $\beta$-reduction of a term $t : \mathsf{Term}$ can now be computed by repeatedly applying $\beta$ to $(t,0)$. Let’s give an example of this.
We want to evaluate the expression $(\lambda (\mathsf{var}(1) \cdot \mathsf{var}(1))) \cdot (\lambda (\mathsf{var}(1) \cdot \mathsf{var}(1)))$. First, let’s abbreviate things a little, and write $\mathsf{var}(k) \equiv k$. We thus want to compute repeated applications of $\beta$ to $(\lambda (1 \cdot 1) \cdot \lambda (1 \cdot 1),0)$ using the above rules.
- First, Case-2-$\lambda$ applies to yield $((1 \cdot 1) \cdot \lambda(1 \cdot 1),1)$
- Next, Case-2-$\cdot$ applies to yield $(\beta_1(1 \cdot \lambda(1 \cdot 1),1) \cdot \beta_1(1 \cdot \lambda(1 \cdot 1),1), 0)$. At this point we recursively compute $\beta(1 \cdot \lambda(1 \cdot 1),1)$ using Case-1-$\mathsf{var}$ to find $\beta_1(1 \cdot \lambda(1 \cdot 1),1) = \lambda(1 \cdot 1)$. Thus, $\beta((1 \cdot 1) \cdot \lambda(1 \cdot 1),1)$ equals $(\lambda (1 \cdot 1) \cdot \lambda (1 \cdot 1),0)$.
- We arrived at the term we started with! So evaluation will never stop: our program is caught in an infinite loop. (The sketch below replays this loop in code.)
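For the record, the same loop can be replayed with the Haskell sketch from above (`omega` and `main` are our names):

```haskell
-- The looping term: omega = λ(1·1) · λ(1·1).
omega :: Term
omega = App (Lam (App (Var 1) (Var 1))) (Lam (App (Var 1) (Var 1)))

-- Printing three steps of the iteration exhibits the cycle:
-- (omega, 0) reappears after two applications of beta.
main :: IO ()
main = mapM_ print (take 3 (iterate beta (omega, 0)))
```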
Modelling full computation histories
Here’s a small but important twist. Within GTT (geometric type theory), because we have access to the universe, we can also model ‘full computation histories’. Namely, we can consider the dependent product of the functor $(\lN,\leq) \to \mathcal{U}_{i+1}$ whose image is $\mathsf{Term} \times \lN \xto{\beta} \mathsf{Term} \times \lN \xto{\beta} …$ .
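In Haskell terms, a loose analogue only (Haskell has no dependent products over $(\lN,\leq)$; the name `history` is ours):

```haskell
-- A loose analogue of a 'full computation history': the infinite
-- stream of all iterates of beta from a starting state.
history :: (Term, Integer) -> [(Term, Integer)]
history = iterate beta
```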
Exercise. How would you define the poset $(\lN,\leq)$ as a higher inductive type? Hint: consider 2-morphism constructors in $\mathcal{U}$.
Solution. Use one additional ‘higher’ constructor in addition to the usual constructors:
- $0 : \mathbf{1} \to \lN$
- $\mathsf{succ} : \lN \to \lN$
- $\mathsf{less} : \id_{\lN} \to \mathsf{succ}$
Primitive and partial recursive functions
There’s an internal way to get total computation in a type theory which allows definitions by pattern matching (and composition, and products, see above!). Namely, simply construct the natural number type $\lN$ (with constructors $0$ and $\mathsf{succ}$): since definitions by pattern matching are allowed, for every primitive recursive function a corresponding 1-morphism on that type can be defined (note: this “for every” statement is meant to be a statement in the meta-theory … but using non-truncated universes and an internally defined notion of primitive recursive function, one could maybe make this into an internal statement as well!).
Suggested exercise. Verify that all primitive recursive functions can indeed be defined by pattern matching, by going through the cases here.
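As a hint for the exercise, the recursion scheme underlying all the cases can itself be packaged as a single higher-order function defined by pattern matching; a minimal sketch (continuing with `Nat`; the name `primrec` is ours):

```haskell
-- The primitive recursion scheme: a base value for Zero, and a step
-- function combining n with the recursive result for Succ n.
primrec :: a -> (Nat -> a -> a) -> Nat -> a
primrec base _    Zero     = base
primrec base step (Succ n) = step n (primrec base step n)

-- For example, the (truncated) predecessor:
pred' :: Nat -> Nat
pred' = primrec Zero (\n _ -> n)
```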
The difference between primitive and partial recursive functions can be captured by one additional mode of function construction: namely, the construction of functions via ‘minimization’. This, a priori, cannot be naturally written in pattern matching form. Indeed, it requires the definition of a min-searching function $\mathrm{min}(-,-,-)$ which takes a total function $g$, an ‘in’ number $x$, and an ‘out’ number $y$ (maintaining the invariant $y = \mathsf{ev}(g,x)$), and matches $(g,x,y)$ as follows.
- On $(g,x,0)$ map to $x$.
- On $(g,x,n)$ with $n \neq 0$, map to $\mathrm{min}(g,\mathsf{succ}(x),\mathsf{ev}(g,\mathsf{succ}(x)))$.
To find $\mathsf{argmin}_{x}(g(x) = 0)$, evaluate $\mathrm{min}(g,0,\mathsf{ev}(g,0))$ using this definition.
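Here is the same construction as a hedged Haskell sketch, with Haskell's `Integer` standing in for $\lN$ and ordinary application standing in for $\mathsf{ev}$ (the names `minSearch` and `argmin` are ours):

```haskell
-- minSearch g x y returns the least x' >= x with g x' == 0, under
-- the invariant y == g x ('in' number x, 'out' number y).
minSearch :: (Integer -> Integer) -> Integer -> Integer -> Integer
minSearch _ x 0 = x
minSearch g x _ = minSearch g (x + 1) (g (x + 1))

-- To find argmin_x (g x == 0), evaluate minSearch g 0 (g 0).
-- If g has no zero, this loops forever: exactly the partiality of
-- partial recursive functions.
argmin :: (Integer -> Integer) -> Integer
argmin g = minSearch g 0 (g 0)
```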
While that clearly is not a well-founded definition by pattern matching (and moreover treats “functions” as inputs to the indicated ‘evaluation’ function $\mathsf{ev}$), it is important to highlight that the underlying mechanism of minimization can of course very well be formulated using pattern matching. Indeed, we have just seen that computation in lambda calculus can be formulated using pattern matching, and lambda calculus is Turing universal and thus represents (like partial recursive functions) all computable functions.
(Moreover, the case of partial recursive functions is instructive for intuitively isolating some of the ingredients of Turing-complete computation: we need, (1), infinite patterns as inputs (in this case, $\lN$), (2), a way to access programs as inputs and evaluate them (in this case, $\mathrm{min}$/$\mathsf{ev}$ above), and, (3), otherwise free use of pattern matching!)
Geometric Type Theory
Overview
One aspect of our discussion of inductive types has so far been swept under the rug: namely, their relation to higher-dimensional geometry (which in particular brings up questions about the dimensionality of types and constructors). Certainly, all of our above discussion must physically take place in at least dimension 1: for instance, program code can be understood as a 1-dimensional array of characters (say, 0’s and 1’s, or lambda expressions, etc.). Where do higher dimensions come into play?
Here’s one (rather informal, but intuitive) way to motivate how higher dimensions naturally come into play when considering computations with patterns: say, we start with patterns that live in $n$ dimensions (where $n$ could be 1); now, any change in those patterns, say, with continuous or discrete time parameter $t$, leads to an $(n+1)$-dimensional ‘pattern of patterns’ (or ‘sequence of patterns’) obtained by tracing the changing patterns through all times $t$. Thus: the manipulation of (or ‘computation’ with) patterns can be described by higher-dimensional patterns!
This basic insight, together with the earlier remarks of the fundamental role of pattern matching in logic, forms the starting point for a geometric point-of-view on mathematical foundations. Geometric type theory is a (so far incomplete) approach to the formalization of this point-of-view. Indeed, it turns out that many constructs in logic are closely related to geometric constructions.
- Function types are precisely patterns of pattern manipulations (so one dimension up from the patterns they manipulate). In category-theoretic lingo, these types of higher patterns are hom types. Even dependent function types arise in this way, as long as we have a type of all (small) types at our disposal (see the example section here). And this so far only makes use of (directed) 1-hom types, but similarly $n$-hom types can be very useful.
- Inductive types can also benefit from higher dimensions and, in fact, these have been studied under the name of higher inductive types (note that dependent sum types, as long as we have a type of types, are also inductive types, but not really ‘higher’ ones in this sense, see the previously linked examples). However, previous usage of the term usually referred to higher inductive types with invertible higher structure, which doesn’t allow for the more general case of non-invertible higher structure. (A concrete example that we’ve seen above is the higher inductive type $(\lN,\leq)$.)
So, to summarize these two points, we may say that a central goal of geometric higher type theory would be the following:
Make pattern matching and inductive types work in any dimension!
(As a side note: formally, geometric type theory is based on the theory of geometric higher categories, which provides a natural framework for unifying higher dimensional geometry and logic. We refer to here and the general web for material on the subject!)
Briefly: but why?
Let’s get one point out of the way first. Geometric type theory is (the idea of) yet another approach to mathematical foundations. But … if we believe that the exact form of mathematical foundations doesn’t matter, and many different ‘universal’ phrasings of mathematical foundations already exist, why bother with another one?
Actually, that’s a completely valid question. The answer I’d give is “mainly for aesthetic reasons”. The interplay of patterns from a geometric and logical perspective is just very nice (see also the table below). That may sound stupid, but a lot of mathematics is actually guided by aesthetics, and not by practicality. And, in fact, it might not be a bad idea to do so, since in several instances aesthetically-incepted mathematics eventually turned out to be practically relevant, say, in fundamental physics. Indeed, our universe is often pretty aesthetic.
Logic vs geometry
So what’s aesthetic about (the idea of) geometric type theory? There have been various type-theoretic approaches to higher inductive types. What sets the GTT approach apart is that it attempts to leverage the manifold-diagrammatic approach to the coherence issues encountered in higher dimensions. And, it turns out, this works naturally and well! For instance, there is a tight correspondence between diagrammatic isotopies and the coherence laws of higher functors, their transformations, their modifications, and so on (examples can be found here). In this way, the working mathematician need not worry about imposing coherences or other laws by hand: these are always automatically guaranteed when working with diagrams (see the ‘discussion’ section here for a more in-depth discussion). The only thing the user should worry about is that canonicity is preserved at all times: recall, canonicity means that all formable terms of a type can be understood, up to equivalence, in terms of the type’s definition by constructors. And indeed, some canonicity questions still need to be solved before a satisfying formalization of GTT becomes possible (see the previously linked discussion section loc. cit.).
Besides offering a natural and nice approach to the difficulties that usually come with higher dimensions, higher dimensions also bring many pleasant features with them, which often correspond to well-known concepts in traditional (1-dimensional) logic. The following table summarizes these correspondences.
| Categorical logic | $\to$ | Geometric type theory |
|---|---|---|
| Inductive types with constructors, with various ‘mechanics’ at play (like types, functions of types, equalities of functions) | $\to$ | Geometric objects (namely, strata in diagrams), all with the same mechanic, since manifold $n$-diagrams work uniformly in all dimensions $n$ |
| Syntactic rules: composition and substitution | $\to$ | Geometric operations: gluing and blow-up |
| Rules of contexts: e.g. rules like weakening, contraction, exchange | $\to$ | Geometry of embedded diagrams, a.k.a. ‘stabilization’, which embeds diagrams in higher codimensions (in categorical terms: cartesian monoidality) |
| Universe type truncation: in a theory without internal support for (non-invertible) higher structure, universe types are truncated | $\to$ | In the presence of all higher-dimensional structure, no truncation of the universe type is needed (see here for further discussion of ‘truncation and universes’) |
| Reflection rules for universes and types, which allow passing from the ‘inside’ of a type to the ‘inside’ of that type’s type | $\to$ | Something akin to ‘boundary theories’, but the geometric intuition is a bit vague on this (important!) point |
| Function types, including dependent function types | $\to$ | Higher hom types, i.e. geometric objects with specified boundary |
| (Various other logical operations) | $\to$ | (Most of them turn out to be either inductive types or variations of function types) |
| $\vdots$ | | $\vdots$ |
Open issues
To make this clear again: GTT hasn’t been formally specified.
Two main issues pertain to our understanding of isotopies (i.e. the laws of higher transfors) and singularities (i.e. the laws of higher adjoint equivalences). Without resolving these issues, it seems difficult to decidably guarantee canonicity for the mathematician working in GTT. Independent of GTT, I think these are very much interesting problems in their own right though!
Conclusion
In this post we touched on three main points:
- Pattern matching being important for everything from computability theory to setting up mathematical foundations.
- How to work with pattern matching, e.g. for defining a machine running lambda calculus.
- Geometric type theory trying to be a bare framework for “higher-dimensional pattern matching” and not much more.