Theoretical Computer Science

Calculus of Constructions: compress expression to its smallest form

I'm aware that the Calculus of Constructions is strongly normalizing, meaning every expression has a normal for that cannot be beta,eta-reduced further. So in fact this is the most efficient expression that calculates the same value as the original expression.

But in certain cases, normalization may reduce a small expression to a huge expression (in terms of size).

Is there a smallest-form of expressions? A form that calculates the same value with the smallest size.

In other words, instead of a time-efficient normal-form, a space-efficient one.


There's a bit of freedom in what we considre "the same value". Let me show that there is no such algorithm if "the same value" means "observationally equivalent". I will use a fragment of the Calculus of constructions, namely Gdel's System T (simply typed $\lambda$-calculus, natural numbers, and primitive recursion on them), so the argument applies to a much weaker calculus already.

Given a number $n$, let $\overline{n}$ be the corresponding numeral representing it, i.e., $n$ applications of $\mathtt{succ}$ to $0$. Given a Turing mahcine $M$, let $\lceil M \rceil$ be the numeral encoding $M$ in some reasonable way.

Say that two closed terms $t, u : \mathtt{nat} \to \mathtt{nat}$ are equivalent, written $t \simeq u$, when for all $n \in \mathbb{N}$, $t \, \overline{n}$ and $s \, \overline{n}$ both normalize to the same numeral (they normalize to a numeral because we're in a strongly normalizing claculus).

Suppose we had an algorithm, which given any closed term of type $\mathtt{nat} \to \mathtt{nat}$ calculates a minimal equivalent term. Then we can solve the Halting oracle as follows.

There is a term $S : \mathtt{nat} \times \mathtt{nat} \to \mathtt{nat}$ such that, for all $n \in \mathbb{N}$ and all Turing machines $M$, $S (\lceil M \rceil, \overline{n})$ normalizes to $\overline{1}$ if $T$ halts within $n$ steps, and it normalizes to $\overline{0}$ otherwise. This is well known, since simulation of a Turing machine for a fixed number of steps $n$ is primitive recursive.

There are finitely many closed terms $Z_1, \ldots, Z_k$ which are minimal terms equivalent to $\lambda x : \mathtt{nat} .\, 0$. Our minimization algorithm returns one of them when we give it $\lambda x : \mathtt{nat} .\, 0$, and it may even be the case that $\lambda x : \mathtt{nat} .\, 0$ is in fact the only such minimal term. All this does not matter, the only thing that matters is that there are finitely many minimal terms that are equivalent to $\lambda x : \mathtt{nat} .\, 0$.

Now, given any machine $M$, consider the term $$u \mathbin{{:}{=}} \lambda x : \mathtt{nat} .\, S (\lceil M \rceil, x)$$ If $M$ runs forever then $u \overline{n}$ normalizes to $\overline{0}$ for every $n$ and is equivalent to $\lambda x : \mathtt{nat} .\, 0$. To decide whether $M$ runs forever, we feed $u$ into our minimzation algorithm and check whether the algorithm returned one of $Z_1, \ldots, Z_k$. If it did, then $M$ runs forever. If it did not, then it halts. (Note: the algorithm need not calculate the $Z_1, \ldots, Z_k$ by itself, these can be hard-coded into the algorithm.)

It would be nice to know an argument that works with a weaker notion of equivalence, for instance just $\beta$-reducibility.

Comments (4)

  • +0 – How do you compute Z1,..Zk? — Sep 18, 2017 at 12:12  
  • +0 – You don't have to. That is, the algorithm that I am describing is out there, and we don't know precisely what it is, but that is irrelevant. I am not actually trying to run the algorithm, I just need its existence to show that your algorithm doesn't exist. — Sep 18, 2017 at 12:14  
  • +0 – Yes, but your argument says that if my algorithm exists than we can solve the halting problem. To determine if a turing machine M halts your algorithm normalizes u and checks if it's one of Z1,..Zk. So it needs to be able to enumerate those, otherwise it may not halt. — Sep 18, 2017 at 12:29  
  • +0 – The $Z_1,\ldots,Z_k$ are hard-wired in the algorithm, the algorithm doesn't need to enumerate them in a computational sense. If you prefer, the source code of the algorithm would start with a declaration of an integer $k$ (e.g. a #define directive in C) and an array $Z$ of length $k$, and then the code may freely use $Z[i]$, whatever they are. We don't need to explictly know them, we just need to know that they exist. — Sep 18, 2017 at 13:08  

External Links

External links referenced by this document: