Theoretical Computer Science
fl.formal-languages automata-theory lower-bounds context-free
Updated Tue, 07 Jun 2022 05:27:50 GMT

# Maximum shortest word accepted by pushdown automata

Given a fixed alphabet, consider all deterministic pushdown automata with $$n$$ states that accept a nonempty language. What is the maximum length of the shortest word accepted by a deterministic pushdown automaton with $$n$$ states (holding alphabet size constant)?

I found an example where the shortest word is $$\Omega(n^2)$$ and suspect that this bound is tight, but have been unable to prove it. Everything I found online talking about shortest words is talking about finite or two-way automata only, not pushdown automata.

As an example, choose two large prime numbers $$p$$ and $$q$$, and two input symbols $$a$$ and $$b$$. Create an automaton with a cycle of length $$p$$ that reads an $$a$$ and pushes onto the stack, with a transition to a cycle of length $$q$$ that reads a $$b$$ and pops from the stack.

By placing the initial and accept states at appropriate places on the first and second cycle, you force the automaton to go through the first cycle $$q-1$$ times and the second cycle $$p-1$$ times, so that the maximum stack length is the same modulo $$p$$ and $$q$$, and thus the shortest word has length $$\Omega(pq)$$. Since the automaton has $$p+q$$ states, this means the shortest word is $$\Omega(n^2)$$.

## Solution

Counter Automata

I was a co-author for a paper where we investigated this problem for counter automata. We were able to show that the length of a shortest string accepted by an $$n$$-state (non-empty) counter automaton is at most $$\Theta(n^2)$$. See here: https://lmcs.episciences.org/5251

The lower bound can be obtained similar to how you described in your question with cycles of length $$p$$ and $$q$$ (or any two relatively prime numbers).

Pushdown Automata

Upper Bound: We can get an upper bound using standard techniques. The length of a shortest string accepted by an $$n$$-state (non-empty) pushdown automaton is at most $$2^{O(n^2)}$$.

Essentially, we argue that if the pushdown automaton's language is non-empty, then there exists some string that it accepts where the stack height is at most $$O(n^2)$$. Therefore, there are at most $$n \cdot 2^{O(n^2)}$$ (which is still $$2^{O(n^2)}$$) possible configurations so there must be an accepted string of length at most $$2^{O(n^2)}$$.

Lower Bound: For an exponential lower bound, see Jeffrey's answer above.

Also, see my answer to this related question: Shortest string in the intersection of a context-free language and a regular language

This related answer leads to a deterministic binary stack (non-empty) PDA with an exponential lower bound for the length of a shortest accepted string. Note that the construction relies on the fact that logspace bounded auxiliary pushdown automata can run for exponential time.

For example, such a machine could iterate through the numbers from $$0$$ to $$2^n - 1$$ in binary on the stack using only $$O(\log(n))$$ auxiliary space.

Update: A Tight Bound

Due to results from [1] (Theorems 3.19 and 4.22), it follows that there is a tight bound. That is, the length of a shortest string accepted by an $$n$$-state (non-empty) pushdown automaton is at most $$2^{\Theta\left(\frac{n^2}{\log(n)}\right)}$$. This assumes a restriction on the PDA's such that the stack alphabet is fixed and the stack pushes or pops only one symbol at a time.

After looking through the proofs of Theorems 3.19 and 4.22, as far as I can tell, this result should hold for both deterministic and non-deterministic PDA's.

Note: I find their proofs difficult to fully verify / reconstruct. Does anyone know of a simplified argument? If not, I would always be interested in looking through this further with others.

How To Apply Results From [1]

Rational Index: The rational index of a language $$L$$ is a function $$r$$ such that for every $$n$$, $$r(n)$$ is the maximum length of a shortest string in $$L \cap L(A)$$ over all $$n$$-state non-deterministic finite automata $$A$$. In other words, $$r(n) := max_{A}\{ \; min_{x}\{ \; \vert x \vert \; : \; x \in L \cap L(A) \; \} \; \}$$ where $$A$$ is an $$n$$-state NFA and $$x$$ is a finite string. A definition for rational index can also be found in [2].

Lower Bound: By Theorem 3.19 from [1], we get an $$2^{\Omega\left(\frac{n^2}{\log(n)}\right)}$$ lower bound. This is because there is some fixed context-free language $$L$$ whose rational index is $$2^{\Omega\left(\frac{n^2}{\log(n)}\right)}$$.

Let me explain. Let $$P$$ denote a PDA that recognizes $$L$$. By the preceding, there is an infinite family $$\{ A_n \}_{n \in \mathbb{N}}$$ of finite automata such that for all $$n$$, $$A_n$$ has $$n$$ states and asymptotically a shortest string accepted by the Cartesian product of $$A_n$$ with $$P$$ has length $$2^{\Omega\left(\frac{n^2}{\log(n)}\right)}$$.

It looks to me that, each finite automaton $$A_n$$ from their construction is deterministic. Also, the PDA $$P$$ is deterministic with a fixed stack alphabet that only pushes or pops one symbol at a time. Therefore, the lower bound applies to deterministic PDA's with a fixed stack alphabet that only push or pop one symbol at a time.

Upper Bound: By Theorem 4.22 from [1], we get an $$2^{O\left(\frac{n^2}{\log(n)}\right)}$$ upper bound. This is because any given context-free language has rational index $$2^{O\left(\frac{n^2}{\log(n)}\right)}$$.

Let me explain. Given any $$n$$-state PDA $$P$$ over a fixed alphabet that only pushes or pops one symbol at a time, we can convert it into an associated $$O(n)$$-state visibly pushdown automaton $$P^{\prime}$$ over a larger alphabet that must read a push-$$c$$ symbol in order to push $$c$$ onto the stack and a pop-$$c$$ symbol in order to pop $$c$$ off of the stack for each stack symbol $$c$$. The PDA's $$P$$ and $$P^{\prime}$$ have shortest accepted strings of similar length.

We can now view $$P^{\prime}$$ as the Cartesian product of a fixed PDA and an $$O(n)$$-state finite automaton. The fixed PDA's language has rational index $$2^{O\left(\frac{n^2}{\log(n)}\right)}$$ meaning that a shortest string accepted by $$P^{\prime}$$ has length at most $$2^{O\left(\frac{n^2}{\log(n)}\right)}$$. Therefore, a shortest string accepted by $$P$$ has length at most $$2^{O\left(\frac{n^2}{\log(n)}\right)}$$.

References

[1] Pierre, Laurent, Rational indexes of generators of the cone of context-free languages, Theor. Comput. Sci. 95, No. 2, 279-305 (1992). ZBL0745.68068.

[2] Deleage, Jean-Luc; Pierre, Laurent, The rational index of the Dyck language (D_ 1^{*}), Theor. Comput. Sci. 47, 335-343 (1986). ZBL0632.68072.