Theoretical Computer Science
ds.algorithms lg.learning partial-order search-problem order-theory
Updated Tue, 07 Jun 2022 05:43:20 GMT

Worst number of questions needed to learn a monotonic predicate over a poset


Consider $(X, \leq)$ a finite poset over $n$ items, and $P$ an unknown monotonic predicate over $X$ (i.e., for any $x$, $y \in X$, if $P(x)$ and $x \leq y$ then $P(y)$). I can evaluate $P$ by providing one node $x \in X$ and finding out if $P(x)$ holds or not. My goal is to determine exactly the set of nodes $x \in X$ such that $P(x)$ holds, using as few evaluations of $P$ as possible. (I can choose my queries depending on the answer of all previous queries, I am not required to plan all queries in advance.)

A strategy $S$ over $(X, \leq)$ is a function which tells me, as a function of the queries that I ran so far and their answers, which node to query, and which ensures that on any predicate $P$, by following the strategy, I will reach a state in which I know the value of $P$ on all nodes. The running time $r(S, P)$ of $S$ on a predicate $P$ is the number of queries required to know the value of $P$ on all nodes. The worst running time of $S$ is $wr(S) = \max_P r(S, P)$. An optimal strategy $S'$ is such that $wr(S') = \min_S wr(S)$.

My question is the following: given as input the poset $(X, \leq)$, how can I determine the worst running time of the optimal strategies?

[It is clear that for an empty poset $n$ queries will be needed (we need to ask about each single node), and that for a total order around $\lceil \log_2 n \rceil$ queries will be needed (doing a binary search to find the frontier). A more general result is the following information-theoretic lower bound: the number of possible choices for the predicate $P$ is the number $N_X$ of antichains of $(X, \leq)$ (because there is a one-to-one mapping between monotonic predicates and antichains interpreted as the maximal elements of $P$), so, since each query gives us one bit of information, we will need at least $\lceil \log_2 N_X \rceil$ queries, subsuming the two previous cases. Is this bound tight, or are them some posets whose structure is such that learning can require asymptotically more queries than the number of antichains?]




Solution

In their paper Every Poset Has a Central Element, Linial and Saks show (Theorem 1) that the number of queries required to solve the ideal identification problem in a poset $X$ is at most $K_0 \log_2 i(X)$, where $K_0 = 1/(2 - \log_2(1 + \log_2 5))$ and $i(X)$ is the number of ideals of $X$. What they call an "ideal" is actually a lower set and there is an obvious one to one correspondance between monotonic predicates and the lower set of the points at which they don't hold, besides their "identification problem" is to identify by querying nodes just like in my setting, so I think they are dealing with the problem I'm interested in and that $i(X) = N_X$.

So, according to their result, the information-theoretic lower bound is tight up to a relatively small multiplicative constant. So this basically settles the question of the number of questions required, as a function of $N_X$ and up to a multiplicative constant: it is between $\log_2 N_X$ and $K_0 \log_2 N_X$.

Linial and Saks quote a personal communication by Shearer to say that there are known orders for which we can prove a lower bound of $K_1 \log_2 N_X$ for some $K_1$ which is just slightly less than $K_0$ (this is in the spirit of Yoshio Okamoto's answer who tried this approach for a smaller value of $K_1$).

This does not fully answer my question of computing the number of questions required from $X$, however, since computing $N_X$ from $X$ is #P-complete, I have a feeling that there is little hope. (Comments about this point are welcome.) Still, this result by Linial and Saks is enlightening.