Theoretical Computer Science
ds.algorithms pr.probability polynomial-time oracles
Updated Sat, 18 Jun 2022 15:47:06 GMT

Young Diagrams and distinguishing between two distributions


The reference for everything is this paper.

The RobinsonSchenstedKnuth (RSK) algorithm is a well-known combinatorial algorithm with diverse applications throughout mathematics, computer science, and physics. Given a word $w$ with $n$ letters from the alphabet $[d]$, it outputs two semistandard Young tableaus $(\text{P}, \text{Q}) = \text{RSK}(w)$ with common shape given by some Young diagram $\lambda \in \mathbb{N}^{d}$ $(\lambda_1 \geq \cdots \geq _d)$. We write $ = \text{shRSK}(w).$

Given a probability distribution $\alpha = (\alpha_1,...,\alpha_d)$ on alphabet $[d]$, an $n$-letter $\alpha$-random word $w = (w_1, \cdots, w_n)$, written as $w \sim \alpha^{\otimes n}$, is a random word in which each letter $w_i$ is independently drawn from $[d]$ according to $\alpha$. The SchurWeyl distribution $\text{SW}^{n}(\alpha)$ is the distribution on Young diagrams given by $\lambda = \text{shRSK}(w)$.

Let the length of the $i^{\text{th}}$ row of a Young diagram be $\lambda_i$, for $i \in [d]$. From the paper referenced (Theorem $1.4$):

For $k [d]$, set $_k = \text{min}\{1, _k d\}$. Then, $$_kn2\sqrt{_k n} \leq \underset{{\lambda \sim \text{SW}^{n}(\alpha)}}{\mathbb{E}}[\lambda_k]\leq _kn +2\sqrt{_k n} .$$

From Proposition $4.8$,

Let $\alpha \in [d]$ be a probability distribution. Then for any $k \in [d]$, $$\underset{{\lambda \sim \text{SW}^{n}(\alpha)}}{\text{Var}} \left[_k\right] \leq16n.$$

Note that $n$ is the total length of all the rows of the Young diagram. That is $$ \sum_{i = 1}^{d} \lambda_i = n.$$

If $n$ is much smaller than $d$, many of the rows may have a length of $0$.

The problem:

Let two probability distributions, on alphabet $[2^{m}]$, be $\alpha$ and $\beta$. Both of them are efficiently samplable.

It is known that

$$\alpha_i = \frac{1}{2^{m}},$$

for every $i \in [2^{m}]$, and

$$\beta_i = \frac{1}{2^{m/2}},$$ for every $i \in [2^{m/2}]$ and $0$ otherwise.

Input: For an $n = \text{poly}(m)$, we are either given a Young diagram $\lambda$ sampled from $\text{SW}^{n}(\alpha)$ or a Young diagram sampled from $\text{SW}^{n}(\beta)$.

Task: We want to distinguish between the two cases (in other words, tell which distribution the Young diagram came from.)

Is there a polynomial-time uniform classical distinguishing algorithm that distinguishes between these two cases?

My intuition is that there should exist no such distinguisher. Since $n = \text{poly}(m)$, the useful "signal" we get after looking at the length of each row of the Young diagram is inverse exponentially small --- whereas the "noise" is inverse polynomially large. Thus, the two cases would look exactly similar to any distinguisher.

Is there a statistical or a computational argument that makes this formal?


Even relaxing the "computationally efficient" requirement, it is information-theoretically impossible. We will use the following "folklore" fact, which can be viewed as a consequence of the birthday paradox:

Fact. Distinguishing with probability say $2/3$ between (1) the uniform distribution on $[d]$ and (2) a distribution on a subset of half the domain $[d]$, chosen uniformly at random, requires $n=\Omega(\sqrt{d})$ samples.

Let's call this problem $(\star)$. In our case, $d=2^m$, but note that the problem we want to solve is not an instance of $(\star)$: $\alpha$ is indeed the uniform distribution on $d$, but $\beta$ is a uniform distribution on a very specific, known subset of $d/2$ elements, and so the fact stating hardness of $(\star)$ does not apply directly (it's actually very easy to distinguish between the two with $O(1)$ samples!). But we also don't directly have access to the sequence of $n$ samples $w$, we only have access to them through $\lambda := \mathrm{shRSK}(w)$. So, well, it does apply. Let's see why.

From Definition 1.3 of [O'DonnellWright'16]:

The SchurWeyl distribution $\mathrm{SW}^n(\alpha)$ is the distribution on Young diagrams given by $\lambda := \mathrm{shRSK}(w)$ [when $w\sim \alpha^{\otimes n}$]. Although it is not obvious, it is a fact that the distribution $\mathrm{SW}^n(\alpha)$ does not depend on the ordering of $\alpha$'s components.

What this means (if I am reading this correctly) is that $\mathrm{SW}^n(\alpha)$ only depends on the multiset of frequencies of $\alpha$, not on the labeling of the domain. In particular, all "uniform distributions on a subset of half the domain" lead to the same distribution $\mathrm{SW}^n$ over $\lambda$.

Which is exactly what we need to conclude. Suppose by contradiction we had an algorithm (even inefficient!) which distinguishes $\mathrm{SW}^n(\alpha)$ from $\mathrm{SW}^n(\beta)$ with probability at least $2/3$, given $\lambda$ from either of the two distributions (where $n = o(2^{m/2})$, i.e., $n=o(\sqrt{d})$). Then we could use it to solve $(\star)$ as follows: given $n$ samples (i.e., a word $w$) from an instance of $(\star)$, we compute $\lambda = \mathrm{shRSK}(w)$, and feed it into our purported algorithm. But the $\lambda$ is then either from $\mathrm{SW}^n(\alpha)$ or $\mathrm{SW}^n(\beta)$, so the algorithm can distinguish. This allows us to solve $(\star)$ with way too few samples, leading to a contradiction.

Comments (1)

  • +0 – If I misunderstood something (the whole argument relies on my understanding of Definition 1.3 of [OW16]), please let me know. — Nov 09, 2021 at 20:36  

External Links

External links referenced by this document: