 Theoretical Computer Science

# Young Diagrams and distinguishing between two distributions

Introduction:

The reference for everything is this paper.

The RobinsonSchenstedKnuth (RSK) algorithm is a well-known combinatorial algorithm with diverse applications throughout mathematics, computer science, and physics. Given a word $$w$$ with $$n$$ letters from the alphabet $$[d]$$, it outputs two semistandard Young tableaus $$(\text{P}, \text{Q}) = \text{RSK}(w)$$ with common shape given by some Young diagram $$\lambda \in \mathbb{N}^{d}$$ $$(\lambda_1 \geq \cdots \geq _d)$$. We write $$= \text{shRSK}(w).$$

Given a probability distribution $$\alpha = (\alpha_1,...,\alpha_d)$$ on alphabet $$[d]$$, an $$n$$-letter $$\alpha$$-random word $$w = (w_1, \cdots, w_n)$$, written as $$w \sim \alpha^{\otimes n}$$, is a random word in which each letter $$w_i$$ is independently drawn from $$[d]$$ according to $$\alpha$$. The SchurWeyl distribution $$\text{SW}^{n}(\alpha)$$ is the distribution on Young diagrams given by $$\lambda = \text{shRSK}(w)$$.

Let the length of the $$i^{\text{th}}$$ row of a Young diagram be $$\lambda_i$$, for $$i \in [d]$$. From the paper referenced (Theorem $$1.4$$):

For $$k [d]$$, set $$_k = \text{min}\{1, _k d\}$$. Then, $$_kn2\sqrt{_k n} \leq \underset{{\lambda \sim \text{SW}^{n}(\alpha)}}{\mathbb{E}}[\lambda_k]\leq _kn +2\sqrt{_k n} .$$

From Proposition $$4.8$$,

Let $$\alpha \in [d]$$ be a probability distribution. Then for any $$k \in [d]$$, $$\underset{{\lambda \sim \text{SW}^{n}(\alpha)}}{\text{Var}} \left[_k\right] \leq16n.$$

Note that $$n$$ is the total length of all the rows of the Young diagram. That is $$\sum_{i = 1}^{d} \lambda_i = n.$$

If $$n$$ is much smaller than $$d$$, many of the rows may have a length of $$0$$.

The problem:

Let two probability distributions, on alphabet $$[2^{m}]$$, be $$\alpha$$ and $$\beta$$. Both of them are efficiently samplable.

It is known that

$$\alpha_i = \frac{1}{2^{m}},$$

for every $$i \in [2^{m}]$$, and

$$\beta_i = \frac{1}{2^{m/2}},$$ for every $$i \in [2^{m/2}]$$ and $$0$$ otherwise.

Input: For an $$n = \text{poly}(m)$$, we are either given a Young diagram $$\lambda$$ sampled from $$\text{SW}^{n}(\alpha)$$ or a Young diagram sampled from $$\text{SW}^{n}(\beta)$$.

Task: We want to distinguish between the two cases (in other words, tell which distribution the Young diagram came from.)

Is there a polynomial-time uniform classical distinguishing algorithm that distinguishes between these two cases?

My intuition is that there should exist no such distinguisher. Since $$n = \text{poly}(m)$$, the useful "signal" we get after looking at the length of each row of the Young diagram is inverse exponentially small --- whereas the "noise" is inverse polynomially large. Thus, the two cases would look exactly similar to any distinguisher.

Is there a statistical or a computational argument that makes this formal? ## Solution

Even relaxing the "computationally efficient" requirement, it is information-theoretically impossible. We will use the following "folklore" fact, which can be viewed as a consequence of the birthday paradox:

Fact. Distinguishing with probability say $$2/3$$ between (1) the uniform distribution on $$[d]$$ and (2) a distribution on a subset of half the domain $$[d]$$, chosen uniformly at random, requires $$n=\Omega(\sqrt{d})$$ samples.

Let's call this problem $$(\star)$$. In our case, $$d=2^m$$, but note that the problem we want to solve is not an instance of $$(\star)$$: $$\alpha$$ is indeed the uniform distribution on $$d$$, but $$\beta$$ is a uniform distribution on a very specific, known subset of $$d/2$$ elements, and so the fact stating hardness of $$(\star)$$ does not apply directly (it's actually very easy to distinguish between the two with $$O(1)$$ samples!). But we also don't directly have access to the sequence of $$n$$ samples $$w$$, we only have access to them through $$\lambda := \mathrm{shRSK}(w)$$. So, well, it does apply. Let's see why.

From Definition 1.3 of [O'DonnellWright'16]:

The SchurWeyl distribution $$\mathrm{SW}^n(\alpha)$$ is the distribution on Young diagrams given by $$\lambda := \mathrm{shRSK}(w)$$ [when $$w\sim \alpha^{\otimes n}$$]. Although it is not obvious, it is a fact that the distribution $$\mathrm{SW}^n(\alpha)$$ does not depend on the ordering of $$\alpha$$'s components.

What this means (if I am reading this correctly) is that $$\mathrm{SW}^n(\alpha)$$ only depends on the multiset of frequencies of $$\alpha$$, not on the labeling of the domain. In particular, all "uniform distributions on a subset of half the domain" lead to the same distribution $$\mathrm{SW}^n$$ over $$\lambda$$.

Which is exactly what we need to conclude. Suppose by contradiction we had an algorithm (even inefficient!) which distinguishes $$\mathrm{SW}^n(\alpha)$$ from $$\mathrm{SW}^n(\beta)$$ with probability at least $$2/3$$, given $$\lambda$$ from either of the two distributions (where $$n = o(2^{m/2})$$, i.e., $$n=o(\sqrt{d})$$). Then we could use it to solve $$(\star)$$ as follows: given $$n$$ samples (i.e., a word $$w$$) from an instance of $$(\star)$$, we compute $$\lambda = \mathrm{shRSK}(w)$$, and feed it into our purported algorithm. But the $$\lambda$$ is then either from $$\mathrm{SW}^n(\alpha)$$ or $$\mathrm{SW}^n(\beta)$$, so the algorithm can distinguish. This allows us to solve $$(\star)$$ with way too few samples, leading to a contradiction.