Say we have black-box access to a distribution $\mathcal{D}$ with finite support $\{1,2,\dots,n\}$ and probability mass function $i \mapsto p_i$. How many samples from $\mathcal{D}$ are needed to learn the mode of $\mathcal{D}$, i.e., the $j$ such that $p_j=\max_i p_i$ (with error probability at most $\delta$)? Is there any known lower bound? Any idea, comment, or reference would be very helpful.
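For concreteness, the natural plug-in strategy is to draw $m$ samples and output the most frequent element; here is a minimal Python sketch of that estimator (the example distribution, sample size, and function name are illustrative assumptions, not part of the question):

```python
import numpy as np

def empirical_mode(p, m, seed=None):
    """Draw m samples from the pmf p over {0,...,n-1} and return the
    most frequent element (the plug-in estimate of the mode)."""
    rng = np.random.default_rng(seed)
    samples = rng.choice(len(p), size=m, p=p)
    counts = np.bincount(samples, minlength=len(p))
    return int(np.argmax(counts))

# Example: the true mode is element 0; with enough samples the
# plug-in estimate is correct with high probability.
p = [0.4, 0.3, 0.2, 0.1]
print(empirical_mode(p, m=2000, seed=0))
```

The question is exactly how large $m$ must be, as a function of $\mathcal{D}$ and $\delta$, for this (or any) estimator to succeed.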
Update: I proved a lower bound of $\Omega\!\left(\frac{1}{\gamma}\log\frac{1}{\delta}\right)$ by reducing the problem to an $n=2$ case, where $\gamma=\min_i\left(1-2\sqrt{\frac{p\,p_i}{(p+p_i)^2}}\right)$ and $p$ is the probability of the mode. But I would still like to reduce the gap between the upper and lower bounds.
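For reference, a small Python helper for the quantity $\gamma$ above; it assumes the minimum is taken over non-mode elements (my reading of the definition, since the term for the mode itself would be $0$):

```python
import math

def gamma(p_list):
    """Compute gamma = min_i (1 - 2*sqrt(p*p_i)/(p+p_i)), where p is the
    mode probability. Assumption: elements with p_i == p are excluded,
    since their term vanishes."""
    p = max(p_list)
    others = [q for q in p_list if q < p]
    return min(1 - 2 * math.sqrt(p * q) / (p + q) for q in others)

print(gamma([0.4, 0.3, 0.2, 0.1]))
```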
A follow-up question: say we have black-box access to a distribution $\mathcal{D}$ with finite support $A\cup B:=\{a_1,\dots,a_n\}\cup\{b_1,\dots,b_m\}$ and probability mass function $a_i \mapsto p_i$, $b_i \mapsto q_i$. How many samples from $\mathcal{D}$ are required to ensure that the most frequent sample lies in $A$ (with error probability at most $\delta$)? For sanity, it is guaranteed that $A$ contains the mode of $\mathcal{D}$.
From the DKW inequality, it follows that one can learn an arbitrary distribution over $\mathbb{R}$ (and in particular over $\{1,2,\dots,n\}$) to Kolmogorov distance* $\varepsilon$ with probability at least $1-\delta$ using $$ O\!\left(\frac{1}{\varepsilon^2}\log\frac{1}{\delta}\right)$$ samples, which is optimal in view of the complexity of learning even a single Bernoulli (biased coin): embed the biased coin (with bias either $1/2+\varepsilon$ or $1/2-\varepsilon$) in the middle of the domain, with all other elements having mass $0$; figuring out which element is the mode then corresponds to figuring out the bias.
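To make the sample-size arithmetic explicit (this is my spelling-out, using the DKW inequality with Massart's tight constant): writing $\hat F_m$ for the empirical CDF after $m$ samples,
$$\Pr\left[\sup_x \left|\hat F_m(x) - F(x)\right| > \varepsilon\right] \le 2e^{-2m\varepsilon^2},$$
so $m \ge \frac{1}{2\varepsilon^2}\ln\frac{2}{\delta}$ samples suffice to learn $F$ to Kolmogorov distance $\varepsilon$ with probability at least $1-\delta$; outputting the empirical mode then succeeds once $\varepsilon$ is a small enough constant fraction of the gap between the largest and second-largest probabilities.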
This implies the same upper (and lower) bounds for your question, where $\varepsilon$ is the gap between the largest and second-largest probabilities (assuming you have a priori knowledge of that gap; this is not quite apparent from your post. If not, something adaptive and slightly more involved might be required).
$*$ Kolmogorov distance is the $\ell_\infty$ distance between CDFs. This guarantee is stronger than (and implies) learning the probability mass function in $\ell_\infty$. For more, see, e.g., this short note of mine, specifically Section 4 (no new results, just an exposition of "folklore" facts).
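To illustrate the footnote, a small sketch comparing the two distances for distributions on $\{1,\dots,n\}$ (the function names and example distributions are mine):

```python
import numpy as np

def kolmogorov_distance(p, q):
    """ell_infinity distance between the CDFs of two pmfs on {1,...,n}."""
    return float(np.max(np.abs(np.cumsum(p) - np.cumsum(q))))

def linf_pmf_distance(p, q):
    """ell_infinity distance between the pmfs themselves."""
    return float(np.max(np.abs(np.asarray(p) - np.asarray(q))))

# Each pmf value is a difference of two consecutive CDF values, so the
# ell_infinity pmf distance is at most twice the Kolmogorov distance.
p = [0.4, 0.3, 0.2, 0.1]
q = [0.3, 0.4, 0.2, 0.1]
print(kolmogorov_distance(p, q), linf_pmf_distance(p, q))
```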