- machine-learning pr.probability lg.learning epsilon-nets
- Updated Tue, 21 Jun 2022 03:49:31 GMT

Let $M$ be a *finite* set with $n$ distinct elements. I want to probabilistically approximate the relative counts $\frac{|P(Q)|}{|M|}$ of subsets $Q \subseteq M$, where $P(Q) = Q \cap M$.

An upper bound on the number of samples needed to obtain an (additive) $\varepsilon$-approximation can be derived from the Hoeffding bound.
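For a single fixed $Q$, the two-sided Hoeffding bound gives $m \ge \frac{\ln(2/\delta)}{2\varepsilon^2}$; a minimal sketch (function name is mine):

```python
import math

def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Samples m such that P(|estimate - truth| > eps) <= delta,
    via the two-sided Hoeffding bound 2*exp(-2*m*eps**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

# e.g. a 0.05-additive approximation with 95% confidence:
m = hoeffding_sample_size(0.05, 0.05)
```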

I am interested in achieving (empirically) better bounds using *empirical Rademacher averages*. My idea is to adaptively sample from $M$ so as to fulfill
\begin{equation}
\sup_{f \in \mathcal{F}}\,| L_{\mathcal{D}}(f) - L_{S}(f) | \leq 2\cdot\mathcal{R}_{\mathcal{F}}(S) + 3 \sqrt{\frac{ \log (2/\delta)}{2m}} \leq \varepsilon,
\end{equation}
where
\begin{equation*}
\mathcal{F} = \{ \mathbf{1}_{Q} \mid Q \subseteq M \},
\end{equation*}
and $\mathbf{1}_{Q}(x)$ equals $1$ if the sample $x$ is in $Q$, and $0$ otherwise.
It follows that
\begin{equation*}
L_{\mathcal{D}}(\mathbf{1}_{Q}) = \frac{|P(Q)|}{|M|}.
\end{equation*}

Can one achieve better bounds with this approach, e.g., by using Massart's lemma to approximate $\mathcal{R}_{\mathcal{F}}(S)$?
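To make the quantity concrete: for this particular class the supremum inside the empirical Rademacher average can be evaluated exactly, since the best $Q$ collects precisely the elements whose summed Rademacher signs are positive, i.e. $\hat{\mathcal{R}}_S(\mathcal{F}) = \mathbb{E}_\sigma\big[\tfrac{1}{m}\sum_{v \in M} \max(0, \sum_{i: x_i = v} \sigma_i)\big]$. A Monte Carlo sketch (function and variable names are mine):

```python
import random
from collections import Counter

def empirical_rademacher(sample, trials=2000, rng=None):
    """Monte Carlo estimate of the empirical Rademacher average of
    {1_Q : Q subset of M} on the given sample. The sup over Q is exact:
    include an element in Q iff its summed Rademacher signs are positive."""
    rng = rng or random.Random(0)
    m = len(sample)
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in sample]
        signed = Counter()
        for x, s in zip(sample, sigma):
            signed[x] += s
        total += sum(max(0, c) for c in signed.values()) / m
    return total / trials

# With all-distinct samples each max(0, sigma_i) is Bernoulli(1/2),
# so the average is 1/2 in expectation:
r = empirical_rademacher(list(range(100)))
```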

The answer depends on whether you want a bound for a single, *fixed* $Q\subseteq M$ or one that holds *uniformly* over all such $Q$. For a single fixed $Q$, the Chernoff-Hoeffding bound is essentially the best you can do (see also the empirical Bernstein inequality for "fast rates").

If you want the bound to hold simultaneously for all $Q\subseteq M$, you need to apply the union bound, which amounts to multiplying the deviation probability by $2^{|M|}$ (equivalently, adding $|M|\log 2$ to the $\log(2/\delta)$ term in the sample complexity).
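Concretely, replacing $\delta$ by $\delta/2^{|M|}$ in the Hoeffding sample size gives $m \ge \frac{|M|\ln 2 + \ln(2/\delta)}{2\varepsilon^2}$; a small sketch (function name is mine):

```python
import math

def uniform_sample_size(n: int, eps: float, delta: float) -> int:
    """Hoeffding + union bound over all 2**n subsets:
    m >= (n*ln 2 + ln(2/delta)) / (2*eps**2)."""
    return math.ceil((n * math.log(2.0) + math.log(2.0 / delta))
                     / (2.0 * eps ** 2))

# n = 100 elements, 0.05-approximation, 95% confidence:
m = uniform_sample_size(100, 0.05, 0.05)
```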

You have nothing to gain here from Rademacher complexities. These are useful for capturing the combinatorial structure of VC classes, where they are sharper than naive union bounds (though Massart's lemma of course has a union bound inside). In your case the VC structure is trivial -- the class consists of *all* subsets -- so you might as well use the plain union bound directly.
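To see this concretely: for indicators $\|f\|_2 \le \sqrt{m}$ and $|\mathcal{F}| = 2^{|M|}$, so Massart's lemma gives $\hat{\mathcal{R}}_S(\mathcal{F}) \le \sqrt{2|M|\ln 2/m}$, and plugging that into the displayed bound yields a sample complexity whose leading term is $8|M|\ln 2/\varepsilon^2$, versus $|M|\ln 2/(2\varepsilon^2)$ for the plain union bound -- a $16\times$ gap in the limit. A sketch of the comparison (function names are mine):

```python
import math

def massart_sample_size(n: int, eps: float, delta: float) -> int:
    """Smallest m with 2*sqrt(2*n*ln2/m) + 3*sqrt(ln(2/delta)/(2m)) <= eps.
    Both terms scale as 1/sqrt(m), so m solves in closed form."""
    c = (2.0 * math.sqrt(2.0 * n * math.log(2.0))
         + 3.0 * math.sqrt(math.log(2.0 / delta) / 2.0))
    return math.ceil((c / eps) ** 2)

def union_bound_sample_size(n: int, eps: float, delta: float) -> int:
    """Hoeffding + union bound over all 2**n subsets."""
    return math.ceil((n * math.log(2.0) + math.log(2.0 / delta))
                     / (2.0 * eps ** 2))

# The Massart route needs an order of magnitude more samples here
# (the gap is ~16x asymptotically, larger at finite n due to the
# lower-order confidence term):
m_massart = massart_sample_size(100, 0.05, 0.05)
m_union = union_bound_sample_size(100, 0.05, 0.05)
```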