- pr.probability vc-dimension high-dimensional-geometry epsilon-nets
- Updated Sat, 25 Jun 2022 20:36:39 GMT

Taken from paper "A Universal Law of Robustness via isoperimetry" by Bubeck and Sellke.

**Theorem 3.** Let $\mathcal{F}$ be a class of functions from $\mathbb{R}^{d} \rightarrow \mathbb{R}$ and let $\left(x_{i}, y_{i}\right)_{i=1}^{n}$ be i.i.d. input-output pairs in $\mathbb{R}^{d} \times[-1,1]$. Fix $\epsilon, \delta \in(0,1)$.

Assume that:

The function class can be written as $\mathcal{F}=\left\{f_{\boldsymbol{w}}, \boldsymbol{w} \in \mathcal{W}\right\}$ with $\mathcal{W} \subset \mathbb{R}^{p}$, $\operatorname{diam}(\mathcal{W}) \leq W$ and for any $\boldsymbol{w}_{1}, \boldsymbol{w}_{2} \in \mathcal{W}$, $$ \left\|f_{\boldsymbol{w}_{1}}-f_{\boldsymbol{w}_{2}}\right\|_{\infty} \leq J\left\|\boldsymbol{w}_{1}-\boldsymbol{w}_{2}\right\| $$

The distribution $\mu$ of the covariates $x_{i}$ can be written as $\mu=\sum_{\ell=1}^{k} \alpha_{\ell} \mu_{\ell}$, where each $\mu_{\ell}$ satisfies c-isoperimetry.

The expected conditional variance of the output is strictly positive, denoted $\sigma^{2}:=\mathbb{E}^{\mu}[\operatorname{Var}[y \mid x]]>0$.\newline

Then, with probability at least $1-\delta$ with respect to the sampling of the data, one has simultaneously for all $f \in \mathcal{F}$ : $$ \frac{1}{n} \sum_{i=1}^{n}\left(f\left(x_{i}\right)-y_{i}\right)^{2} \leq \sigma^{2}-\epsilon \Rightarrow \operatorname{Lip}(f) \geq \frac{\epsilon}{2^{9} \sqrt{c}} \sqrt{\frac{n d}{p \log \left(60 W J \epsilon^{-1}\right)+\log (4 / \delta)}} . $$

**Proof.** Define $\mathcal{W}_{L} \subseteq \mathcal{W}$ by $\mathcal{W}_{L}=\left\{\boldsymbol{w} \in \mathcal{W}: \operatorname{Lip}\left(f_{\boldsymbol{w}}\right) \leq L\right\}$. Denote $\mathcal{W}_{L, \epsilon}$ for an $\frac{\epsilon}{6 J}$-net of $\mathcal{W}_{L}$. We have in particular $\left|\mathcal{W}_{\epsilon}\right| \leq\left(60 W J \epsilon^{-1}\right)^{p}$. We apply Theorem 2 to $\mathcal{F}_{L, \epsilon}=\left\{f_{\boldsymbol{w}}, \boldsymbol{w} \in \mathcal{W}_{L, \epsilon}\right\}$ :

This continues .........

I feel curious to know the size of a $\epsilon$-net Vector Space.

How $\left|\mathcal{W}_{\epsilon}\right| \leq\left(60 W J \epsilon^{-1}\right)^{p}$???

Edit:- Corollary 4.2.13 (Covering numbers of the Euclidean ball). The covering numbers of the unit Euclidean ball $B_{2}^{n}$ satisfy the following for any $\varepsilon>0$ : $$ \left(\frac{1}{\varepsilon}\right)^{n} \leq \mathcal{N}\left(B_{2}^{n}, \varepsilon\right) \leq\left(\frac{2}{\varepsilon}+1\right)^{n} . $$ The same upper bound is true for the unit Euclidean sphere $S^{n-1}$. Proof The lower bound follows immediately from Proposition 4.2.12, since the volume in $\mathbb{R}^{n}$ scales as $$ \left|\varepsilon B_{2}^{n}\right|=\varepsilon^{n}\left|B_{2}^{n}\right| . $$ The upper bound follows from Proposition 4.2.12, too: $$ \mathcal{N}\left(B_{2}^{n}, \varepsilon\right) \leq \frac{\left|(1+\varepsilon / 2) B_{2}^{n}\right|}{\left|(\varepsilon / 2) B_{2}^{n}\right|}=\frac{(1+\varepsilon / 2)^{n}}{(\varepsilon / 2)^{n}}=\left(\frac{2}{\varepsilon}+1\right)^{n} . $$ The upper bound for the sphere can be proved in the same way. To simplify the bound a bit, note that in the non-trivial range $\varepsilon \in(0,1]$ we have $$ \left(\frac{1}{\varepsilon}\right)^{n} \leq \mathcal{N}\left(B_{2}^{n}, \varepsilon\right) \leq\left(\frac{3}{\varepsilon}\right)^{n}.............(1) $$

By the inequality 1 I am able to reach that

$$\mathcal{N}\left(B_{2}^{n}, \varepsilon\right) \leq\left(\frac{18J}{\varepsilon}\right)^{p}.$$ This is not dependent on $W$ right. How can we scale it down then?

Note that $\mathcal{W}_{\epsilon}$ is an epsilon-net in the parametrization space, which is just $p$-dimensional Euclidean space. (So there is no need to think about covering numbers in e.g. spaces of functions; there seems to be some confusion in the other discussion.) The result being used is here: https://mathoverflow.net/questions/356310/covering-number-of-l-2-ball-in-mathbbrd. The constant 60 is definitely a bit loose :).

To prove this covering number bound, the idea is to greedily choose a finite subset of points, so that the balls of radius $\eta/2$ around them are disjoint (where in this case, $\eta=\frac{\epsilon}{6J}$). By considering volumes, you can upper-bound the number of balls by $(60WJ\epsilon^{-1})^p$. Moreover, you can show that once there is no room to add another point, increasing the radius of these balls from $\eta/2$ to $\eta$ covers the whole region. So taking their centers as the set $\mathcal{W}_{\epsilon}$ does the job.

- +0 – $(1/\epsilon)^d \le N(\epsilon, B_2) \le (3/\epsilon)^d$. this eventually leads $(\frac{18J}{\epsilon})^p$ . .. how it is matching with the result of paper? :( — May 07, 2022 at 22:22
- +0 – It's an upper bound, so it is okay that 60 is larger than necessary. You do need the factor of $W$ inside the $p$-th power though. This is because it's an $\epsilon$ cover of a radius $W$ ball, while the bound you quoted from the link is for a radius $1$ ball. — May 07, 2022 at 22:28
- +0 – Ok Mark got some insight . Some doubts are still in my head. So epsilon / 6j is the radius of 1 ball. Diam(w) < W is given, Does this imply that there are atmost W balls? That's why the W factor needed !! — May 07, 2022 at 22:48
- +0 – Diameter W means the set is contained inside a ball of radius W. So you can scale everything down by a factor of W and apply the result that you quoted. This scaling down also turns $\epsilon$ into $\epsilon/W$, so you end up with an extra $W$ factor inside the $p$-th power. — May 07, 2022 at 22:56
- +0 – Okay so this leads to $\frac{\epsilon}{6JW}$ and eventuall by that inequality $N(\epsilon,B_{2}) \leq (\frac{18WJ}{\epsilon})^p.$ that is great. But unable to find why $\frac{\epsilon}{W}$ is happening here. The radius of the Ball is $W$ and there are of small balls with $\frac{\epsilon}{6J}$ which can cover the Ball. why the scaling is eventually needed to determine the upper-bound — May 08, 2022 at 05:55

External links referenced by this document: