Theoretical Computer Science
pr.probability vc-dimension high-dimensional-geometry epsilon-nets
Updated Sat, 25 Jun 2022 20:36:39 GMT

# How to find the size of an ϵ-net of a vector space?

Taken from paper "A Universal Law of Robustness via isoperimetry" by Bubeck and Sellke.

Theorem 3. Let $$\mathcal{F}$$ be a class of functions from $$\mathbb{R}^{d} \rightarrow \mathbb{R}$$ and let $$\left(x_{i}, y_{i}\right)_{i=1}^{n}$$ be i.i.d. input-output pairs in $$\mathbb{R}^{d} \times[-1,1]$$. Fix $$\epsilon, \delta \in(0,1)$$.

Assume that:

1. The function class can be written as $$\mathcal{F}=\left\{f_{\boldsymbol{w}}, \boldsymbol{w} \in \mathcal{W}\right\}$$ with $$\mathcal{W} \subset \mathbb{R}^{p}$$, $$\operatorname{diam}(\mathcal{W}) \leq W$$ and for any $$\boldsymbol{w}_{1}, \boldsymbol{w}_{2} \in \mathcal{W}$$, $$\left\|f_{\boldsymbol{w}_{1}}-f_{\boldsymbol{w}_{2}}\right\|_{\infty} \leq J\left\|\boldsymbol{w}_{1}-\boldsymbol{w}_{2}\right\|$$

2. The distribution $$\mu$$ of the covariates $$x_{i}$$ can be written as $$\mu=\sum_{\ell=1}^{k} \alpha_{\ell} \mu_{\ell}$$, where each $$\mu_{\ell}$$ satisfies c-isoperimetry.

3. The expected conditional variance of the output is strictly positive, denoted $$\sigma^{2}:=\mathbb{E}^{\mu}[\operatorname{Var}[y \mid x]]>0$$.\newline

Then, with probability at least $$1-\delta$$ with respect to the sampling of the data, one has simultaneously for all $$f \in \mathcal{F}$$ : $$\frac{1}{n} \sum_{i=1}^{n}\left(f\left(x_{i}\right)-y_{i}\right)^{2} \leq \sigma^{2}-\epsilon \Rightarrow \operatorname{Lip}(f) \geq \frac{\epsilon}{2^{9} \sqrt{c}} \sqrt{\frac{n d}{p \log \left(60 W J \epsilon^{-1}\right)+\log (4 / \delta)}} .$$

Proof. Define $$\mathcal{W}_{L} \subseteq \mathcal{W}$$ by $$\mathcal{W}_{L}=\left\{\boldsymbol{w} \in \mathcal{W}: \operatorname{Lip}\left(f_{\boldsymbol{w}}\right) \leq L\right\}$$. Denote $$\mathcal{W}_{L, \epsilon}$$ for an $$\frac{\epsilon}{6 J}$$-net of $$\mathcal{W}_{L}$$. We have in particular $$\left|\mathcal{W}_{\epsilon}\right| \leq\left(60 W J \epsilon^{-1}\right)^{p}$$. We apply Theorem 2 to $$\mathcal{F}_{L, \epsilon}=\left\{f_{\boldsymbol{w}}, \boldsymbol{w} \in \mathcal{W}_{L, \epsilon}\right\}$$ :

This continues .........

I feel curious to know the size of a $$\epsilon$$-net Vector Space.

How $$\left|\mathcal{W}_{\epsilon}\right| \leq\left(60 W J \epsilon^{-1}\right)^{p}$$???

Edit:- Corollary 4.2.13 (Covering numbers of the Euclidean ball). The covering numbers of the unit Euclidean ball $$B_{2}^{n}$$ satisfy the following for any $$\varepsilon>0$$ : $$\left(\frac{1}{\varepsilon}\right)^{n} \leq \mathcal{N}\left(B_{2}^{n}, \varepsilon\right) \leq\left(\frac{2}{\varepsilon}+1\right)^{n} .$$ The same upper bound is true for the unit Euclidean sphere $$S^{n-1}$$. Proof The lower bound follows immediately from Proposition 4.2.12, since the volume in $$\mathbb{R}^{n}$$ scales as $$\left|\varepsilon B_{2}^{n}\right|=\varepsilon^{n}\left|B_{2}^{n}\right| .$$ The upper bound follows from Proposition 4.2.12, too: $$\mathcal{N}\left(B_{2}^{n}, \varepsilon\right) \leq \frac{\left|(1+\varepsilon / 2) B_{2}^{n}\right|}{\left|(\varepsilon / 2) B_{2}^{n}\right|}=\frac{(1+\varepsilon / 2)^{n}}{(\varepsilon / 2)^{n}}=\left(\frac{2}{\varepsilon}+1\right)^{n} .$$ The upper bound for the sphere can be proved in the same way. To simplify the bound a bit, note that in the non-trivial range $$\varepsilon \in(0,1]$$ we have $$\left(\frac{1}{\varepsilon}\right)^{n} \leq \mathcal{N}\left(B_{2}^{n}, \varepsilon\right) \leq\left(\frac{3}{\varepsilon}\right)^{n}.............(1)$$

By the inequality 1 I am able to reach that

$$\mathcal{N}\left(B_{2}^{n}, \varepsilon\right) \leq\left(\frac{18J}{\varepsilon}\right)^{p}.$$ This is not dependent on $$W$$ right. How can we scale it down then?

## Solution

Note that $$\mathcal{W}_{\epsilon}$$ is an epsilon-net in the parametrization space, which is just $$p$$-dimensional Euclidean space. (So there is no need to think about covering numbers in e.g. spaces of functions; there seems to be some confusion in the other discussion.) The result being used is here: https://mathoverflow.net/questions/356310/covering-number-of-l-2-ball-in-mathbbrd. The constant 60 is definitely a bit loose :).

To prove this covering number bound, the idea is to greedily choose a finite subset of points, so that the balls of radius $$\eta/2$$ around them are disjoint (where in this case, $$\eta=\frac{\epsilon}{6J}$$). By considering volumes, you can upper-bound the number of balls by $$(60WJ\epsilon^{-1})^p$$. Moreover, you can show that once there is no room to add another point, increasing the radius of these balls from $$\eta/2$$ to $$\eta$$ covers the whole region. So taking their centers as the set $$\mathcal{W}_{\epsilon}$$ does the job.

• +0 – $(1/\epsilon)^d \le N(\epsilon, B_2) \le (3/\epsilon)^d$. this eventually leads $(\frac{18J}{\epsilon})^p$ . .. how it is matching with the result of paper? :( — May 07, 2022 at 22:22
• +0 – It's an upper bound, so it is okay that 60 is larger than necessary. You do need the factor of $W$ inside the $p$-th power though. This is because it's an $\epsilon$ cover of a radius $W$ ball, while the bound you quoted from the link is for a radius $1$ ball. — May 07, 2022 at 22:28
• +0 – Diameter W means the set is contained inside a ball of radius W. So you can scale everything down by a factor of W and apply the result that you quoted. This scaling down also turns $\epsilon$ into $\epsilon/W$, so you end up with an extra $W$ factor inside the $p$-th power. — May 07, 2022 at 22:56
• +0 – Okay so this leads to $\frac{\epsilon}{6JW}$ and eventuall by that inequality $N(\epsilon,B_{2}) \leq (\frac{18WJ}{\epsilon})^p.$ that is great. But unable to find why $\frac{\epsilon}{W}$ is happening here. The radius of the Ball is $W$ and there are of small balls with $\frac{\epsilon}{6J}$ which can cover the Ball. why the scaling is eventually needed to determine the upper-bound — May 08, 2022 at 05:55