Theoretical Computer Science

Bounds on approximating frequency moments


Let $a_1, a_2,\dotsc, a_m$ be a sequence of integers where each $a_j \in \{1,2,\dotsc,n\}$. For $i \in \{1,2,\dotsc,n\}$, let $m_i = |\{j : a_j = i\}|$. The $k$th frequency moment is defined to be

$\displaystyle F_k = \sum_{i=1}^n m_i^k.$

In their well-known paper, The space complexity of approximating the frequency moments, Alon et al. give a streaming algorithm that approximates $F_k$ using roughly $O(n^{1-\frac{1}{k}}(\log n + \log m))$ space. They also use communication complexity techniques to obtain a lower bound of $\Omega(n^{1-\frac{5}{k}})$ for $k > 5$. For $k = 0,1,2$, they provide more or less matching upper and lower bounds.

Have there been improvements to these bounds since then, and has there been progress for $k = 3,4,5$?




Solution

There's been a fair bit of progress. On the specific problem of $F_k$, there is a matching upper and lower bound of $n^{1-2/k}$ for $k > 2$. The upper bounds come from this paper by Indyk and Woodruff (which appeared in STOC 2005) and the lower bounds are via the information complexity framework, due to Bar-Yossef et al and Chakrabarti et al.





Comments (2)