- ds.algorithms reference-request communication-complexity data-streams
- Updated Sat, 21 May 2022 20:43:35 GMT

Let $a_1, a_2,\dotsc, a_m$ be a sequence of integers where each $a_j \in \{1,2,\dotsc,n\}$. For $i \in \{1,2,\dotsc,n\}$, let $m_i = |\{j : a_j = i\}|$. The $k$th frequency moment is defined to be

$\displaystyle F_k = \sum_{i=1}^n m_i^k.$

In their well-known paper, The space complexity of approximating the frequency moments, Alon et al. give a streaming algorithm that approximates $F_k$ using roughly $O(n^{1-\frac{1}{k}}(\log n + \log m))$ space. They also use communication complexity techniques to obtain a lower bound of $\Omega(n^{1-\frac{5}{k}})$ for $k > 5$. For $k = 0,1,2$, they provide more or less matching upper and lower bounds.

Have there been improvements to these bounds since then, and has there been progress for $k = 3,4,5$?

There's been a fair bit of progress. On the specific problem of $F_k$, there is a matching upper and lower bound of $n^{1-2/k}$ for $k > 2$. The upper bounds come from this paper by Indyk and Woodruff (which appeared in STOC 2005) and the lower bounds are via the information complexity framework, due to Bar-Yossef et al and Chakrabarti et al.

- +3 – This is also relevant: arxiv.org/abs/1011.1263 — Dec 09, 2011 at 18:47
- +1 – Check the link @MCH sent, it makes the algorithm and analysis lean and mean. But maybe David's thesis will be useful for intuition and discussion, too: almaden.ibm.com/cs/people/dpwoodru/phdFinal.pdf — Dec 09, 2011 at 20:24

External links referenced by this document: