On Estimating the First Frequency Moment of Data Streams

Estimating the first moment of a data stream defined as $F_1 = \sum_{i \in \{1, 2, \ldots, n\}} \abs{f_i}$ to within $1 \pm \epsilon$-relative error with high probability is a basic and influential problem in data stream processing. A tight space bound of $O(\epsilon^{-2} \log (mM))$ is known from the work of [Kane-Nelson-Woodruff-SODA10]. However, all known algorithms for this problem require per-update stream processing time of $\Omega(\epsilon^{-2})$, with the only exception being the algorithm of [Ganguly-Cormode-RANDOM07] that requires per-update processing time of $O(\log^2(mM)(\log n))$ albeit with sub-optimal space $O(\epsilon^{-3}\log^2(mM))$. In this paper, we present an algorithm for estimating $F_1$ that achieves near-optimality in both space and update processing time. The space requirement is $O(\epsilon^{-2}(\log n + (\log \epsilon^{-1})\log(mM)))$ and the per-update processing time is $O( (\log n)\log (\epsilon^{-1}))$.