Norm, Point, and Distance Estimation Over Multiple Signals Using Max-Stable Distributions

Consider a set of signals f_s : {1, ..., N} → [0, ..., M] appearing as a stream of tuples (i, f_s(i)) in arbitrary order of i and s. We would like to devise one-pass approximate algorithms for estimating various functionals on the dominant signal f_max, defined as f_max = {(i, max_s f_s(i)), ∀i}. Examples include the "worst case influence", which is the F1-norm of the dominant signal (Cormode and Muthukrishnan, 2003), general Fp-norms, and special types of distances between dominant signals. The only known previous work in this setting consists of the algorithms of Cormode and Muthukrishnan and of Pavan and Tirthapura (2005), which can only estimate the F1-norm over f_max. No previous work addressed more general norms or distance estimation. In this work, we use a novel sketch, based on the properties of max-stable distributions, for these more general problems. The max-stable sketch is a significant improvement over previous alternatives in terms of simplicity of implementation, space requirements, and insertion cost, while providing similar approximation guarantees. To support these claims, we also conduct an experimental evaluation using real datasets.
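The idea behind a max-stable sketch can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify the algorithm. It rests on a standard property: if Z(i) are independent standard α-Fréchet variables, then max_i f(i) Z(i) is α-Fréchet with scale equal to the α-norm of f. Because taking a maximum is insensitive to order and to repeated indices, feeding every tuple (i, f_s(i)) into the same cells yields a sketch of the dominant signal. The class name `MaxStableSketch`, the hash-based generator `frechet`, the median-based estimator, and the toy signals are all assumptions made for illustration.

```python
import hashlib
import math
import random
import statistics


def frechet(j, i, alpha):
    """Pseudo-random standard alpha-Frechet variable determined by (j, i)."""
    digest = hashlib.blake2b(f"{j}:{i}".encode(), digest_size=8).digest()
    n = int.from_bytes(digest, "big") >> 12      # keep 52 pseudo-random bits
    u = (n + 0.5) / 2.0 ** 52                    # strictly inside (0, 1)
    return (-math.log(u)) ** (-1.0 / alpha)      # inverse-CDF transform


class MaxStableSketch:
    """Hypothetical illustration: k cells, each a running maximum."""

    def __init__(self, k, alpha=1.0):
        self.k = k
        self.alpha = alpha
        self.cells = [0.0] * k

    def update(self, i, value):
        # The same Z_j(i) is regenerated for every tuple with index i, so a
        # smaller value arriving later for that index never changes a cell.
        for j in range(self.k):
            candidate = value * frechet(j, i, self.alpha)
            if candidate > self.cells[j]:
                self.cells[j] = candidate

    def norm(self):
        # The median of a Frechet variable with scale s is s * (ln 2)^(-1/alpha),
        # so rescaling the sample median estimates the alpha-norm.
        median = statistics.median(self.cells)
        return median * math.log(2) ** (1.0 / self.alpha)


# Toy data (assumed): three signals over indices 1..5.
signals = {
    "a": {1: 5, 2: 1, 3: 7},
    "b": {1: 2, 2: 9, 4: 4},
    "c": {3: 3, 4: 6, 5: 8},
}
stream = [(i, v) for sig in signals.values() for i, v in sig.items()]
random.Random(0).shuffle(stream)                 # arbitrary arrival order

sketch = MaxStableSketch(k=1000, alpha=1.0)
for i, v in stream:
    sketch.update(i, v)

# Exact F1-norm of the dominant signal, for comparison only.
dominant = {}
for i, v in stream:
    dominant[i] = max(dominant.get(i, 0), v)
exact = sum(dominant.values())
estimate = sketch.norm()
print(exact, estimate)
```

In this sketch each update costs O(k) and the state is k floating-point cells, independent of N and of the number of signals. Other estimators of the scale could replace the median.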

[1]  Srikanta Tirthapura, et al. Range Efficient Computation of F0 over Massive Data Streams, 2005, ICDE.

[2]  Larry Carter, et al. Universal Classes of Hash Functions, 1979, J. Comput. Syst. Sci.

[3]  Alexander A. Razborov, et al. On the Distributional Complexity of Disjointness, 1992, Theor. Comput. Sci.

[4]  Edith Cohen, et al. Size-Estimation Framework with Applications to Transitive Closure and Reachability, 1997, J. Comput. Syst. Sci.

[5]  Rajeev Motwani, et al. Approximate Frequency Counts over Data Streams, 2012, VLDB.

[6]  Rajeev Rastogi, et al. Processing complex aggregate queries over data streams, 2002, SIGMOD '02.

[7]  Mahesh Viswanathan, et al. An Approximate L1-Difference Algorithm for Massive Data Streams, 2002, SIAM J. Comput.

[8]  Srikanta Tirthapura, et al. Estimating simple functions on the union of data streams, 2001, SPAA '01.

[9]  Murad S. Taqqu, et al. Extremal stochastic integrals: a parallel between max-stable processes and α-stable processes, 2005.

[10]  Jon A. Wellner, et al. Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.

[11]  Andreas Buja, et al. Visualization Methodology for Multidimensional Scaling, 2002, J. Classif.

[12]  Philippe Flajolet, et al. Probabilistic Counting Algorithms for Data Base Applications, 1985, J. Comput. Syst. Sci.

[13]  Srikanta Tirthapura, et al. Range-efficient computation of F0 over massive data streams, 2005, 21st International Conference on Data Engineering (ICDE'05).

[14]  Graham Cormode, et al. Estimating Dominance Norms of Multiple Data Streams, 2003, ESA.

[15]  Sanjeev Khanna, et al. Space-efficient online computation of quantile summaries, 2001, SIGMOD '01.

[16]  S. Muthukrishnan, et al. Data streams: algorithms and applications, 2005, SODA '03.

[17]  P. Groenen, et al. Modern Multidimensional Scaling: Theory and Applications, 1999.

[18]  Patrick J. F. Groenen, et al. Modern Multidimensional Scaling: Theory and Applications, 2003.

[19]  Noga Alon, et al. The space complexity of approximating the frequency moments, 1996, STOC '96.

[20]  竹中 茂夫. G. Samorodnitsky, M. S. Taqqu: Stable non-Gaussian Random Processes: Stochastic Models with Infinite Variance, 1996.

[21]  Divesh Srivastava, et al. Effective computation of biased quantiles over data streams, 2005, 21st International Conference on Data Engineering (ICDE'05).

[22]  Rajeev Rastogi, et al. Processing set expressions over continuous update streams, 2003, SIGMOD '03.

[23]  System Sciences, 1999, Proceedings of the 32nd Annual Hawaii International Conference on Systems Sciences. 1999. HICSS-32. Abstracts and CD-ROM of Full Papers.