Paper information - The query complexity of estimating weighted averages

The query complexity of estimating the mean of some [0, 1] variables is understood. Inspired by some work by Carterette et al. on evaluating retrieval systems, and by Moffat and Zobel’s new proposal for such evaluation, we examine the query complexity of weighted average calculation. In general, determining an answer within accuracy $${\varepsilon}$$, with high probability, requires $${\Omega(\varepsilon^{-2})}$$ queries, as the mean is a special case. There is a matching upper bound for the weighted mean. If the weights are a normalized prefix of a divergent series, the same result holds. However, if the weights follow a geometric sequence, a sample of size $${\Omega(\log (1/\varepsilon))}$$ suffices. Our principal contribution is the investigation of power-law sequences of weights. We show that if the ith largest weight is proportional to $${i^{-p}}$$, for $${p > 1}$$, then the query complexity is in $${\Omega(\varepsilon^{2/(1-2p)})}$$.
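The geometric-weights claim can be illustrated with a simple truncation argument. The sketch below is not taken from the paper; it assumes weights proportional to $${r^i}$$ for a hypothetical ratio `r` in (0, 1), and values in [0, 1]. Under those assumptions the normalized weight mass beyond the first `k` items is at most $${r^k}$$, so querying only the first `k` values, with `k` about $${\log(1/\varepsilon)/\log(1/r)}$$, already gives an answer within $${\varepsilon}$$. All function names here are illustrative.

```python
import math
import random


def prefix_queries_needed(r: float, eps: float) -> int:
    """Smallest k with r**k <= eps, i.e. the prefix length whose tail mass is at most eps."""
    k = max(0, math.ceil(math.log(1 / eps) / math.log(1 / r)))
    while r ** k > eps:  # guard against floating-point rounding in the logs
        k += 1
    return k


def geometric_weights(r: float, n: int) -> list[float]:
    """Normalized weights w_i proportional to r**i for i = 0..n-1 (assumed form)."""
    raw = [r ** i for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def estimate_by_prefix(values: list[float], weights: list[float], k: int) -> float:
    """Query only the first k values; unqueried items are treated as 0."""
    return sum(w * v for w, v in zip(weights[:k], values[:k]))


if __name__ == "__main__":
    random.seed(0)
    r, eps, n = 0.5, 0.01, 1000
    values = [random.random() for _ in range(n)]  # each value lies in [0, 1]
    weights = geometric_weights(r, n)
    k = prefix_queries_needed(r, eps)
    exact = sum(w * v for w, v in zip(weights, values))
    approx = estimate_by_prefix(values, weights, k)
    print(f"queried {k} of {n} values, error = {abs(exact - approx):.5f}")
```

The number of queries depends only on `r` and `eps`, not on `n`, which is the contrast with the $${\Omega(\varepsilon^{-2})}$$ bound for the unweighted mean.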

Venkatesan Guruswami | Andrew Wirth | Amit Chakrabarti | Anthony Wirth

[1] Ran Canetti, et al. Lower Bounds for Sampling Algorithms for Estimating the Average, 1995, Inf. Process. Lett.

[2] James Allan, et al. Minimal test collections for retrieval evaluation, 2006, SIGIR.

[3] Richard M. Karp, et al. An optimal algorithm for Monte Carlo estimation, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[4] Andrew Trotman, et al. Sound and complete relevance assessment for XML retrieval, 2008, TOIS.

[5] Alistair Moffat, et al. Rank-biased precision for measurement of retrieval effectiveness, 2008, TOIS.

[6] Ravi Kumar, et al. Sampling algorithms: lower bounds and applications, 2001, STOC '01.

[7] Carsten Lund, et al. Priority sampling for estimation of arbitrary subset sums, 2007, JACM.

[8] Rajeev Motwani, et al. Estimating Sum by Weighted Sampling, 2007, ICALP.

[9] Mario Szegedy, et al. The DLT priority sampling is essentially optimal, 2006, STOC '06.