Paper information - Rank quantization

Rank quantization

We study the problem of aggregating and summarizing partial orders on a large scale. Our motivation is two-fold: to discover elements at similar preference levels, and to reduce the number of bits needed to store an element's position in a full ranking. We proceed in two steps. First, we find a total order by linearizing the rankings induced by the multiple partial orders and removing potentially inconsistent pairwise preferences. Next, given a total order, we introduce and formalize the rank quantization problem, which intuitively aims to bucketize the total order in a manner that mostly preserves the relations appearing in the partial orders. We show an exact quadratic-time quantization algorithm, as well as a greedy 2/3-approximation algorithm whose running time is substantially faster on sparse instances. As an application, we aggregate rankings of top-10 search results over millions of search engine queries, approximately reproducing and then efficiently encoding the underlying static ranks used by the engine. We evaluate the performance of our algorithms on a web dataset of 12 million (2^{23.5}) unique pages and show that we can quantize the pages' static ranks using as few as eight bits, with only a minor degradation in search quality.
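To make the bucketizing idea concrete, here is a minimal Python sketch. It is not the paper's exact or greedy algorithm, which the abstract does not spell out. It assumes a simplified objective: a preference (u, v) that agrees with the total order counts as preserved when u lands in a strictly earlier bucket than v, and as lost when both fall into the same bucket. The function names `bits_for_full_rank` and `quantize` are hypothetical, and the dynamic program is a naive illustration meant only for small inputs.

```python
import math


def bits_for_full_rank(n_items):
    """Bits needed to store a distinct position for each of n_items."""
    return math.ceil(math.log2(n_items))


def quantize(order, prefs, k):
    """Split `order` into min(k, len(order)) contiguous buckets.

    Assumed objective (not taken from the paper): minimize the number of
    order-consistent preferences whose two endpoints share a bucket.
    Returns (buckets, number_of_preserved_preferences).
    """
    pos = {item: i for i, item in enumerate(order)}
    n = len(order)
    # Keep only preferences that agree with the total order, as positions.
    pairs = [(pos[u], pos[v]) for u, v in prefs if pos[u] < pos[v]]

    def lost_within(i, j):
        # Preferences with both endpoints inside the slice order[i:j].
        return sum(1 for a, b in pairs if i <= a and b < j)

    inf = float("inf")
    b_max = min(k, n)
    # best[b][j]: fewest lost preferences when the first j items
    # are split into exactly b non-empty buckets.
    best = [[inf] * (n + 1) for _ in range(b_max + 1)]
    cut = [[0] * (n + 1) for _ in range(b_max + 1)]
    best[0][0] = 0
    for b in range(1, b_max + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                cost = best[b - 1][i] + lost_within(i, j)
                if cost < best[b][j]:
                    best[b][j] = cost
                    cut[b][j] = i

    # Walk the stored cut points backwards to recover the buckets.
    bounds, j = [], n
    for b in range(b_max, 0, -1):
        i = cut[b][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    buckets = [order[i:j] for i, j in bounds]
    return buckets, len(pairs) - best[b_max][n]


if __name__ == "__main__":
    order = ["a", "b", "c", "d"]
    prefs = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]
    print(quantize(order, prefs, 2))      # two buckets, so 1 bit per item
    print(bits_for_full_rank(12_000_000)) # bits for an unquantized rank
```

In this toy input, no preference relates "a" to "b" or "c" to "d", so grouping those pairs loses nothing. The same arithmetic explains the storage motivation in the abstract: a distinct position for each of 12 million pages needs 24 bits, whereas eight bits can address only 2^8 = 256 buckets.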