Rank quantization

We study the problem of aggregating and summarizing partial orders on a large scale. Our motivation is twofold: to discover elements at similar preference levels, and to reduce the number of bits needed to store an element's position in a full ranking. We proceed in two steps. First, we find a total order by linearizing the rankings induced by the multiple partial orders and removing potentially inconsistent pairwise preferences. Next, given a total order, we introduce and formalize the rank quantization problem, which intuitively aims to bucketize the total order in a manner that mostly preserves the relations appearing in the partial orders. We show an exact quadratic-time quantization algorithm, as well as a greedy 2/3-approximation algorithm whose running time is substantially lower on sparse instances. As an application, we aggregate rankings of top-10 search results over millions of search engine queries, approximately reproducing and then efficiently encoding the underlying static ranks used by the engine. We evaluate the performance of our algorithms on a web dataset of 12 million (2^{23.5}) unique pages and show that we can quantize the pages' static ranks using as few as eight bits, with only a minor degradation in search quality.
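To make the storage argument concrete, the following is a minimal illustrative sketch, not the exact or greedy algorithm from the paper. It assumes a naive baseline that cuts a total order into 2^b equal-size buckets, and it assumes a pairwise preference (a before b) counts as preserved only when a lands in a strictly earlier bucket than b. The function names `bits_for_full_rank`, `uniform_buckets`, and `preserved_pairs` are hypothetical.

```python
import math


def bits_for_full_rank(n_items):
    """Bits needed to store a distinct position for each of n_items."""
    return math.ceil(math.log2(n_items))


def uniform_buckets(total_order, bits):
    """Assumed baseline: split the total order into 2**bits equal-size buckets.

    Returns a dict mapping each item to its bucket index (0 = most preferred).
    """
    n_buckets = 2 ** bits
    n = len(total_order)
    return {item: (pos * n_buckets) // n for pos, item in enumerate(total_order)}


def preserved_pairs(bucket_of, preferences):
    """Count preferences (a, b), meaning 'a before b', that the bucketing keeps.

    Assumption: a pair is preserved only if a is in a strictly earlier bucket.
    """
    return sum(1 for a, b in preferences if bucket_of[a] < bucket_of[b])


# A full rank over 12 million pages needs 24 bits; eight bits allow 256 buckets.
full_bits = bits_for_full_rank(12_000_000)

# Toy instance: 8 items, 1 bit (2 buckets), 4 pairwise preferences.
order = list("abcdefgh")
buckets = uniform_buckets(order, bits=1)
prefs = [("a", "b"), ("a", "e"), ("c", "h"), ("f", "g")]
kept = preserved_pairs(buckets, prefs)
```

In this toy instance the pairs (a, b) and (f, g) fall inside a single bucket and are lost, while (a, e) and (c, h) cross the boundary and survive. Choosing bucket boundaries so that as many such relations as possible survive is the intuition behind the quantization problem described above.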
