On Discovering Bucket Orders from Preference Data

The problem of ordering a set of entities that contain inherent ties arises in many applications. The notion of a “bucket order” has emerged as a popular ranking mechanism in such settings. A bucket order is an ordered partition of the set of entities into “buckets”: there is a total order on the buckets, but the entities within a bucket are treated as tied. In this paper, we focus on discovering a bucket order from data captured in the form of user preferences. We consider two settings: one in which the discrepancies in the input preferences are “local” (as when preferences are collected from experts) and one in which the discrepancies can be arbitrary (as when preferences are collected from a large population). We present a formal model that captures the setting of local discrepancies and consider the following question: “How many experts need to be queried to discover the underlying bucket order on n entities?” We prove an upper bound of O(√(log n)). In the case of arbitrary discrepancies, we model the task as the bucket order problem: discovering a bucket order that best fits the data, where the data is captured as pairwise preference statistics. We present a new approach that exploits a connection between the discovery of buckets and the correlation clustering problem. We present an empirical evaluation of our algorithms on real and artificially generated datasets.
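To make the definition concrete, the following is a minimal sketch of how a bucket order and its fit to pairwise preference statistics might be represented. It is illustrative only and is not the algorithm of this paper. The function names `pair_order_matrix` and `fit_cost`, the 1 / 0.5 / 0 encoding of "before / tied / after", and the use of a sum of absolute differences as the fit measure are all assumptions made for this example.

```python
def pair_order_matrix(buckets):
    """Encode a bucket order as pairwise values (assumed convention):
    1.0 if a's bucket precedes b's, 0.5 if a and b share a bucket, 0.0 otherwise."""
    # Map each entity to the index of the bucket that contains it.
    position = {item: idx for idx, bucket in enumerate(buckets) for item in bucket}
    matrix = {}
    for a in position:
        for b in position:
            if a == b:
                continue
            if position[a] < position[b]:
                matrix[(a, b)] = 1.0
            elif position[a] == position[b]:
                matrix[(a, b)] = 0.5
            else:
                matrix[(a, b)] = 0.0
    return matrix


def fit_cost(buckets, prefs):
    """Sum of absolute differences between the bucket order's pairwise values
    and the observed preference statistics (assumed fit measure).
    prefs[(a, b)] is the observed fraction of users preferring a over b."""
    induced = pair_order_matrix(buckets)
    return sum(abs(value - prefs[pair]) for pair, value in induced.items())


# Hypothetical example: four entities, with "b" and "c" tied in the middle bucket.
buckets = [["a"], ["b", "c"], ["d"]]
matrix = pair_order_matrix(buckets)
print(matrix[("a", "b")], matrix[("b", "c")], matrix[("d", "a")])
print(fit_cost(buckets, matrix))
```

Under this encoding, a bucket order fits its own induced statistics with cost zero, and a candidate bucket order can be scored against noisy statistics by the same function.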
