Paper information - On Discovering Bucket Orders from Preference Data

The problem of ordering a set of entities that contain inherent ties arises in many applications. The notion of a “bucket order” has emerged as a popular mechanism for ranking in such settings. A bucket order is an ordered partition of the set of entities into “buckets”: there is a total order on the buckets, but the entities within a bucket are treated as tied. In this paper, we focus on discovering a bucket order from data captured in the form of user preferences. We consider two settings: one in which the discrepancies in the input preferences are “local” (when collected from experts) and one in which the discrepancies can be arbitrary (when collected from a large population). We present a formal model that captures the setting of local discrepancies and consider the following question: “How many experts need to be queried to discover the underlying bucket order on n entities?” We prove an upper bound of O(√(log n)). In the case of arbitrary discrepancies, we model the task as the bucket order problem: discovering a bucket order that best fits the data, captured as pairwise preference statistics. We present a new approach that exploits a connection between the discovery of buckets and the correlation clustering problem. We present an empirical evaluation of our algorithms on real and artificially generated datasets.
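As a rough illustration of the terms used in the abstract, the sketch below represents a bucket order as a list of sets, derives pairwise preference statistics from a handful of full rankings, and scores how well a candidate bucket order fits those statistics. It is not the paper's algorithm. The function names (`pair_stats`, `bucket_matrix`, `fit_cost`), the 1 / 0.5 / 0 encoding of "before / tied / after", and the absolute-difference cost are all assumptions chosen for illustration.

```python
from itertools import combinations

def pair_stats(rankings, items):
    """Fraction of input rankings that place u before v, for each ordered pair."""
    counts = {(u, v): 0 for u in items for v in items if u != v}
    for ranking in rankings:
        pos = {x: i for i, x in enumerate(ranking)}
        for u, v in combinations(items, 2):
            if pos[u] < pos[v]:
                counts[(u, v)] += 1
            else:
                counts[(v, u)] += 1
    return {pair: c / len(rankings) for pair, c in counts.items()}

def bucket_matrix(buckets):
    """Assumed encoding: 1.0 if u's bucket precedes v's, 0.5 if tied, 0.0 otherwise."""
    index = {x: b for b, bucket in enumerate(buckets) for x in bucket}
    matrix = {}
    for u in index:
        for v in index:
            if u != v:
                if index[u] < index[v]:
                    matrix[(u, v)] = 1.0
                elif index[u] == index[v]:
                    matrix[(u, v)] = 0.5
                else:
                    matrix[(u, v)] = 0.0
    return matrix

def fit_cost(buckets, stats):
    """Sum of absolute differences between the bucket order and the statistics."""
    matrix = bucket_matrix(buckets)
    return sum(abs(matrix[pair] - stats[pair]) for pair in stats)

# Hypothetical data: voters split evenly on a vs. b, but all rank c last.
items = ["a", "b", "c"]
rankings = [["a", "b", "c"], ["b", "a", "c"], ["a", "b", "c"], ["b", "a", "c"]]
stats = pair_stats(rankings, items)

tied = [{"a", "b"}, {"c"}]        # a and b share a bucket
total = [{"a"}, {"b"}, {"c"}]     # a strictly before b
print(fit_cost(tied, stats), fit_cost(total, stats))
```

On this toy input the bucket order that ties a and b fits the statistics exactly, while the total order pays a cost for forcing a before b, which is the kind of situation in which a bucket order is the more faithful summary.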
