Mallows Models for Top-k Lists

The classic Mallows model is a widely used tool for realizing distributions on permutations. Motivated by common practical situations, we generalize the Mallows model to distributions on top-k lists by using a suitable distance measure between top-k lists. Unlike many earlier works, our model is both analytically tractable and computationally efficient. We demonstrate this by studying two basic problems in this model, namely sampling and reconstruction, from both algorithmic and experimental points of view.
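To make the setup concrete, here is a minimal Python sketch. It is not the paper's construction. It assumes the classic Mallows form P(π) ∝ exp(-β·d(π, σ)), with center σ, spread β and Kendall tau distance d, and samples full permutations by repeated insertion. For top-k lists it plugs in one hypothetical choice of "suitable distance", the K^(p) distance of Fagin, Kumar and Sivakumar, and normalizes by brute force over all top-k lists. The function names, the penalty p = 0.5 and the brute-force normalization are illustrative assumptions. The abstract does not specify the paper's own distance or its efficient sampling and reconstruction algorithms, and they are not reproduced here.

```python
import itertools
import math
import random


def kendall_tau(pi, sigma):
    """Number of item pairs that two full rankings order differently."""
    pos = {item: r for r, item in enumerate(sigma)}
    ranks = [pos[item] for item in pi]
    return sum(1 for a, b in itertools.combinations(ranks, 2) if a > b)


def sample_mallows(sigma, beta, rng=random):
    """Classic Mallows sample via repeated insertion.

    sigma[i] is inserted at slot j (0 <= j <= i) with weight exp(-beta * (i - j)),
    since that insertion places it ahead of i - j items that precede it in sigma,
    creating exactly i - j inversions. The total is the Kendall tau distance.
    """
    out = []
    for i, item in enumerate(sigma):
        weights = [math.exp(-beta * (i - j)) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        out.insert(j, item)
    return out


def topk_kendall(t1, t2, p=0.5):
    """K^(p) distance between two top-k lists (assumed choice of distance)."""
    pos1 = {x: r for r, x in enumerate(t1)}
    pos2 = {x: r for r, x in enumerate(t2)}
    total = 0.0
    for i, j in itertools.combinations(set(t1) | set(t2), 2):
        in1 = (i in pos1, j in pos1)
        in2 = (i in pos2, j in pos2)
        if all(in1) and all(in2):
            # Both items ranked in both lists: penalize a discordant pair.
            total += 1 if (pos1[i] - pos1[j]) * (pos2[i] - pos2[j]) < 0 else 0
        elif all(in1) and any(in2):
            # Only one item appears in t2, so t2 implicitly ranks it higher.
            top, other = (i, j) if in2[0] else (j, i)
            total += 1 if pos1[top] > pos1[other] else 0
        elif all(in2) and any(in1):
            top, other = (i, j) if in1[0] else (j, i)
            total += 1 if pos2[top] > pos2[other] else 0
        elif all(in1) or all(in2):
            # Both items in one list, neither in the other: order is unknown.
            total += p
        else:
            # One item only in t1, the other only in t2: always discordant.
            total += 1
    return total


def topk_mallows_pmf(universe, center, beta, p=0.5):
    """Exact pmf P(tau) proportional to exp(-beta * K^(p)(tau, center)).

    Enumerates every top-k list over `universe`, so the cost is exponential;
    this is a definitional illustration only (hypothetical helper).
    """
    k = len(center)
    lists = list(itertools.permutations(universe, k))
    weights = [math.exp(-beta * topk_kendall(t, center, p)) for t in lists]
    z = sum(weights)
    return {t: w / z for t, w in zip(lists, weights)}
```

For example, `topk_mallows_pmf(range(4), (0, 1), beta=1.0)` puts its largest probability on the center list `(0, 1)`. The brute-force normalization enumerates n!/(n-k)! lists, which is exactly the cost an efficient top-k sampler has to avoid.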
