Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions

A number of applications require two-sample testing on ranked preference data. For instance, in crowdsourcing, there is a long-standing question of whether pairwise comparison data provided by people is distributed similarly to ratings-converted-to-comparisons. Other examples include sports data analysis and peer grading. In this paper, we design two-sample tests for pairwise comparison data and ranking data. For our two-sample test for pairwise comparison data, we establish an upper bound on the sample complexity required to correctly distinguish between the distributions of the two sets of samples. Our test requires essentially no assumptions on the distributions. We then prove complementary lower bounds showing that our results are tight (in the minimax sense) up to constant factors. We investigate the role of modeling assumptions by proving lower bounds for a range of pairwise comparison models (WST, MST, SST, and parameter-based models such as BTL and Thurstone). We also provide testing algorithms and associated sample complexity bounds for the problem of two-sample testing with partial (or total) ranking data. Furthermore, we empirically evaluate our results via extensive simulations as well as two real-world datasets consisting of pairwise comparisons. By applying our two-sample test on real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently. On the other hand, our test recognizes no significant difference in the relative performance of European football teams across two seasons. Finally, we apply our two-sample test on a real-world partial and total ranking dataset and find a statistically significant difference in sushi preferences across demographic divisions based on gender, age and region of residence.
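
To make the problem setting concrete, the following is a minimal sketch of a generic permutation two-sample test on pairwise comparison data. It is not the test statistic or threshold analyzed in the paper: the statistic here (the sum over item pairs of squared differences in empirical win rates) and the names `win_rate_statistic` and `permutation_two_sample_test` are assumptions chosen for illustration. Each population is assumed to be given as a dictionary mapping an item pair `(i, j)` to a non-empty list of binary outcomes, where 1 means item `i` beat item `j`.

```python
import numpy as np


def win_rate_statistic(samples_a, samples_b):
    """Sum over pairs of the squared difference in empirical win rates.

    Illustrative statistic only; not the one analyzed in the paper.
    """
    return float(sum((np.mean(a) - np.mean(b)) ** 2
                     for a, b in zip(samples_a, samples_b)))


def permutation_two_sample_test(outcomes_a, outcomes_b, num_perms=1000, seed=0):
    """Permutation test of H0: both populations share the same distribution.

    outcomes_a, outcomes_b: dict mapping (i, j) -> non-empty sequence of 0/1
    outcomes (1 means item i beat item j). Only pairs observed in both
    populations are compared.
    Returns (observed_statistic, p_value).
    """
    rng = np.random.default_rng(seed)
    pairs = sorted(set(outcomes_a) & set(outcomes_b))
    a = [np.asarray(outcomes_a[p], dtype=float) for p in pairs]
    b = [np.asarray(outcomes_b[p], dtype=float) for p in pairs]
    observed = win_rate_statistic(a, b)

    exceed = 0
    for _ in range(num_perms):
        perm_a, perm_b = [], []
        for x, y in zip(a, b):
            # Under H0 the population labels are exchangeable within a pair,
            # so pool the outcomes and reassign them at random.
            pooled = rng.permutation(np.concatenate([x, y]))
            perm_a.append(pooled[:len(x)])
            perm_b.append(pooled[len(x):])
        if win_rate_statistic(perm_a, perm_b) >= observed:
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (num_perms + 1)
    return observed, p_value
```

Rejecting the null hypothesis when the returned p-value falls below a chosen significance level gives a test that makes no parametric assumption on the comparison probabilities, which is the spirit of the assumption-free setting described above. The choice of statistic affects power but not validity under the null.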
