Antithetic and Monte Carlo kernel estimators for partial rankings

Rankings data are ubiquitous and are useful for a variety of applications such as recommender systems, multi-object tracking and preference learning. However, most rankings data encountered in the real world are incomplete, which prevents the direct application of existing modelling tools for complete rankings. Our contribution is a novel way to extend kernel methods for complete rankings to partial rankings, via consistent Monte Carlo estimators for Gram matrices, that is, matrices of kernel values between pairs of observations. We also present a novel variance-reduction scheme based on an antithetic variate construction between permutations to obtain an improved estimator for the Mallows kernel. The corresponding antithetic kernel estimator has lower variance, and we demonstrate empirically that it performs better in a variety of machine learning tasks. Both kernel estimators are based on extending kernel mean embeddings to the embedding of the set of full rankings consistent with an observed partial ranking. They form a computationally tractable alternative to previous approaches for partial rankings data. An overview of the existing kernels and metrics for permutations is also provided.
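
To make the idea of a Monte Carlo kernel estimator for partial rankings concrete, the following is a minimal sketch and not the authors' implementation. It assumes that partial rankings are top-k lists, that completions are sampled uniformly by shuffling the unranked items, and that the Mallows kernel takes the form exp(-lam * Kendall distance). The function names (`kendall_distance`, `mallows_kernel`, `completions`, `mc_kernel`) and the parameter `lam` are hypothetical. The antithetic option pairs each sampled completion with the one whose unranked tail is reversed; this is one plausible antithetic pairing chosen for illustration, and the abstract does not specify the paper's exact construction.

```python
import math
import random
from itertools import combinations


def kendall_distance(sigma, tau):
    """Count item pairs that two full rankings order differently.

    Each ranking is a list of items, most preferred first.
    """
    pos_s = {item: i for i, item in enumerate(sigma)}
    pos_t = {item: i for i, item in enumerate(tau)}
    return sum(
        1
        for a, b in combinations(sigma, 2)
        if (pos_s[a] - pos_s[b]) * (pos_t[a] - pos_t[b]) < 0
    )


def mallows_kernel(sigma, tau, lam=0.5):
    """Mallows kernel between full rankings (assumed form: exp(-lam * distance))."""
    return math.exp(-lam * kendall_distance(sigma, tau))


def completions(top_k, items, n_samples, rng, antithetic=False):
    """Sample full rankings consistent with a top-k list.

    The unranked items are shuffled uniformly and appended after top_k.
    With antithetic=True, each draw is paired with the completion whose
    tail is reversed (an illustrative assumption, see the lead-in).
    """
    rest = [x for x in items if x not in top_k]
    out = []
    for _ in range(n_samples):
        tail = rest[:]
        rng.shuffle(tail)
        out.append(list(top_k) + tail)
        if antithetic:
            out.append(list(top_k) + tail[::-1])
    return out


def mc_kernel(top_a, top_b, items, n_samples=100, lam=0.5,
              antithetic=False, seed=0):
    """Estimate the kernel between two top-k lists by averaging the
    Mallows kernel over all pairs of sampled completions."""
    rng = random.Random(seed)
    comps_a = completions(top_a, items, n_samples, rng, antithetic)
    comps_b = completions(top_b, items, n_samples, rng, antithetic)
    total = sum(mallows_kernel(s, t, lam) for s in comps_a for t in comps_b)
    return total / (len(comps_a) * len(comps_b))


if __name__ == "__main__":
    items = list(range(6))
    print(mc_kernel([0, 1], [1, 0], items))
    print(mc_kernel([0, 1], [1, 0], items, antithetic=True))
```

Evaluating `mc_kernel` on every pair of observed top-k lists would fill in an estimated Gram matrix. When both inputs are already full rankings, the set of consistent completions is a single permutation and the estimate reduces to the exact kernel value.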
