On kernel methods for covariates that are rankings

Permutation-valued features arise in a variety of applications, either directly, when preferences are elicited over a collection of items, or indirectly, when numerical ratings are converted into a ranking. To date, regression, classification, and testing problems based on permutation-valued features have received relatively little study, in contrast to problems with permutation-valued responses. This paper studies the use of reproducing kernel Hilbert space methods for learning from permutation-valued features. These methods embed the rankings into an implicitly defined function space and allow for efficient estimation of regression and test functions in this richer space. Our first contribution is to characterize both the feature spaces and the spectral properties of two kernels for rankings, the Kendall and Mallows kernels. Using tools from representation theory, we explain the limited expressive power of the Kendall kernel by characterizing its degenerate spectrum; in sharp contrast, we prove that the Mallows kernel is universal and characteristic. We also introduce families of polynomial kernels that interpolate between the Kendall (degree one) and Mallows (infinite degree) kernels. We demonstrate the practical effectiveness of our methods through applications to Eurobarometer survey data and a MovieLens ratings dataset.
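To make the two kernels concrete, the following is a minimal Python sketch. It assumes the definitions commonly used in the literature: the Kendall kernel as Kendall's tau correlation, (n_c - n_d) / C(n, 2), where n_c and n_d count concordant and discordant item pairs, and the Mallows kernel as exp(-λ n_d) with a bandwidth λ > 0. The paper may normalize n_d differently. The function names and the parameter name `lam` are illustrative and not taken from the paper. Building Gram matrices over all 24 rankings of 4 items illustrates the contrast described above. The Kendall feature map has one coordinate per item pair, so its Gram matrix has rank at most C(4, 2) = 6. The Mallows Gram matrix, by contrast, is positive definite and therefore full rank.

```python
import itertools
import math

import numpy as np


def discordant_pairs(sigma, tau):
    """Number of item pairs (i, j) that rankings sigma and tau order differently.

    A ranking of n items is a length-n sequence in which sigma[i] is the
    position assigned to item i (ties are not allowed).
    """
    if len(sigma) != len(tau):
        raise ValueError("rankings must cover the same items")
    count = 0
    for i, j in itertools.combinations(range(len(sigma)), 2):
        # The pair is discordant when the two rankings disagree on its order.
        if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0:
            count += 1
    return count


def kendall_kernel(sigma, tau):
    """Kendall kernel: (n_concordant - n_discordant) / C(n, 2), i.e. Kendall's tau."""
    n_pairs = math.comb(len(sigma), 2)
    n_d = discordant_pairs(sigma, tau)
    n_c = n_pairs - n_d  # full rankings have no ties, so every pair is one or the other
    return (n_c - n_d) / n_pairs


def mallows_kernel(sigma, tau, lam=1.0):
    """Mallows kernel exp(-lam * n_discordant); lam > 0 is an assumed bandwidth."""
    return math.exp(-lam * discordant_pairs(sigma, tau))


def gram_matrix(kernel, rankings, **kwargs):
    """Dense Gram matrix K[a, b] = kernel(rankings[a], rankings[b], **kwargs)."""
    m = len(rankings)
    K = np.empty((m, m))
    for a in range(m):
        for b in range(m):
            K[a, b] = kernel(rankings[a], rankings[b], **kwargs)
    return K


# Usage: compare both kernels on all 4! = 24 rankings of 4 items.
perms = list(itertools.permutations(range(4)))
K_kendall = gram_matrix(kendall_kernel, perms)
K_mallows = gram_matrix(mallows_kernel, perms, lam=1.0)

# The Kendall feature map has one coordinate per item pair, so rank <= C(4, 2) = 6.
print(np.linalg.matrix_rank(K_kendall))  # 6
# The Mallows Gram matrix is positive definite, hence full rank.
print(np.linalg.matrix_rank(K_mallows))  # 24
```

The quadratic loop in `discordant_pairs` keeps the sketch short. For long rankings, a merge-sort style count of discordant pairs runs in O(n log n) time instead.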
