An Exponential Model for Infinite Rankings

This paper presents a statistical model for expressing preferences through rankings when the number of alternatives (items to rank) is large. A human ranker will then typically rank only the most preferred items, and may not examine the whole set of items or even know how many there are. Similarly, a user presented with the ranked output of a search engine will consider only the highest-ranked items. We model such situations with a stagewise ranking model that operates on finite ordered lists, called top-t orderings, over an infinite space of items. We give algorithms to estimate this model from data and show that it has sufficient statistics, making it an exponential family model with continuous and discrete parameters. We describe its conjugate prior and other statistical properties. We then extend the estimation problem to multimodal data by introducing an Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of our model and demonstrate that infinite models over permutations can be simple, elegant, and practical.
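To make the idea of a stagewise model over top-t orderings concrete, here is a minimal Python sketch. The abstract does not state the form of the stage distributions, so several things below are assumptions made purely for illustration: the items are the natural numbers 0, 1, 2, ... with the natural order as the central ordering, a single dispersion parameter `theta` is shared by all stages, and the number of remaining items skipped at each stage follows a geometric law with P(s) proportional to exp(-theta * s). The function name `sample_top_t` is hypothetical.

```python
import math
import random


def sample_top_t(t, theta, rng):
    """Draw a top-t ordering over the infinite item set 0, 1, 2, ...

    Assumption (not from the abstract): at each stage the number s of
    not-yet-chosen items that are skipped, counted along the central
    ordering 0, 1, 2, ..., is geometric with P(s) = (1 - q) * q**s,
    where q = exp(-theta) and theta > 0.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    chosen = []    # the top-t ordering built so far
    taken = set()  # items already placed in the ordering
    for _ in range(t):
        # Inverse-CDF draw of the geometric skip count s in {0, 1, 2, ...}.
        u = rng.random()                       # u in [0, 1)
        s = int(math.log(1.0 - u) / -theta)    # floor of a non-negative value
        # Walk the central ordering and pick the s-th remaining item.
        item, skipped = 0, 0
        while True:
            if item not in taken:
                if skipped == s:
                    break
                skipped += 1
            item += 1
        chosen.append(item)
        taken.add(item)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    # Small theta: orderings scatter widely; large theta: close to 0, 1, 2, ...
    print(sample_top_t(5, 0.5, rng))
    print(sample_top_t(5, 5.0, rng))
```

Only t items are ever generated, so the sampler never needs to know how many items exist, which is the practical appeal of working with top-t orderings over an infinite item space. Larger values of `theta` concentrate the samples around the central ordering.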
