Clustering and Prediction of Rankings Within a Kemeny Distance Framework

Rankings and partial rankings are ubiquitous in data analysis, yet relatively little work in the classification community exploits the characteristic properties of rankings. We review the broader literature we are aware of and identify a common building block for both the prediction and the clustering of rankings, one that is also valid for partial rankings. This building block is the Kemeny distance, defined as the minimum number of interchanges of two adjacent elements required to transform one (partial) ranking into another. For complete rankings the Kemeny distance is equivalent to Kendall’s τ; for partial rankings it is equivalent to Emond and Mason’s extension of τ. For clustering, we use the flexible class of methods proposed by Ben-Israel and Iyigun (Journal of Classification 25: 5–26, 2008) and define the disparity between a ranking and the center of a cluster as their Kemeny distance. For prediction, we build a prediction tree by recursive partitioning and define the impurity measure of the subgroups formed as the sum of all within-node Kemeny distances. In both cases, subgroups are characterized by their median ranking, the ranking that minimizes the sum of Kemeny distances to the rankings in the subgroup.
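The quantities named above can be made concrete with a small Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. It assumes that a (partial) ranking is a tuple of rank positions in which tied objects share a value, and it uses the Kemeny-Snell score-matrix form of the distance, d(A, B) = ½ Σ_ij |a_ij − b_ij|, under which reversing a pair costs 2 and breaking a tie costs 1 (other normalizations differ by a constant factor). With Emond and Mason's τ_x, which scores a tie as +1 in both directions, this gives d = n(n − 1)/2 · (1 − τ_x), the sense in which the two are equivalent. The median is found by brute force over all weak orders, which is only feasible for a handful of objects. The impurity function reads "sum of all within-node Kemeny distances" as a sum over all pairs of rankings in a node, and the membership function shows only the basic probabilistic d-clustering rule (membership probability inversely proportional to distance), not the full iterative algorithm.

```python
from itertools import combinations, product

# Assumption: a (partial) ranking of n objects is a tuple r where r[i] is the
# rank position of object i (smaller = preferred); equal values mean a tie.


def score_matrix(r):
    """Kemeny-Snell scores: +1 if object i is ahead of j, -1 if behind, 0 if tied or i == j."""
    n = len(r)
    return [[(r[i] < r[j]) - (r[i] > r[j]) for j in range(n)] for i in range(n)]


def kemeny_distance(r1, r2):
    """d(A, B) = 1/2 * sum_ij |a_ij - b_ij|: a reversed pair costs 2, a broken tie costs 1."""
    a, b = score_matrix(r1), score_matrix(r2)
    n = len(r1)
    return sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n)) // 2


def tau_x(r1, r2):
    """Emond and Mason's tau_x: ties score +1 in both directions, the diagonal scores 0."""
    n = len(r1)

    def s(r, i, j):
        if i == j:
            return 0
        return 1 if r[i] <= r[j] else -1

    total = sum(s(r1, i, j) * s(r2, i, j) for i in range(n) for j in range(n))
    return total / (n * (n - 1))


def all_weak_orders(n):
    """Every ranking of n objects with ties allowed, as dense rank tuples (0 = best)."""
    orders = set()
    for t in product(range(n), repeat=n):
        levels = sorted(set(t))
        orders.add(tuple(levels.index(v) for v in t))
    return sorted(orders)


def median_ranking(rankings):
    """Brute-force median ranking: the weak order with the smallest summed Kemeny distance.

    Enumerates all weak orders, so it is only feasible for a handful of objects.
    """
    n = len(rankings[0])
    return min(all_weak_orders(n),
               key=lambda c: sum(kemeny_distance(c, r) for r in rankings))


def within_node_impurity(rankings):
    """Hypothetical reading of the tree impurity: summed Kemeny distance over all pairs in a node."""
    return sum(kemeny_distance(a, b) for a, b in combinations(rankings, 2))


def d_clustering_memberships(r, centers):
    """Membership probabilities inversely proportional to the Kemeny distance to each center
    (basic probabilistic d-clustering membership rule; center updates are not shown)."""
    d = [kemeny_distance(r, c) for c in centers]
    if 0 in d:  # r coincides with one or more centers
        hits = d.count(0)
        return [1.0 / hits if x == 0 else 0.0 for x in d]
    inv = [1.0 / x for x in d]
    total = sum(inv)
    return [v / total for v in inv]


if __name__ == "__main__":
    # Objects A, B, C, D.
    r1, r2, r3 = (0, 1, 2, 3), (1, 0, 2, 3), (0, 0, 1, 2)  # A>B>C>D, B>A>C>D, A~B>C>D
    print(kemeny_distance(r1, r2), kemeny_distance(r1, r3))  # 2 1
    n = len(r1)
    # Under these conventions d = n(n-1)/2 * (1 - tau_x).
    print(n * (n - 1) / 2 * (1 - tau_x(r1, r2)))  # approx 2.0
    print(median_ranking([r1, r2, r3]))  # (0, 0, 1, 2): the median ties A and B
    print(within_node_impurity([r1, r2, r3]))  # 4
    print(d_clustering_memberships(r1, [r2, r3]))  # approx [0.333, 0.667]
```

In the example the two complete rankings disagree only about A and B, so the median ranking ties them (summed distance 2), whereas either strict order would give a summed distance of 3. This is one way in which allowing partial rankings changes the consensus.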

[1] M. Fligner et al., Multistage Ranking Models, 1988.

[2] L. Thurstone, A law of comparative judgment, 1994.

[3] Henry E. Brady, Factor and ideal point analysis for interpersonally incomparable data, 1989.

[4] C. Coombs, A theory of data, 1965, Psychology Review.

[5] Jun Zhang, Binary choice, subset choice, random utility, and ranking: A unified perspective using the permutahedron, 2004.

[6] P. Groenen et al., Avoiding degeneracy in multidimensional unfolding by penalizing on the coefficient of variation, 2005.

[7] I. C. Gormley et al., Exploring Voting Blocs Within the Irish Electorate, 2008.

[8] Karl Christoph Klauer et al., New developments in psychological choice modeling, 1989.

[9] Berthold Lausen et al., Advances in Data Analysis, Data Handling and Business Intelligence - Proceedings of the 32nd Annual Conference of the Gesellschaft für Klassifikation e.V., Joint Conference with the British Classification Society (BCS) and the Dutch/Flemish Classification Society (VOC), Helmut-Schmidt-University, Ha, 2010, GfKl.

[10] William B. Michael et al., Psychological Scaling: Theory and Applications, 1961.

[11] Eyke Hüllermeier et al., Decision tree and instance-based learning for label ranking, 2009, ICML '09.

[12] H. E. Daniels et al., Rank Correlation and Population Models, 1950.

[13] R. Siciliano et al., A statistical approach to growing a reliable honest tree, 2002.

[14] Randall G. Chapman et al., Exploiting Rank Ordered Choice Set Data within the Stochastic Utility Model, 1982.

[15] Willem J. Heiser et al., Principal Components Analysis With Nonlinear Optimal Scaling Transformations for Ordinal and Nominal Data, 2005.

[16] Thomas Brendan Murphy et al., Mixtures of distance-based models for ranking data, 2003, Comput. Stat. Data Anal.

[17] Sophia Rabe-Hesketh et al., Multilevel logistic regression for polytomous data and rankings, 2003.

[18] Willem J. Heiser et al., Multidimensional Scaling and Unfolding of Symmetric and Asymmetric Proximity Relations, 2004.

[19] Akimichi Takemura et al., Characterization of rankings generated by linear discriminant analysis, 2005.

[20] Cem Iyigun et al., Probabilistic D-Clustering, 2008, J. Classif.

[21] Adi Ben-Israel et al., Probabilistic Distance Clustering Adjusted for Cluster Size, 2008, Probability in the Engineering and Informational Sciences.

[22] I. C. Gormley et al., A mixture of experts model for rank data with applications in election studies, 2008, arXiv:0901.4203.

[23] Antonio D’Ambrosio et al., Tree based methods for data editing and preference rankings, 2008.

[24] L. Thurstone, Rank order as a psycho-physical method, 1931.

[25] Willem J. Heiser et al., Multidimensional mapping of preference data, 1981.

[26] Joseph L. Zinnes et al., Probabilistic, multidimensional unfolding analysis, 1974.

[27] U. Böckenholt, Thurstonian representation for partial ranking data, 1992.

[28] Louis Guttman et al., An Approach for Quantifying Paired Comparisons and Rank Order, 1946.

[29] Wagner A. Kamakura et al., An Ideal-Point Probabilistic Choice Model for Heterogeneous Preferences, 1986.

[30] 紙屋 英彦 et al., Characterization of Rankings Generated by Linear Discriminant Analysis, 2003.

[31] Hiroshi Hojo, Multidimensional unfolding analyses of ranking data for groups, 2002.

[32] David Kaplan et al., The Sage handbook of quantitative methodology for the social sciences, 2004.

[33] W. Heiser et al., Clustering n objects into k groups under optimal scaling of variables, 1989.

[34] John Mingers et al., An Empirical Comparison of Pruning Methods for Decision Tree Induction, 1989, Machine Learning.

[35] Wei-Yin Loh et al., Classification and regression trees, 2011, WIREs Data Mining Knowl. Discov.

[36] Albert Maydeu-Olivares et al., Thurstonian modeling of ranking data via mean and covariance structure analysis, 1999.

[37] Peter M. Bentler et al., Covariance structure analysis of ordinal ipsative data, 1998.

[38] Akimichi Takemura et al., Ranking patterns of unfolding models of codimension one, 2010, Adv. Appl. Math.

[39] Ashutosh Kumar Singh et al., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2010.

[40] R. Shepard et al., Toward a universal law of generalization for psychological science, 1987, Science.

[41] John G. Kemeny et al., Mathematical models in the social sciences, 1964.

[42] Regina Dittrich et al., Analysing partial ranks by using smoothed paired comparison methods: an investigation of value orientation in Europe, 2002.

[43] Akimichi Takemura et al., Arrangements and Ranking Patterns, 2006.

[44] R. A. Bradley et al., Rank Analysis of Incomplete Block Designs, 1952.

[45] Patrick Slater et al., The Analysis of Personal Preferences, 1960.

[46] W. Heiser et al., Restricted unfolding: Preference analysis with optimal transformations of preferences and attributes, 2010.

[47] R. Duncan Luce et al., Individual Choice Behavior, 1959.

[48] U. Böckenholt et al., Bayesian Estimation of Thurstonian Ranking Models Based on the Gibbs Sampler, 1999.

[49] C. L. Mallows, Non-null Ranking Models. I, 1957.

[50] Marcel A. Croon et al., Latent Class Models for the Analysis of Rankings, 1989.

[51] Cem Iyigun et al., Semi-supervised Probabilistic Distance Clustering and the Uncertainty of Classification, 2008, GfKl.

[52] W. Kruskal, Ordinal Measures of Association, 1958.

[53] Ulf Böckenholt et al., Mixed-effects analyses of rank-ordered data, 2001.

[54] E. J. Emond et al., A new rank correlation coefficient with application to the consensus ranking problem, 2002.

[55] M. Fligner et al., Distance Based Ranking Models, 1986.

[56] Hiroshi Hojo, A marginalization model for the multidimensional unfolding analysis of ranking data, 1997.

[57] Eyke Hüllermeier et al., Preference Learning, 2010.

[58] Wade D. Cook et al., Distance-based and ad hoc consensus models in ordinal preference ranking, 2006, Eur. J. Oper. Res.

[59] Joseph S. Verducci et al., Probability models on rankings, 1991.

[60] P. Diaconis, A Generalization of Spectral Analysis with Application to Ranked Data, 1989.

[61] C. H. Coombs et al., Psychological scaling without a unit of measurement, 1950, Psychological Review.

[62] Walter Katzenbeisser et al., The analysis of rank ordered preference data based on Bradley-Terry Type Models / Die Analyse von Präferenzdaten mit Hilfe von log-linearen Bradley-Terry Modellen, 2000, OR Spectr.

[63] Akimichi Takemura et al., On Rankings Generated by Pairwise Linear Discriminant Analysis of m Populations, 1997.

[64] R. A. Bradley et al., Rank Analysis of Incomplete Block Designs: The Method of Paired Comparisons, 1952.

[65] M. Kendall, A New Measure of Rank Correlation, 1938.

[66] A. Guénoche et al., Median linear orders: Heuristics and a branch and bound algorithm, 1989.

[67] Douglas E. Critchlow et al., Paired comparison, triple comparison, and ranking experiments as generalized linear models, and their implementation on GLIM, 1991.

[68] G. L. Thompson, Generalized Permutation Polytopes and Exploratory Graphical Methods for Ranked Data, 1993.

[69] R. Forthofer et al., Rank Correlation Methods, 1981.

[70] G. De Soete et al., Unfolding and consensus ranking: A prestige ladder for technical occupations, 1989.

[71] J. Marden, Analyzing and Modeling Rank Data, 1996.

[72] S. Shapiro et al., Mathematics without Numbers, 1993.

[73] C. Iyigun, Probabilistic Distance Clustering, 2011.

[75] Willem J. Heiser et al., Geometric representation of association between categories, 2004.