Mixtures of Weighted Distance-based Models for Ranking Data with Applications in Political Studies
Computational Statistics and Data Analysis

Ranking data must often be analyzed in fields such as politics, market research and psychology, and many statistical models for such data have been developed over the years. Among them, distance-based ranking models postulate that the probability of observing a ranking of items depends on the distance between the observed ranking and a modal ranking: the closer a ranking is to the modal ranking, the higher its probability. However, such a model assumes a homogeneous population, and its single dispersion parameter may not describe the data well. To overcome these limitations, we formulate more flexible models based on the recently developed weighted distance-based models, which allow different weights for different ranks. The assumption of a homogeneous population is relaxed by extending these to mixtures of weighted distance-based models. The properties of weighted distance-based models are also discussed. We carry out simulations to test the performance of our parameter estimation and model selection procedures. Finally, we apply the proposed methodology to synthetic ranking datasets and to a real-world ranking dataset on the prioritization of political goals.
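To make the idea of a distance-based model concrete, the following minimal sketch enumerates all rankings of a small item set and assigns each a probability proportional to exp(-λ·d), where d is a Kendall-type distance to the modal ranking. The abstract does not specify the distance or the weighting scheme, so the choices here are assumptions for illustration only: discordant item pairs are counted, and in the weighted variant each discordant pair contributes the product of the weights attached to the two items' ranks in the modal ranking. The names `weighted_kendall`, `model_probabilities`, `mixture_probability`, `lam` and `weights` are hypothetical and not taken from the paper.

```python
from itertools import combinations, permutations
from math import exp


def weighted_kendall(ranking, modal, weights=None):
    """Kendall-type distance between two rankings (ranking[i] = rank of item i).

    Assumed weighting: each discordant item pair adds the product of the
    weights of the two ranks those items hold in the modal ranking.
    With weights=None every weight is 1, giving the plain Kendall distance.
    """
    k = len(modal)
    if weights is None:
        weights = [1.0] * k
    d = 0.0
    for i, j in combinations(range(k), 2):
        # A pair is discordant when the two rankings order items i and j differently.
        if (ranking[i] - ranking[j]) * (modal[i] - modal[j]) < 0:
            d += weights[modal[i] - 1] * weights[modal[j] - 1]
    return d


def model_probabilities(modal, lam=1.0, weights=None):
    """Probability of every ranking: exp(-lam * distance), normalized by brute force."""
    k = len(modal)
    scores = {
        p: exp(-lam * weighted_kendall(p, modal, weights))
        for p in permutations(range(1, k + 1))
    }
    total = sum(scores.values())  # normalizing constant
    return {p: s / total for p, s in scores.items()}


def mixture_probability(ranking, components):
    """Finite mixture: components is a list of (proportion, probability_table) pairs."""
    return sum(prop * table[ranking] for prop, table in components)


# Single homogeneous model centred on the ranking (1, 2, 3).
probs = model_probabilities(modal=(1, 2, 3), lam=0.8)

# Two-component mixture with different modal rankings and rank weights.
mix = [
    (0.6, model_probabilities((1, 2, 3), weights=[2.0, 1.0, 0.5])),
    (0.4, model_probabilities((3, 2, 1), weights=[1.0, 1.0, 1.0])),
]
p_mix = mixture_probability((1, 2, 3), mix)
```

Brute-force normalization over all k! rankings is only practical for a handful of items; it is used here because it keeps the sketch short and makes no assumption about closed-form normalizing constants.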
