PLMIX: an R package for modelling and clustering partially ranked data

ABSTRACT

The PLMIX package offers a comprehensive framework that brings recent methodological advances in modelling and clustering partially ranked data to the R statistical environment. Its usefulness can be motivated from several perspectives: (i) it helps fill the gap in Bayesian estimation of ranking models in R, adopting the Plackett–Luce model and its finite mixture extension as the generative sampling distribution; (ii) it addresses computational complexity by combining the flexibility of R routines with the speed of compiled C++ code, optionally with parallel execution; (iii) it covers the fundamental phases of ranking data analysis, allowing for a more careful and critical application of ranking models in real contexts; (iv) it provides effective tools for clustering heterogeneous partially ranked data. Specific S3 classes and methods are also supplied to enhance usability and foster interoperability with other packages. The functionality of the package is illustrated with several applications to simulated and real datasets.
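The abstract names the Plackett–Luce (PL) model and its finite mixture extension as the sampling distribution for partially ranked data. As a rough illustration only, the Python sketch below computes three quantities under stated assumptions. First, it gives the PL probability of a top-k partial ordering, which is the kind of partial ranking assumed here. Second, it gives the corresponding mixture probability. Third, it gives each component's posterior membership probability, the quantity behind model-based clustering. All function names are hypothetical and do not reflect PLMIX's R/C++ interface. The support parameters and mixture weights are taken as known, so no Bayesian estimation is shown.

```python
# Illustrative sketch only: NOT the PLMIX API. Support parameters and mixture
# weights are assumed known; no estimation (Bayesian or otherwise) is performed.
from itertools import permutations
from typing import Sequence


def pl_prob(ordering: Sequence[int], support: Sequence[float]) -> float:
    """Plackett-Luce probability of a top-k partial ordering.

    `ordering` lists item indices from most to least preferred; items that do
    not appear are unranked (implicitly placed below every listed item).
    `support` holds one positive support parameter per item.
    """
    remaining = sum(support)  # total support of items not yet placed
    prob = 1.0
    for item in ordering:
        # Stage-wise choice: pick `item` among the items still available.
        prob *= support[item] / remaining
        remaining -= support[item]  # remove it from the choice set
    return prob


def mixture_prob(ordering, weights, supports) -> float:
    """Finite PL mixture: sum over components g of w_g * PL(ordering | p_g)."""
    return sum(w * pl_prob(ordering, p) for w, p in zip(weights, supports))


def membership(ordering, weights, supports) -> list[float]:
    """Posterior probability that `ordering` came from each mixture component."""
    joint = [w * pl_prob(ordering, p) for w, p in zip(weights, supports)]
    total = sum(joint)
    return [j / total for j in joint]


def total_top_k_prob(k: int, support: Sequence[float]) -> float:
    """Sanity check: PL probabilities of all top-k orderings sum to 1."""
    return sum(pl_prob(o, support) for o in permutations(range(len(support)), k))
```

For example, with supports `[0.5, 0.3, 0.2]` the complete ordering `[0, 1, 2]` has probability 0.5 × 0.6 × 1 = 0.3. A top-k ordering simply stops the product after k stages, but each denominator still includes the unranked items. Assigning an observation to the component with the largest `membership` value is one common way to turn a fitted mixture into a clustering.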
