Multivariate Spearman's ρ for aggregating ranks using copulas

We study the problem of rank aggregation: given a set of ranked lists, we want to form a consensus ranking. We also consider the case of extreme lists, in which only the ranks of the best or worst elements are known. We impute the missing ranks and generalise Spearman's ρ to extreme ranks. Our main contribution is the derivation of a non-parametric estimator for rank aggregation based on multivariate extensions of Spearman's ρ, which measures correlation between a set of ranked lists. Multivariate Spearman's ρ is defined using copulas, and we show that the geometric mean of normalised ranks maximises multivariate correlation. Motivated by this, we propose a weighted geometric mean approach for learning to rank, which has a closed-form least squares solution. When only the best (top-k) or worst (bottom-k) elements of a ranked list are known, we impute the missing ranks by the average value, allowing us to apply Spearman's ρ. We discuss optimistic and pessimistic imputations of missing values, which respectively maximise and minimise correlation, and show their effect on aggregating university rankings. Finally, we demonstrate good performance on the rank aggregation benchmarks MQ2007 and MQ2008.
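The two ingredients named in the abstract, average-value imputation for top-k lists and aggregation by the geometric mean of normalised ranks, can be illustrated with a minimal sketch. This is not the paper's implementation. The function names, the use of `NaN` to mark unknown ranks, the normalisation `rank / (n + 1)`, and the convention that rank 1 is best are all assumptions made for this example.

```python
import numpy as np


def impute_top_k(ranks):
    """Fill unknown ranks of a top-k list with the average of the unused ranks.

    Assumption: observed entries are the ranks 1..k, unknown entries are NaN.
    The unused ranks are k+1..n, whose average is (k + 1 + n) / 2.
    """
    ranks = np.asarray(ranks, dtype=float)
    n = ranks.size
    missing = np.isnan(ranks)
    k = n - int(missing.sum())
    filled = ranks.copy()
    filled[missing] = (k + 1 + n) / 2.0
    return filled


def geometric_mean_scores(rank_lists):
    """Geometric mean over lists of each item's normalised rank.

    rank_lists has shape (number of lists, number of items).
    Assumption: ranks are normalised to the open interval (0, 1) as rank / (n + 1).
    """
    ranks = np.asarray(rank_lists, dtype=float)
    n = ranks.shape[1]
    normalised = ranks / (n + 1)
    # The mean of logs, exponentiated, is the geometric mean.
    return np.exp(np.log(normalised).mean(axis=0))


def consensus_order(rank_lists):
    """Item indices sorted from best to worst (smallest score first)."""
    return np.argsort(geometric_mean_scores(rank_lists), kind="stable")


# Three rankers over four items; the third only reports its top two.
list_a = [1, 2, 3, 4]
list_b = [2, 1, 3, 4]
list_c = impute_top_k([1, 2, np.nan, np.nan])

order = consensus_order([list_a, list_b, list_c])
print(list_c)  # imputed ranks for the partial list
print(order)   # consensus order of item indices
```

In this toy input the two unknown entries of the third list both receive rank 3.5, and the consensus order is items 0, 1, 2, 3. A bottom-k list could be handled symmetrically by averaging the unused ranks 1..n-k, again as an assumption of this sketch.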
