Enhanced statistical rankings via targeted data collection

Given a graph whose vertices represent alternatives and whose edges carry pairwise comparison data, y_ij, the statistical ranking problem is to find a potential function, defined on the vertices, whose gradient agrees with the pairwise comparisons. We study how the statistical ranking problem depends on the available pairwise data, i.e., the pairs (i,j) for which the comparison y_ij is known, and propose a framework for identifying the data that, when added to the current dataset, maximally increases the Fisher information of the ranking. Under certain assumptions, the data collection problem decouples and reduces to finding an edge set on the graph (with a fixed number of edges) that maximizes the second eigenvalue of the graph Laplacian. This reduction of the data collection problem to a spectral graph-theoretic question is one of the primary contributions of this work. As an application, we study the Yahoo! Movie user rating dataset and demonstrate that adding a small number of well-chosen pairwise comparisons can significantly increase the Fisher informativeness of the ranking.
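To make the two ingredients above concrete, the following is a minimal sketch rather than the authors' method. It assumes unweighted comparisons, the sign convention that the gradient on edge (i,j) is phi[j] - phi[i], and a brute-force greedy search over candidate edges. The function names (laplacian, least_squares_ranking, algebraic_connectivity, greedy_add_edges) are hypothetical and chosen only for illustration.

```python
import itertools

import numpy as np


def laplacian(n, edges):
    """Unweighted graph Laplacian of an undirected edge list on n vertices."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L


def least_squares_ranking(n, comparisons):
    """Fit a potential phi so that phi[j] - phi[i] is close to y_ij.

    `comparisons` is a list of (i, j, y_ij) triples. The potential is only
    defined up to an additive constant, so it is shifted to have mean zero.
    """
    B = np.zeros((len(comparisons), n))  # edge-vertex incidence (gradient) matrix
    y = np.zeros(len(comparisons))
    for row, (i, j, y_ij) in enumerate(comparisons):
        B[row, i], B[row, j] = -1.0, 1.0
        y[row] = y_ij
    phi, *_ = np.linalg.lstsq(B, y, rcond=None)
    return phi - phi.mean()


def algebraic_connectivity(n, edges):
    """Second-smallest eigenvalue of the graph Laplacian."""
    return np.linalg.eigvalsh(laplacian(n, edges))[1]


def greedy_add_edges(n, edges, k):
    """Add k new edges one at a time, each maximizing the second eigenvalue.

    Assumption: exhaustive greedy search, practical only for small graphs.
    """
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    for _ in range(k):
        candidates = [
            e for e in itertools.combinations(range(n), 2)
            if frozenset(e) not in present
        ]
        if not candidates:
            break  # the graph is already complete
        best = max(candidates, key=lambda e: algebraic_connectivity(n, edges + [e]))
        edges.append(best)
        present.add(frozenset(best))
    return edges


# Toy example: four alternatives compared along a path 0-1-2-3.
path_edges = [(0, 1), (1, 2), (2, 3)]
ranking = least_squares_ranking(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])
augmented = greedy_add_edges(4, path_edges, 1)
```

On this toy path, the single added comparison is the one between the two end vertices, which closes the path into a 4-cycle. The greedy step is only a heuristic: it need not find the best edge set when several edges are added.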
