Optimal Data Collection for Improved Rankings Expose Well-Connected Graphs

Given a graph where vertices represent alternatives and arcs represent pairwise comparison data, the statistical ranking problem is to find a potential function, defined on the vertices, such that the gradient of the potential function agrees with the pairwise comparisons. Our goal in this paper is to develop a method for collecting data for which the least squares estimator for the ranking problem has maximal information. Our approach, based on experimental design, is to view data collection as a bi-level optimization problem where the inner problem is the ranking problem and the outer problem is to identify the data that maximizes the informativeness of the ranking. Under certain assumptions, the data collection problem decouples, reducing to a problem of finding graphs with large algebraic connectivity. This reduction of the data collection problem to graph-theoretic questions is one of the primary contributions of this work. As an application, we study the 2011-12 NCAA football schedule and propose schedules with the same number of games that are significantly more informative. Using spectral clustering methods to identify highly connected communities within the division, we argue that the NCAA could improve its notoriously poor rankings by simply scheduling more out-of-conference games.
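The two quantities named in the abstract, the least squares potential and the algebraic connectivity of the comparison graph, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. It assumes unweighted comparisons, an arc (i, j) whose datum y estimates phi[j] - phi[i], and a zero-mean normalization of the potential. The function names are hypothetical.

```python
import numpy as np

def incidence_matrix(n, arcs):
    """Arc-vertex incidence matrix B: row k has -1 at the tail and +1 at the head of arc k."""
    B = np.zeros((len(arcs), n))
    for k, (i, j) in enumerate(arcs):
        B[k, i] = -1.0
        B[k, j] = 1.0
    return B

def least_squares_ranking(n, arcs, y):
    """Potential phi minimizing ||B phi - y||_2 (assumed unweighted), shifted to zero mean."""
    B = incidence_matrix(n, arcs)
    phi, *_ = np.linalg.lstsq(B, np.asarray(y, dtype=float), rcond=None)
    return phi - phi.mean()

def algebraic_connectivity(n, arcs):
    """Second-smallest eigenvalue of the graph Laplacian L = B^T B."""
    B = incidence_matrix(n, arcs)
    eigenvalues = np.linalg.eigvalsh(B.T @ B)  # returned in ascending order
    return eigenvalues[1]

# Two comparison graphs on 4 alternatives, each with 3 comparisons.
path_arcs = [(0, 1), (1, 2), (2, 3)]
star_arcs = [(0, 1), (0, 2), (0, 3)]

# Consistent data generated from the potential s = [0, 1, 3, 6] on the path.
phi = least_squares_ranking(4, path_arcs, [1.0, 2.0, 3.0])

lam_path = algebraic_connectivity(4, path_arcs)
lam_star = algebraic_connectivity(4, star_arcs)
print(phi, lam_path, lam_star)
```

With the same number of comparisons, the star has a larger algebraic connectivity than the path. This is the sense in which one schedule with a fixed number of games can be better connected than another.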
