Rating systems with multiple factors

Rating systems have received increasing attention, especially since the introduction of TrueSkill (Herbrich et al., 2007). Most existing models associate a single latent variable with each player; the purpose of this project is to construct a multiple-feature model for rating players. Such a model associates more characteristics with each competitor and could, besides estimating skill and supporting the matching of players, provide insight into the characteristics of a player's style and strategy. We found that simply fitting the models by maximum likelihood generalises poorly and requires massive amounts of data to reach high accuracy. We therefore turned to a Bayesian approach and used the assumed density filtering and expectation propagation algorithms (Minka, 2001). They bring a significant gain in accuracy, even without a time-series model to track how players' skills evolve. We also implemented a version of TrueSkill adapted to our problem (the game of Go) and use it as a baseline for comparing our models. We present experimental evidence of the improved performance of the multiple-factor models: they significantly raise the accuracy of the expectation propagation model and, with enough data, additional factors also improve the assumed density filtering model. On small datasets, we found that an iterative method gives assumed density filtering a significant advantage, greatly surpassing even the TrueSkill algorithm. The multiple-factor approach has some additional benefits, including higher accuracy when predicting the results of balanced games.
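To make the single-latent-variable baseline concrete, the following is a minimal sketch of a Bradley-Terry style model fitted by plain maximum likelihood. It is an illustration only, not the implementation used in this project: the function names, the learning rate, the step count and the toy game list are all assumptions introduced here.

```python
import math

def win_probability(skill_a, skill_b):
    """Bradley-Terry style probability that player A beats player B."""
    return 1.0 / (1.0 + math.exp(-(skill_a - skill_b)))

def log_likelihood(skills, games):
    """Sum of log-probabilities of the observed (winner, loser) pairs."""
    return sum(math.log(win_probability(skills[w], skills[l])) for w, l in games)

def fit_max_likelihood(players, games, steps=500, lr=0.1):
    """Gradient ascent on the log-likelihood (no prior, no regularisation).

    `steps` and `lr` are illustrative values, not tuned settings.
    """
    skills = {p: 0.0 for p in players}
    for _ in range(steps):
        grad = {p: 0.0 for p in players}
        for w, l in games:
            p_win = win_probability(skills[w], skills[l])
            # d/ds_w log sigmoid(s_w - s_l) = 1 - p_win; the loser gets the negative.
            grad[w] += 1.0 - p_win
            grad[l] -= 1.0 - p_win
        for p in players:
            skills[p] += lr * grad[p]
    return skills

# Hypothetical toy data: each pair is (winner, loser).
games = [("a", "b"), ("a", "b"), ("b", "a"),
         ("b", "c"), ("b", "c"), ("c", "b")]
skills = fit_max_likelihood(["a", "b", "c"], games)
```

In this sketch every player is described by one number. If a player in the data never loses, the unregularised estimate of that player's skill grows without bound, which is one reason a pure maximum-likelihood fit can generalise poorly on sparse data.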

[1] Brendan J. Frey, et al. Factor graphs and the sum-product algorithm, 2001, IEEE Trans. Inf. Theory.

[2] David J. C. MacKay, et al. Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.

[3] Emmanuel J. Candès, et al. A Singular Value Thresholding Algorithm for Matrix Completion, 2008, SIAM J. Optim.

[4] David J. C. MacKay, et al. The Evidence Framework Applied to Classification Networks, 1992, Neural Computation.

[5] R. A. Bradley, et al. Rank Analysis of Incomplete Block Designs: The Method of Paired Comparisons, 1952.

[6] D. Hunter. MM algorithms for generalized Bradley-Terry models, 2003.

[7] Domonkos Tikk, et al. Matrix factorization and neighbor based algorithms for the Netflix prize problem, 2008, RecSys '08.

[8] Tom Minka, et al. A family of algorithms for approximate Bayesian inference, 2001.

[9] Tom Minka, et al. TrueSkill™: A Bayesian Skill Rating System, 2006, NIPS.

[10] R. A. Bradley, et al. Rank Analysis of Incomplete Block Designs, 1952.

[11] Tom Minka, et al. TrueSkill Through Time: Revisiting the History of Chess, 2007, NIPS.

[12] Tom Heskes, et al. Expectation Propagation for Rating Players in Sports Competitions, 2007, PKDD.

[13] Yehuda Koren, et al. Lessons from the Netflix prize challenge, 2007, SKDD.

[14] Tiejian Luo, et al. Towards an Introduction to Collaborative Filtering, 2009, 2009 International Conference on Computational Science and Engineering.

[15] M. Glickman. Parameter Estimation in Large Dynamic Paired Comparison Experiments, 1999.

[16] T. Minka, et al. EP: A quick reference, 2008.

[17] John Riedl, et al. Item-based collaborative filtering recommendation algorithms, 2001, WWW '01.

[18] Yehuda Koren, et al. Matrix Factorization Techniques for Recommender Systems, 2009, Computer.

[19] X. Jin. Factor graphs and the Sum-Product Algorithm, 2002.