KDD Cup 2013 - Author-Paper Identification Challenge: Second-Place Team

This paper describes our submission to KDD Cup 2013 Track 1, the Author-Paper Identification Challenge on the Microsoft Academic Search database. Our approach combines the Gradient Boosting Machine (GBM) of Friedman [3] with extensive feature engineering. The method placed second in the final standings with a Mean Average Precision (MAP) of 0.98144; the winning submission scored 0.98259.
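MAP, the metric behind the scores above, rewards ranking each author's confirmed papers ahead of the papers that do not belong to them. The minimal Python sketch below shows how the metric is computed. It assumes the standard, uncut definition of average precision, averaged without weights over authors. The function names and toy paper IDs are hypothetical and are not taken from the competition's official scoring code.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[int], relevant: Set[int]) -> float:
    """Average precision for one author's ranked list of paper IDs.

    ranked   -- paper IDs ordered from most to least likely to be confirmed
    relevant -- paper IDs actually confirmed for this author
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, paper_id in enumerate(ranked, start=1):
        if paper_id in relevant:
            hits += 1
            precision_sum += hits / k  # precision at cut-off k, counted only at hits
    return precision_sum / len(relevant)


def mean_average_precision(ranked_by_author: Dict[int, List[int]],
                           confirmed_by_author: Dict[int, Set[int]]) -> float:
    """Unweighted mean of per-author average precision (hypothetical helper)."""
    scores = [average_precision(ranked_by_author.get(author, []), confirmed)
              for author, confirmed in confirmed_by_author.items()]
    return sum(scores) / len(scores)


# Toy example: author 1 ranks papers 10, 11, 12 (10 and 12 confirmed);
# author 2 ranks papers 20, 21 (only 21 confirmed).
print(mean_average_precision({1: [10, 11, 12], 2: [20, 21]},
                             {1: {10, 12}, 2: {21}}))
```

In the toy example, author 1 scores (1/1 + 2/3) / 2 = 5/6 and author 2 scores (1/2) / 1 = 1/2, so the MAP is 2/3. A model that scores each (author, paper) pair, such as a GBM, only needs to sort each author's papers by predicted score to produce the ranked lists this metric consumes.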

[1] Douglas W. Mahoney, et al. Linear mixed effects models, 2007, Methods in Molecular Biology.

[2] Quoc V. Le, et al. Learning to Rank with Nonsmooth Cost Functions, 2006, Neural Information Processing Systems.

[3] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001.

[4] Gerard Salton, et al. Term-Weighting Approaches in Automatic Text Retrieval, 1988, Inf. Process. Manag.

[5] Tomasz Burzykowski, et al. Linear Mixed Effects Model, 2021, Encyclopedia of Gerontology and Population Aging.

[6] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[7] Mireia Díez, et al. On the use of phone log-likelihood ratios as features in spoken language recognition, 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).

[8] Jianfeng Gao, et al. Ranking, Boosting, and Model Adaptation, 2008.

[9] Martine De Cock, et al. The Microsoft academic search dataset and KDD Cup 2013, 2013, KDD Cup '13.

[10] Huanhuan Chen, et al. Negative correlation learning for classification ensembles, 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[11] Tomasz Burzykowski, et al. Linear Mixed-Effects Models Using R: A Step-by-Step Approach, 2013.

[12] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[13] Yoram Singer, et al. An Efficient Boosting Algorithm for Combining Preferences, 2013.

[14] Andrzej T. Galecki, et al. Linear mixed-effects models using R, 2013.