GBC: Gradient boosting consensus model for heterogeneous data

With the rapid development of database technologies, multiple data sources may be available for a given learning task, e.g., collaborative filtering. However, the data sources may contain different types of features. For example, users' profiles can be used to build recommendation systems; in addition, a model can also use users' historical behaviors and social networks to infer users' interests in related products. We argue that it is desirable to collectively use any available heterogeneous data sources in order to build effective learning models. We call this framework heterogeneous learning. In our proposed setting, data sources can include (i) nonoverlapping features, (ii) nonoverlapping instances, and (iii) multiple networks (i.e., graphs) that connect instances. In this paper, we propose a general optimization framework for heterogeneous learning and devise a corresponding learning model based on gradient boosting. The idea is to minimize the empirical loss subject to two constraints: (1) there should be consensus among the predictions for overlapping instances (if any) from different data sources; (2) connected instances in the graph data sources should have similar predictions. The objective function is optimized with stochastic gradient boosting trees. Furthermore, a weighting strategy is designed to emphasize informative data sources and deemphasize noisy ones. We formally prove that the proposed strategy leads to a tighter error bound. This approach consistently outperforms a standard concatenation of data sources on movie rating prediction, number recognition, and terrorist attack detection tasks. Furthermore, the approach is evaluated on AT&T's distributed database with over 500 000 instances, 91 data sources, and over 45 000 000 joined features. We observe that the proposed model substantially improves the out-of-sample error rate.
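
As a rough illustration of how such an objective can be optimized by gradient boosting, the sketch below shows one boosting round in which each data source's model is fitted to the negative gradient of a combined loss: an empirical (squared-error) term, a consensus term pulling each source's predictions toward the cross-source average, and a graph-smoothness term built from a graph Laplacian. The function name boosting_round, the squared-error loss, the uniform averaging used for consensus, and the weights lambda_c and lambda_g are illustrative assumptions for this sketch rather than the paper's exact formulation; instance subsampling (the stochastic variant) and the source-weighting strategy are omitted for brevity.

# Minimal sketch of one consensus-regularized gradient boosting round.
# Assumptions: all sources share the same aligned instance set, squared-error
# loss, uniform average as the consensus target (simplifications for clarity).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_round(F, X, y, A, lambda_c=1.0, lambda_g=1.0, lr=0.1):
    """One boosting round over a list of data sources.

    F : list of current prediction vectors, one per source
    X : list of feature matrices, one per source
    y : true targets
    A : adjacency matrix of the graph connecting instances
    """
    consensus = np.mean(F, axis=0)            # average prediction across sources
    L = np.diag(A.sum(axis=1)) - A            # graph Laplacian for the smoothness term
    new_trees = []
    for s, (F_s, X_s) in enumerate(zip(F, X)):
        # Gradient of: 0.5*(F_s - y)^2                       (empirical loss)
        #            + 0.5*lambda_c*(F_s - consensus)^2      (consensus term)
        #            + 0.5*lambda_g*F_s^T L F_s              (graph smoothness)
        grad = (F_s - y) + lambda_c * (F_s - consensus) + lambda_g * (L @ F_s)
        # Fit a regression tree to the negative gradient (functional gradient step)
        tree = DecisionTreeRegressor(max_depth=3).fit(X_s, -grad)
        F[s] = F_s + lr * tree.predict(X_s)
        new_trees.append(tree)
    return F, new_trees

In the full model, overlapping and nonoverlapping instances would need to be handled separately and the consensus would weight sources by their estimated reliability; the shared, uniformly averaged instance set here is purely to keep the indexing simple.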
