Learning from Heterogeneous Sources via Gradient Boosting Consensus

Multiple data sources containing different types of features may be available for a given task. For instance, users' profiles can be used to build recommendation systems, and a model can also exploit users' historical behaviors and social networks to infer their interest in related products. We argue that it is desirable to use all available heterogeneous data sources collectively in order to build effective learning models; we call this framework heterogeneous learning. In the proposed setting, data sources can include (i) non-overlapping features, (ii) non-overlapping instances, and (iii) multiple networks (i.e., graphs) that connect instances. In this paper, we propose a general optimization framework for heterogeneous learning and derive a corresponding learning model based on gradient boosting. The idea is to minimize the empirical loss subject to two constraints: (1) the predictions for overlapping instances (if any) from different data sources should agree; (2) instances connected in a graph should have similar predictions. The objective function is optimized with stochastic gradient boosting trees. Furthermore, a weighting strategy is designed to emphasize informative data sources and de-emphasize noisy ones, and we formally prove that this strategy leads to a tighter error bound. The approach consistently outperforms a standard concatenation of data sources on movie rating prediction, number recognition, and terrorist attack detection tasks, reducing the out-of-sample error rate by as much as 80%.
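The core idea can be summarized as alternating between fitting a gradient-boosted model on each source and pulling those models toward a weighted consensus. Below is a minimal sketch of that scheme, assuming scikit-learn's GradientBoostingRegressor as the per-source learner; the function name fit_consensus, the exponential source weighting, and the blend of labels with the consensus used as the re-fitting target are illustrative assumptions rather than the paper's exact formulation, and the graph-smoothness constraint is omitted for brevity.

```python
# A minimal sketch of the gradient-boosting-consensus idea described above.
# All names here (fit_consensus, sources, overlap_idx) and the squared-loss
# consensus penalty are illustrative, not the authors' exact method.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_consensus(sources, y, overlap_idx, rounds=5, lam=0.5):
    """sources: list of feature matrices (one per data source) with rows
    aligned on the same instances; y: labels; overlap_idx: indices of
    instances shared across sources; lam: weight of the consensus term."""
    models = [GradientBoostingRegressor(subsample=0.8).fit(X, y) for X in sources]
    weights = np.ones(len(sources)) / len(sources)   # start with uniform source weights
    for _ in range(rounds):
        preds = np.array([m.predict(X) for m, X in zip(models, sources)])
        consensus = weights @ preds                   # weighted consensus prediction
        # Emphasize sources that agree with the consensus on overlapping instances,
        # de-emphasize noisy ones.
        errs = np.array([np.mean((p[overlap_idx] - consensus[overlap_idx]) ** 2)
                         for p in preds])
        weights = np.exp(-errs)
        weights /= weights.sum()
        # Re-fit each source toward a blend of the labels and the consensus,
        # a crude stand-in for adding the consensus penalty to the boosting loss.
        target = (1 - lam) * y + lam * consensus
        models = [GradientBoostingRegressor(subsample=0.8).fit(X, target) for X in sources]
    return models, weights
```

A prediction for a new instance would then be the learned weighted average of the per-source model outputs, so informative sources contribute more than noisy ones.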
