Class-distribution regularized consensus maximization for alleviating overfitting in model combination

In data mining applications such as crowdsourcing and privacy-preserving data mining, one may wish to obtain consolidated predictions from multiple models without access to the features of the data. Moreover, because multiple models usually carry complementary predictive information, model combination can potentially provide more robust and accurate predictions by correcting independent errors from individual models. Various methods have been proposed to combine predictions such that the final predictions are maximally agreed upon by multiple base models. Though this maximum consensus principle has been shown to be successful, simply maximizing consensus can lead to less discriminative predictions and can overfit the inevitable noise introduced by imperfect base models. We argue that proper regularization of model combination approaches is needed to alleviate this overfitting. Specifically, we analyze the hypothesis spaces of several model combination methods and identify the trade-off between model consensus and generalization ability. We propose a novel model called Regularized Consensus Maximization (RCM), which is formulated as an optimization problem that combines the maximum consensus and large margin principles. We show theoretically that RCM has a smaller upper bound on generalization error than the version without regularization. Experiments show that the proposed algorithm outperforms a wide spectrum of state-of-the-art model combination methods on 11 tasks.
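
To make the maximum consensus principle concrete, the following is a minimal sketch of its simplest instance, plain majority voting over the label predictions of several base models, using no features of the data. It is an illustration only and is not the RCM formulation, which the abstract does not spell out; the function name `majority_vote` and the toy prediction matrix are assumptions made for this example.

```python
import numpy as np


def majority_vote(predictions: np.ndarray, n_classes: int) -> np.ndarray:
    """Combine base-model labels by majority vote (hypothetical helper).

    predictions has shape (n_models, n_instances) and holds integer class
    labels in the range [0, n_classes). No data features are needed.
    """
    # votes[i, c] = number of base models that assign instance i to class c
    votes = np.stack(
        [(predictions == c).sum(axis=0) for c in range(n_classes)], axis=1
    )
    # Pick the most agreed-upon class; argmax breaks ties toward the lowest label
    return votes.argmax(axis=1)


# Toy example (made-up labels): 3 base models, 4 instances, 3 classes
base_predictions = np.array([
    [0, 1, 1, 2],
    [0, 1, 2, 2],
    [1, 1, 2, 0],
])
consensus = majority_vote(base_predictions, n_classes=3)
print(consensus)
```

Unregularized voting of this kind follows whatever the base models agree on, including their shared noise, which is the overfitting risk the abstract argues a regularized formulation should address.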
