OnlineCM: Real-time Consensus Classification with Missing Values

Combining predictions from multiple sources or models has proven to be a useful technique in data mining. For example, in network anomaly detection, the outputs of multiple detectors must be combined to reach diagnostic decisions. Unfortunately, as data are generated at ever increasing speed, existing prediction aggregation methods face new challenges. First, the high velocity and huge volume of the data render existing batch-mode prediction aggregation algorithms infeasible. Second, due to heterogeneity, predictions from multiple models or data sources may not be perfectly synchronized, leading to abundant missing values in the prediction stream. We propose OnlineCM, short for Online Consensus Maximization, to address these challenges. OnlineCM keeps only a minimal yet sufficient footprint over the prediction stream for both consensus prediction and missing value imputation. In particular, we show that the correlations among base models or data sources are sufficient for effective consensus prediction, require little storage, and can be updated in an online fashion. Further, we identify a reinforcing relationship between missing value imputation and consensus prediction, leading to a novel consensus-based missing value imputation method, which in turn makes model correlation estimation more accurate. Experiments demonstrate that OnlineCM achieves aggregated predictions whose performance is close to that of the batch-mode consensus maximization algorithm, and that it significantly outperforms baseline methods on four large real-world datasets.
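
The abstract does not give the algorithm's details, but the feedback loop it describes, maintaining a compact summary of agreement among base models, using it to weight a consensus prediction, and using that consensus to impute missing predictions before refreshing the summary, can be sketched as follows. This is a minimal Python illustration, not the authors' OnlineCM: the class name OnlineConsensusSketch, the exponential decay factor, the agreement-matrix update, and the imputation rule are assumptions made only to show the shape of such a single-pass, fixed-footprint loop.

```python
import numpy as np


class OnlineConsensusSketch:
    """Minimal sketch of the stream-processing loop suggested by the abstract.

    NOT the authors' OnlineCM: the decay factor, the agreement-matrix update,
    and the imputation rule are illustrative assumptions.
    """

    def __init__(self, n_models, decay=0.99):
        self.decay = decay                # forgetting factor (assumed)
        self.agree = np.eye(n_models)     # running model-agreement summary

    def update(self, preds):
        """Process one instance of the prediction stream.

        preds: array of shape (n_models,), binary predictions in {0, 1},
               with np.nan marking a missing prediction.
        Returns a consensus score in [0, 1].
        """
        preds = np.asarray(preds, dtype=float)
        missing = np.isnan(preds)
        observed = ~missing

        # 1) Weight each model by its average agreement with the others;
        #    clip so the weights used in the weighted average stay positive.
        weights = np.clip(self.agree.mean(axis=1), 1e-6, None)
        if observed.any():
            consensus = float(np.average(preds[observed], weights=weights[observed]))
        else:
            consensus = 0.5               # no information for this instance

        # 2) Consensus-based imputation: fill the missing predictions with
        #    the current consensus score before updating the summary.
        filled = np.where(missing, consensus, preds)

        # 3) Online update of the agreement summary with exponential decay,
        #    keeping the footprint O(n_models^2) regardless of stream length.
        centered = filled - consensus
        self.agree = self.decay * self.agree + (1 - self.decay) * np.outer(centered, centered)

        return consensus


# Usage on a toy stream of three base models with occasional missing values.
cm = OnlineConsensusSketch(n_models=3)
stream = [[1, 1, np.nan], [0, np.nan, 0], [1, 0, 1]]
for preds in stream:
    print(round(cm.update(preds), 3))
```

In the paper's method the imputation and the correlation estimates reinforce each other in a more principled way; the sketch is meant only to convey that a fixed-size summary, updated in a single pass, can support both consensus prediction and imputation over the stream.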
