Joint sparse learning for classification ensemble

Ensemble methods combine multiple classifiers to reach better decisions than any constituent classifier can reach alone. However, both theoretical and experimental evidence has shown that very large ensembles are not necessarily superior, and small ensembles can often achieve better results. In this paper, we show how to combine a set of weak classifiers into a robust ensemble using a joint sparse representation method, which assigns a sparse coefficient vector to the decisions of the classifiers. Because this vector contains many zero entries, the final ensemble employs only the small number of classifiers corresponding to its non-zero entries. The training data are partitioned into several sub-groups to generate sub-underdetermined systems, and the joint sparse method lets these sub-groups share their information about individual classifiers to obtain an improved overall classification. Partitioning the training set into sub-groups also makes the proposed joint sparse ensemble method parallelizable, and therefore suitable for large-scale problems; in contrast, previous sparse approaches to ensemble learning were limited to datasets with fewer training samples than classifiers. Two different strategies for generating the sub-underdetermined systems are described, and experiments show both to be effective when tested with two different data manipulation methods. The joint sparse ensemble learning method is evaluated against five state-of-the-art methods from the literature, each designed to train small and efficient ensembles. Results suggest that joint sparse ensemble learning outperforms the other algorithms on most datasets.
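To make the pipeline concrete, the sketch below illustrates the general idea under stated assumptions: weak classifiers' decisions on the training data form a decision matrix, the rows are partitioned into sub-groups so that each block has fewer samples than classifiers (a sub-underdetermined system), and a row-sparse coefficient matrix shared across the sub-groups selects a small subset of classifiers. The solver here is a generic simultaneous orthogonal matching pursuit (SOMP)-style greedy routine, used only for illustration; the paper's actual optimization method, parameter names (`n_groups`, `sparsity`), and helper `somp_select` are assumptions, not taken from the source.

```python
# Minimal sketch of joint sparse ensemble selection (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def somp_select(blocks, targets, sparsity):
    """Greedy SOMP-style selection: pick classifier columns whose residual
    correlation, summed over all sub-group systems, is largest, so every
    sub-group shares the same support (row sparsity)."""
    support = []
    residuals = [y.copy() for y in targets]
    for _ in range(sparsity):
        # Aggregate |correlation| of each column with each group's residual.
        scores = sum(np.abs(H.T @ r) for H, r in zip(blocks, residuals))
        scores[support] = -np.inf  # never re-pick an already chosen column
        support.append(int(np.argmax(scores)))
        # Refit each group's coefficients on the shared support (least squares).
        for g, (H, y) in enumerate(zip(blocks, targets)):
            w, *_ = np.linalg.lstsq(H[:, support], y, rcond=None)
            residuals[g] = y - H[:, support] @ w
    return sorted(support)

# Toy data and a pool of weak classifiers (stumps on bootstrap samples).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
y_pm = np.where(y_tr == 1, 1.0, -1.0)  # +/-1 targets for the linear systems

rng = np.random.default_rng(0)
pool = []
for _ in range(60):
    idx = rng.integers(0, len(X_tr), len(X_tr))
    pool.append(DecisionTreeClassifier(max_depth=1).fit(X_tr[idx], y_tr[idx]))

# Decision matrix: one +/-1 column per weak classifier.
H = np.column_stack([np.where(c.predict(X_tr) == 1, 1.0, -1.0) for c in pool])

# Partition rows into sub-groups: each block has fewer rows (30) than
# columns (60), giving the sub-underdetermined systems described above.
n_groups = 10
parts = np.array_split(rng.permutation(len(X_tr)), n_groups)
blocks = [H[p] for p in parts]
targets = [y_pm[p] for p in parts]

# Select a small ensemble and combine it by majority vote on test data.
chosen = somp_select(blocks, targets, sparsity=7)
votes = np.column_stack([pool[j].predict(X_te) for j in chosen])
pred = (votes.mean(axis=1) > 0.5).astype(int)
print("selected classifiers:", chosen, "test accuracy:", (pred == y_te).mean())
```

Because each sub-group's system is solved over the same shared support, the greedy scoring step is a sum of per-group terms, which is what makes the partitioned formulation naturally parallelizable: the per-group correlations and least-squares refits can be computed independently before being aggregated.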

[1]  David L Donoho,et al.  Compressed sensing , 2006, IEEE Transactions on Information Theory.

[2]  Rama Chellappa,et al.  Joint Sparse Representation for Robust Multimodal Biometrics Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[4]  Mike E. Davies,et al.  Iterative Hard Thresholding for Compressed Sensing , 2008, ArXiv.

[5]  Fang Liu,et al.  A compressed sensing approach for efficient ensemble learning , 2014, Pattern Recognit..

[6]  Zhi-Hua Zhou,et al.  Ensemble Methods: Foundations and Algorithms , 2012 .

[7]  Allen Y. Yang,et al.  Fast ℓ1-minimization algorithms and an application in robust face recognition: A review , 2010, 2010 IEEE International Conference on Image Processing.

[8]  S. Mallat,et al.  Adaptive greedy approximations , 1997 .

[9]  Wei Tang,et al.  Ensembling neural networks: Many could be better than all , 2002, Artif. Intell..

[10]  A. Asuncion,et al.  UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences , 2007 .

[11]  Lior Rokach,et al.  Ensemble-based classifiers , 2010, Artificial Intelligence Review.

[12]  Quan Pan,et al.  Deformable Dictionary Learning for SAR Image Change Detection , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[13]  L. Kuncheva,et al.  Combining classifiers: Soft computing solutions. , 2001 .

[14]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[15]  Li Zhang,et al.  Sparse ensembles using weighted combination methods based on linear programming , 2011, Pattern Recognit..

[16]  William Nick Street,et al.  Ensemble Pruning Via Semi-definite Programming , 2006, J. Mach. Learn. Res..

[17]  Daniel Hernández-Lobato,et al.  An Analysis of Ensemble Pruning Techniques Based on Ordered Aggregation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Xin Yao,et al.  Ensemble learning via negative correlation , 1999, Neural Networks.

[19]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[20]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[22]  E. Candès The restricted isometry property and its implications for compressed sensing , 2008 .

[23]  Yoav Freund,et al.  Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[24]  Xin Yao,et al.  Making use of population information in evolutionary artificial neural networks , 1998, IEEE Trans. Syst. Man Cybern. Part B.

[25]  Shuiping Gou,et al.  Greedy optimization classifiers ensemble based on diversity , 2011, Pattern Recognit..

[26]  Alberto Suárez,et al.  Aggregation Ordering in Bagging , 2004 .

[27]  Fang Liu,et al.  Compressive Sensing SAR Image Reconstruction Based on Bayesian Framework and Evolutionary Computation , 2011, IEEE Transactions on Image Processing.

[28]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Juan José Rodríguez Diez,et al.  Rotation Forest: A New Classifier Ensemble Method , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Tin Kam Ho,et al.  The Random Subspace Method for Constructing Decision Forests , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[32]  Huanhuan Chen,et al.  Predictive Ensemble Pruning by Expectation Propagation , 2009, IEEE Transactions on Knowledge and Data Engineering.

[33]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[34]  D. Steinberg CART: Classification and Regression Trees , 2009 .

[35]  Robert P. W. Duin,et al.  Bagging, Boosting and the Random Subspace Method for Linear Classifiers , 2002, Pattern Analysis & Applications.

[36]  R. Polikar,et al.  Ensemble based systems in decision making , 2006, IEEE Circuits and Systems Magazine.

[37]  D. Donoho,et al.  Sparse MRI: The application of compressed sensing for rapid MR imaging , 2007, Magnetic resonance in medicine.

[38]  Robert E. Schapire,et al.  A Brief Introduction to Boosting , 1999, IJCAI.

[39]  Licheng Jiao,et al.  Kernel matching pursuit classifier ensemble , 2006, Pattern Recognit..

[40]  Thomas G. Dietterich,et al.  Pruning Adaptive Boosting , 1997, ICML.

[41]  Rich Caruana,et al.  Ensemble selection from libraries of models , 2004, ICML.

[42]  L. Breiman Arcing classifier (with discussion and a rejoinder by the author) , 1998 .

[43]  Lawrence O. Hall,et al.  Ensemble diversity measures and their application to thinning , 2004, Inf. Fusion.

[44]  Xin Yao,et al.  An Evolutionary Multiobjective Approach to Sparse Reconstruction , 2014, IEEE Transactions on Evolutionary Computation.

[45]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[46]  L. Breiman Arcing Classifiers , 1998 .