Multiple models fusion for pattern classification on noise data

An important characteristic of real-world learning processes is that the data frequently contain uncertainty, which degrades learning. Properly representing and handling this uncertainty is therefore one of the key issues in a decision learning system. This paper proposes a multiple models fusion method to address the uncertainty problem by fusing two models: a Bayesian classifier and a Probabilistic based Noise Aware Support Vector Machine. Specifically, we take advantage of the noise-insensitive characteristic of the Naïve Bayesian classifier to enhance the noise tolerance of the probabilistic information based Support Vector Machine. The method fuses the probabilistic decision information obtained from the two classifiers in a flexible way to give the final decision. The multiple models fusion method is evaluated on an artificial dataset for a classification task. The experimental results show good performance in a noisy environment when compared with using either learning technique alone.
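The abstract does not state the fusion rule, so the following is only a minimal sketch of one common way to combine the probabilistic outputs of two classifiers: a convex combination of their class-probability vectors. The function names `fuse_posteriors` and `predict` and the weight `alpha` are hypothetical and introduced here for illustration; they are not taken from the paper.

```python
def fuse_posteriors(p_bayes, p_svm, alpha=0.5):
    """Blend two class-probability vectors with a convex combination.

    ASSUMPTION: the paper's actual fusion rule is not given in the abstract.
    Here alpha weights the Bayesian classifier and (1 - alpha) weights the
    probabilistic SVM.
    """
    if len(p_bayes) != len(p_svm):
        raise ValueError("both classifiers must score the same classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(p_bayes, p_svm)]

    # Renormalise so the result is a probability vector even if the
    # inputs were only approximately normalised.
    total = sum(fused)
    if total <= 0.0:
        raise ValueError("fused scores must have a positive sum")
    return [f / total for f in fused]


def predict(p_bayes, p_svm, alpha=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_posteriors(p_bayes, p_svm, alpha)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical posteriors for one sample over two classes.
    bayes_probs = [0.9, 0.1]
    svm_probs = [0.4, 0.6]
    print(fuse_posteriors(bayes_probs, svm_probs, alpha=0.5))
    print(predict(bayes_probs, svm_probs, alpha=0.5))
```

In this sketch, setting `alpha` closer to 1 leans on the Bayesian classifier, which the abstract describes as noise-insensitive, while `alpha` closer to 0 leans on the SVM. How the weight would be chosen in practice is left open here.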
