So near and yet so far: New insight into properties of some well-known classifier paradigms

This article provides new insight into the properties of four well-established classifier paradigms: support vector machines (SVM), classifiers based on mixture density models (CMM), fuzzy classifiers (FCL), and radial basis function neural networks (RBF). We show that these classifiers can be formulated such that they are functionally equivalent or at least highly similar. Whether a specific classifier is interpreted as an SVM, CMM, FCL, or RBF then depends only on the objective function and the optimization algorithm used to adjust its parameters. The properties of the four paradigms, however, are very different: a discriminative classifier such as an SVM is expected to have optimal generalization capabilities on new data, a generative classifier such as a CMM also aims at modeling the processes from which the observed data originate, and a comprehensible classifier such as an FCL is intended to be parameterized and understood by human domain experts. We discuss the advantages and disadvantages of these properties and show how they can be measured numerically in order to compare the classifiers. In this way, the article aims to support practitioners in assessing the properties of classifier paradigms and in selecting or combining paradigms for a given application problem.
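
To make the notion of functional equivalence concrete, the following is a minimal sketch and not the formulation used in the article. It assumes isotropic Gaussian basis functions with a single shared width `sigma`, and all function names and parameter values are hypothetical. Under these assumptions, the decision function of an SVM with a Gaussian kernel and the output of an RBF network are the same weighted sum of Gaussians, and a Gaussian mixture density reuses the same basis with normalized, non-negative weights.

```python
import numpy as np


def gaussian_basis(X, centers, sigma):
    """Activations exp(-||x - c||^2 / (2 sigma^2)) for every sample/center pair."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def rbf_network_output(X, centers, sigma, weights, bias):
    """RBF view: linear output layer on top of Gaussian hidden units."""
    return gaussian_basis(X, centers, sigma) @ weights + bias


def svm_decision_function(X, support_vectors, alphas, labels, gamma, bias):
    """SVM view: kernel expansion sum_j alpha_j * y_j * k(x, s_j) + b."""
    sq_dist = ((X[:, None, :] - support_vectors[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-gamma * sq_dist)
    return kernel @ (alphas * labels) + bias


def mixture_density(X, centers, sigma, mixing):
    """Mixture view: same basis, weights are mixing coefficients times the
    Gaussian normalization constant (isotropic covariance sigma^2 * I)."""
    dim = X.shape[1]
    norm = (2.0 * np.pi * sigma ** 2) ** (-dim / 2.0)
    return gaussian_basis(X, centers, sigma) @ (mixing * norm)


# Illustrative parameters only; none of these values come from the article.
rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 2))            # support vectors / RBF centers
alphas = rng.uniform(0.1, 1.0, size=4)       # SVM dual coefficients
labels = np.array([1.0, -1.0, 1.0, -1.0])    # class labels of the centers
sigma, bias = 0.8, 0.25
X = rng.normal(size=(5, 2))                  # query points

rbf_out = rbf_network_output(X, centers, sigma, alphas * labels, bias)
svm_out = svm_decision_function(
    X, centers, alphas, labels, gamma=1.0 / (2.0 * sigma ** 2), bias=bias
)
density = mixture_density(X, centers, sigma, mixing=np.full(4, 0.25))

# With w_j = alpha_j * y_j and gamma = 1 / (2 sigma^2) the two outputs coincide.
outputs_match = bool(np.allclose(rbf_out, svm_out))
print(outputs_match)
```

In this sketch the three views differ only in how the weights are constrained and interpreted, which mirrors the point that the paradigm is determined by the objective function and the training algorithm rather than by the functional form.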
