Generalized Hidden-Mapping Minimax Probability Machine for the training and reliability learning of several classical intelligent models

Abstract The Minimax Probability Machine (MPM) is a binary classifier that minimizes the upper bound of the misclassification probability. This upper bound serves as an explicit indicator of the classifier's reliability and thus makes the classification model more transparent. However, existing work is limited to linear models and their nonlinear counterparts obtained through the kernel trick. To relax these constraints, we propose the Generalized Hidden-Mapping Minimax Probability Machine (GHM-MPM). GHM-MPM is a generalized MPM: it can train many classical intelligent models, such as feedforward neural networks, fuzzy logic systems, and linear and kernelized linear models, for classification tasks, while simultaneously realizing reliability learning for these models. Since the GHM-MPM, like the classical MPM, is originally formulated for binary classification, it is further extended to multi-class classification by exploiting the reliability indices obtained from the binary classifiers for each pair of classes. Experimental results show that models trained by GHM-MPM are more transparent and reliable than those trained by classical methods.
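To make the reliability idea concrete, the following sketch implements the classical linear MPM in the formulation of Lanckriet et al.: minimize the sum of the two class covariance norms of the weight vector subject to a unit mean-separation constraint, from which the optimal margin kappa yields the worst-case misclassification bound 1/(1 + kappa^2). This is an illustrative minimal implementation of the standard MPM, not the GHM-MPM of this paper; the function names and the use of a general-purpose SLSQP solver instead of a second-order cone programming solver are choices made here for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def fit_mpm(X_pos, X_neg, reg=1e-6):
    """Fit a linear Minimax Probability Machine (illustrative sketch).

    Solves  min_a  sqrt(a' S_pos a) + sqrt(a' S_neg a)
            s.t.   a' (mu_pos - mu_neg) = 1,
    where mu_* and S_* are the class means and covariances. The
    optimal value v gives kappa = 1/v, and the worst-case
    misclassification probability (the reliability index) is
    1 / (1 + kappa^2).
    """
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Small ridge keeps the covariances positive definite
    S_p = np.cov(X_pos, rowvar=False) + reg * np.eye(X_pos.shape[1])
    S_n = np.cov(X_neg, rowvar=False) + reg * np.eye(X_neg.shape[1])

    def objective(a):
        return np.sqrt(a @ S_p @ a) + np.sqrt(a @ S_n @ a)

    d = mu_p - mu_n
    cons = {"type": "eq", "fun": lambda a: a @ d - 1.0}
    a0 = d / (d @ d)  # feasible starting point: a0' d = 1
    res = minimize(objective, a0, constraints=[cons])
    a = res.x
    kappa = 1.0 / res.fun
    bound = 1.0 / (1.0 + kappa ** 2)  # upper bound on misclassification prob.
    # Offset placing the boundary kappa covariance-units from the positive mean
    b = a @ mu_p - kappa * np.sqrt(a @ S_p @ a)
    return a, b, bound

def predict(a, b, X):
    """Classify rows of X: +1 for the positive class, -1 otherwise."""
    return np.where(X @ a - b >= 0, 1, -1)
```

For well-separated classes the returned bound is close to zero, directly quantifying how much the model can be trusted; this is the transparency property the abstract refers to.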
