Assessing Generalization Ability of Majority Vote Point Classifiers

Classification algorithms have traditionally been designed to simultaneously reduce errors caused by bias as well as by variance. However, there are many situations in which a low generalization error is crucial to obtaining practical classification solutions, and even slight overfitting has serious consequences on the test results. In such situations, classifiers with a low Vapnik–Chervonenkis (VC) dimension offer two main advantages: 1) the test error stays close to the training error and 2) the classifier learns effectively from a small number of samples. This paper shows that a class of classifiers named majority vote point (MVP) classifiers can, on account of their very low VC dimension, exhibit a generalization error even lower than that of linear classifiers. The paper first formulates a theoretical upper bound on the VC dimension of the MVP classifier. The trend of the exact VC dimension values is then estimated through empirical analysis. Finally, case studies on machine fault diagnosis problems and a prostate tumor detection problem confirm that an MVP classifier can achieve a lower generalization error than most other classifiers.
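Both advantages can be read off the classical VC generalization bound, stated here in Vapnik's standard form as a sketch of the reasoning (this is the textbook bound, not the specific bound derived in the paper): for a hypothesis class of VC dimension d and a training sample of size n, with probability at least 1 - \eta,

    R(h) \le R_{\mathrm{emp}}(h) + \sqrt{\frac{d\,(\ln(2n/d) + 1) + \ln(4/\eta)}{n}},

where R(h) is the expected (test) error and R_{\mathrm{emp}}(h) is the training error. A smaller d shrinks the confidence term, keeping the test error close to the training error; equivalently, a smaller n suffices to drive that term below a given tolerance, which is the second advantage.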

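For illustration only, the following is a minimal sketch of generic majority voting among base classifiers. It shows the voting rule that gives the classifier family its name, not the paper's specific MVP construction, which is defined in the full text; the names majority_vote and base_classifiers are hypothetical.

    from collections import Counter

    def majority_vote(base_classifiers, x):
        # Collect one predicted label per base classifier for input x,
        # then return the label receiving the most votes.
        votes = [clf(x) for clf in base_classifiers]
        return Counter(votes).most_common(1)[0][0]

    # Example usage with three toy threshold classifiers on a scalar input:
    base_classifiers = [
        lambda x: int(x > 0.3),
        lambda x: int(x > 0.5),
        lambda x: int(x > 0.7),
    ]
    print(majority_vote(base_classifiers, 0.6))  # two of three vote 1, so prints 1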