A New Fuzzy Stacked Generalization Technique and Analysis of its Performance

In this study, a new Stacked Generalization technique called Fuzzy Stacked Generalization (FSG) is proposed to minimize the difference between the N-sample and large-sample classification errors of the Nearest Neighbor classifier. The proposed FSG employs a new hierarchical distance learning strategy to minimize this error difference. For this purpose, we first construct an ensemble of base-layer fuzzy k-Nearest Neighbor (k-NN) classifiers, each of which receives a different feature set extracted from the same sample set. The fuzzy membership values computed in the decision space of each fuzzy k-NN classifier are concatenated to form the feature vectors of a fusion space. Finally, these feature vectors are fed to a meta-layer classifier, which learns how accurate the decisions of the base-layer classifiers are and uses them for the final classification. The overall performance of the proposed FSG depends less on the power of the individual base-layer classifiers than on their diversity and cooperation. A weak base-layer classifier may boost the overall performance more than a strong classifier if, in its own feature space, it is capable of recognizing the samples that are not recognized by the rest of the classifiers. The experiments explore the type of collaboration among the individual classifiers that is required to improve the performance of the suggested architecture. Experiments on multiple-feature real-world datasets show that the proposed FSG performs better than state-of-the-art ensemble learning algorithms such as AdaBoost, Random Subspace and Rotation Forest. On the other hand, comparable performances are observed in the experiments on single-feature multi-attribute datasets.
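
The two-layer pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fuzzy_knn_memberships`, `fsg_predict`), the toy data, crisp training labels, the Euclidean distance, the fuzzifier m = 2, the leave-one-out computation of training memberships, and the choice of a fuzzy k-NN as the meta-layer classifier are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_knn_memberships(train_X, train_y, query_X, n_classes,
                          k=3, m=2.0, exclude_self=False):
    """Return an (n_query, n_classes) array of fuzzy k-NN class memberships.

    Assumes crisp integer labels and inverse-distance weights with
    exponent 2 / (m - 1); both are illustrative choices.
    """
    # Pairwise Euclidean distances between queries and training samples.
    d = np.linalg.norm(query_X[:, None, :] - train_X[None, :, :], axis=2)
    if exclude_self:
        # Leave-one-out: query i is training sample i, so skip that match.
        np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]                 # indices of k nearest
    nn_d = np.take_along_axis(d, nn, axis=1)          # their distances
    w = 1.0 / np.maximum(nn_d, 1e-12) ** (2.0 / (m - 1.0))
    onehot = np.eye(n_classes)[train_y[nn]]           # (n_query, k, n_classes)
    return (w[:, :, None] * onehot).sum(axis=1) / w.sum(axis=1, keepdims=True)

def fsg_predict(train_views, train_y, test_views, n_classes, k=3):
    """Base-layer fuzzy k-NN per feature space, then a meta-layer fuzzy k-NN."""
    # Base layer: membership vectors of every feature space, concatenated
    # into one fusion-space vector per sample.
    train_fusion = np.hstack([
        fuzzy_knn_memberships(V, train_y, V, n_classes, k, exclude_self=True)
        for V in train_views])
    test_fusion = np.hstack([
        fuzzy_knn_memberships(Vtr, train_y, Vte, n_classes, k)
        for Vtr, Vte in zip(train_views, test_views)])
    # Meta layer (assumed here to be another fuzzy k-NN) in the fusion space.
    meta = fuzzy_knn_memberships(train_fusion, train_y, test_fusion, n_classes, k)
    return meta.argmax(axis=1), train_fusion, test_fusion

# Hypothetical toy data: two feature sets ("views") of the same 8 samples.
train_y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
view_a = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                   [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
view_b = np.array([[0], [1], [2], [3], [20], [21], [22], [23]], dtype=float)
test_a = np.array([[0.5, 0.5], [10.5, 10.5]])
test_b = np.array([[1.5], [21.5]])

pred, train_fusion, test_fusion = fsg_predict(
    [view_a, view_b], train_y, [test_a, test_b], n_classes=2, k=3)
print(pred)
```

With J feature spaces and C classes, each fusion-space vector has J × C entries, so the meta-layer dimension depends on the number of base-layer classifiers rather than on the dimensions of the original feature sets.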
