Choosing the proper autoencoder for feature fusion based on data complexity and classifiers: Analysis, tips and guidelines

Abstract
Classifying data patterns is one of the most common applications in machine learning. The number of input features influences the predictive performance of many classification models, and most classifiers struggle when the input space is high-dimensional. There is therefore great interest in reducing the input space. Manifold learning has been shown to outperform classical dimensionality reduction approaches such as Principal Component Analysis and Linear Discriminant Analysis. In this sense, Autoencoders (AEs) provide an automated way of performing feature fusion, finding the manifold that best reconstructs the data. Since there are several AE models and architectures, this study proposes an exhaustive analysis of the predictive performance of different AE models over a large number of datasets, aiming to provide a set of useful guidelines. These will allow users to choose the appropriate AE model for each case, depending on the data traits and the classifier to be used. A thorough empirical analysis is conducted, including four AE models, four classification paradigms and a group of datasets with a variety of traits. A convenient set of rules to follow is obtained as a result.
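The pipeline the abstract describes can be summarized in three steps: train an AE to reconstruct the data, keep only the encoder as the feature-fusion stage, and hand the latent codes to a downstream classifier. The sketch below is a minimal illustration of that idea and not the paper's experimental setup; the synthetic data, the layer sizes, the choice of a basic undercomplete AE, and the use of Keras with a kNN classifier are all illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data (hypothetical): 500 samples, 64 input features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64)).astype("float32")
y = rng.integers(0, 3, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Basic undercomplete AE: fuse 64 input features into a 16-dimensional code.
inputs = keras.Input(shape=(64,))
code = keras.layers.Dense(16, activation="relu")(inputs)
outputs = keras.layers.Dense(64, activation="linear")(code)
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Unsupervised training: the AE learns to reconstruct its own input.
autoencoder.fit(X_tr, X_tr, epochs=20, batch_size=32, verbose=0)

# Keep only the encoder; its output is the fused, lower-dimensional representation.
encoder = keras.Model(inputs, code)
Z_tr, Z_te = encoder.predict(X_tr, verbose=0), encoder.predict(X_te, verbose=0)

# Any classifier can consume the fused features; kNN is shown here.
knn = KNeighborsClassifier(n_neighbors=5).fit(Z_tr, y_tr)
print("accuracy on fused features:", knn.score(Z_te, y_te))
```

Note that swapping the plain Dense encoder for a denoising, contractive or robust variant changes only the AE-training step; the classifier downstream is untouched. This separation is what makes the choice of AE model an independent design decision, which is precisely the decision the paper's guidelines aim to inform.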
