Bringing Interpretability and Visualization with Artificial Neural Networks.

Extreme Learning Machine (ELM) is a training algorithm for Single-Layer Feed-forward Neural Networks (SLFNs). In theory, ELM differs from other training algorithms in that it has an explicit, closed-form solution, which exists because the initialized weights are kept fixed. In practice, ELMs achieve performance similar to that of other state-of-the-art training techniques while taking much less time to train a model. Experiments show that training an ELM can be up to five orders of magnitude faster than the standard error back-propagation algorithm. ELM is a recently discovered technique that has proved its efficiency in classic regression and classification tasks, including multi-class cases. In this thesis, extensions of ELMs to problems that are not typical for Artificial Neural Networks (ANNs) are presented. The first extension, described in the third chapter, makes it possible to use ELMs to obtain probabilistic outputs for multi-class classification problems. The standard way of solving this type of problem is based on a ‘majority vote’ over the classifier’s raw outputs. This approach can raise issues if the penalty for misclassification differs between classes. In this case, probabilistic outputs would be more useful. Within the scope of this extension, two methods are proposed. Additionally, an alternative way of interpreting probabilistic outputs is proposed. ELM methods also prove useful for non-linear dimensionality reduction and visualization, based on repeated re-training and re-evaluation of the model. The fourth chapter introduces adaptations of ELM-based visualization for classification and regression
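
The closed-form training step mentioned in the abstract can be illustrated with a minimal sketch. It assumes NumPy, a tanh hidden layer, and a toy sine-regression task; the names `elm_fit` and `elm_predict` are hypothetical and are not taken from the thesis. The hidden-layer weights are drawn once and never updated, so the output weights follow from a single linear least-squares solve rather than from iterative back-propagation.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, seed=0):
    """Fit an ELM: fixed random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # hidden weights, never updated
    b = rng.normal(size=n_hidden)                # hidden biases, never updated
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    # With W and b fixed, the output weights solve a linear least-squares
    # problem (assumption: plain least squares, no regularization).
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def elm_predict(X, model):
    """Apply the fixed hidden layer, then the learned output weights."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy regression problem (illustrative data, not from the thesis).
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0])

model = elm_fit(X, y)
mse = float(np.mean((elm_predict(X, model) - y) ** 2))
print(f"training MSE: {mse:.2e}")
```

Because training reduces to one linear solve, its cost is dominated by a single matrix factorization, which is consistent with the speedup over iterative gradient-based training that the abstract reports.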
