Brief Introduction to Statistical Machine Learning

This chapter gives an overview of probability theory, statistics, and machine learning, covering the main ideas and the most widely used methods in these areas. As a starting point, randomness and determinism, as well as the nature of real-world problems, are discussed. Then the basic, well-known topics of traditional probability theory and statistics are recalled, including probability mass and distribution, probability density and moments, density estimation, and Bayesian and other branches of probability theory, followed by an analysis. Well-known data pre-processing techniques and unsupervised and supervised machine learning methods are then covered. These include a brief introduction to distance metrics, normalization and standardization, feature selection, and orthogonalization, as well as a review of the most representative clustering, classification, regression, and prediction approaches of various types. Finally, image processing is briefly covered, including popular image transformation techniques and a number of image feature extraction techniques at three different levels. © 2019, Springer Nature Switzerland AG.
