Decomposition Methods for Machine Learning with Small, Incomplete or Noisy Datasets

In many machine learning applications, measurements are incomplete or noisy, resulting in missing features. In other cases, the available datasets are small to begin with, so more data samples are needed to derive useful supervised or unsupervised classification methods. Correct handling of incomplete, noisy or small datasets is a fundamental and classic challenge in machine learning. In this article, we provide a unified review of recently proposed methods based on signal decomposition for missing-feature imputation (data completion), classification of noisy samples, and artificial generation of new data samples (data augmentation). We illustrate the application of these signal decomposition methods in selected practical machine learning examples: brain-computer interfaces, classification of epileptic intracranial electroencephalogram (iEEG) signals, face recognition and verification, and water network data analysis. We show that a signal decomposition approach can provide valuable tools for improving machine learning performance on low-quality datasets.
