Kernel Mode Decomposition and programmable/interpretable regression networks

Mode decomposition is a prototypical pattern recognition problem that can be addressed from the (a priori distinct) perspectives of numerical approximation, statistical inference and deep learning. Could its analysis through these combined perspectives be used as a Rosetta stone for deciphering mechanisms at play in deep learning? Motivated by this question, we introduce programmable and interpretable regression networks for pattern recognition and address mode decomposition as a prototypical problem. These networks are programmed by assembling elementary modules that decompose and recompose kernels and data. These elementary steps are repeated across levels of abstraction and interpreted from the equivalent perspectives of optimal recovery, game theory and Gaussian process regression (GPR). The prototypical mode/kernel decomposition module produces an optimal approximation $(w_1,w_2,\ldots,w_m)$ of an element $(v_1,v_2,\ldots,v_m)$ of a product of Hilbert subspaces of a common Hilbert space from the observation of the sum $v:=v_1+\cdots+v_m$. The prototypical mode/kernel recomposition module performs partial sums of the recovered modes $w_i$ based on the alignment between each recovered mode $w_i$ and the data $v$. We illustrate the proposed framework by programming regression networks that approximate the modes $v_i= a_i(t)y_i\big(\theta_i(t)\big)$ of a (possibly noisy) signal $\sum_i v_i$ when the amplitudes $a_i$, instantaneous phases $\theta_i$ and periodic waveforms $y_i$ may all be unknown, and we show near-machine-precision recovery under regularity and separation assumptions on the instantaneous amplitudes $a_i$ and frequencies $\dot{\theta}_i$. The structure of some of these networks shares intriguing similarities with convolutional neural networks, while being interpretable, programmable and amenable to theoretical analysis.
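The decomposition and recomposition modules can be illustrated in finite dimensions through their Gaussian process regression interpretation. The following is a minimal sketch under stated assumptions, not the authors' implementation: each mode $v_i$ is modeled as an independent centered Gaussian vector with covariance matrix $K_i$ on a sampling grid, so the recovered mode is the conditional mean $w_i = K_i K^{-1} v$ with $K = K_1+\cdots+K_m$. Alignment with the data is measured by $E_i = \alpha^T K_i \alpha$, where $\alpha = K^{-1} v$, which equals the $K^{-1}$ inner product of $w_i$ with $v$. The function names `decompose` and `recompose`, the `nugget` (noise variance) and `threshold` parameters, the energy-share thresholding rule, and the single-frequency kernels $\cos(2\pi f(s-t))$ used in the demo are all hypothetical choices made for illustration.

```python
import numpy as np


def decompose(v, kernels, nugget=0.0):
    """Sketch of a GPR mode-decomposition module (illustrative assumption).

    Assumes independent priors v_i ~ N(0, K_i) and the observation
    v = v_1 + ... + v_m (plus optional white noise of variance `nugget`).
    Returns the conditional means w_i = K_i alpha, with alpha = (K + nugget*I)^{-1} v
    and K = K_1 + ... + K_m, and the alignment energies E_i = alpha^T K_i alpha.
    """
    K = sum(kernels) + nugget * np.eye(len(v))
    alpha = np.linalg.solve(K, v)                   # one linear solve shared by all modes
    modes = [Ki @ alpha for Ki in kernels]          # w_i = K_i alpha
    # E_i = <w_i, v> in the K^{-1} inner product = alpha^T K_i alpha
    energies = np.array([alpha @ Ki @ alpha for Ki in kernels])
    return modes, energies


def recompose(modes, energies, threshold):
    """Hypothetical recomposition rule: add up the modes whose share of the
    total alignment energy exceeds `threshold`, and lump the others together."""
    share = energies / energies.sum()
    keep = share > threshold
    zero = np.zeros_like(modes[0])
    aligned = sum((w for w, k in zip(modes, keep) if k), zero)
    rest = sum((w for w, k in zip(modes, keep) if not k), zero)
    return aligned, rest, keep


# Toy demo (hypothetical): single-frequency kernels K_i(s, t) = cos(2*pi*f_i*(s - t)) on [0, 1].
t = np.linspace(0.0, 1.0, 200)
freqs = [5.0, 12.0, 30.0]                           # the 30 Hz mode is absent from v
kernels = [np.cos(2 * np.pi * f * (t[:, None] - t[None, :])) for f in freqs]
v1 = np.cos(2 * np.pi * 5.0 * t)
v2 = 0.5 * np.sin(2 * np.pi * 12.0 * t)
v = v1 + v2

modes, energies = decompose(v, kernels, nugget=1e-6)
aligned, rest, keep = recompose(modes, energies, threshold=0.01)
print("max error on v1:", np.max(np.abs(modes[0] - v1)))
print("max error on v2:", np.max(np.abs(modes[1] - v2)))
print("energies:", energies)                        # approximately [1.0, 0.25, 0.0]
print("kept modes:", keep)                          # [ True  True False]
```

With `nugget=0` and an invertible $K$, the recovered modes sum exactly to $v$ and the energies sum to $v^T K^{-1} v$, so each energy share can be read as the fraction of the data attributed to the corresponding mode. Computing $\alpha$ once and reusing it for every $w_i$ keeps the module to a single linear solve regardless of the number of modes.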
