Backtransformation: a new representation of data processing chains with a scalar decision function

Data processing often transforms a complex signal into a single value, the outcome of a final decision function, by applying a sequence of preprocessing algorithms. It remains challenging to understand and visualize the interplay between the algorithms that perform this transformation. Especially when dimensionality reduction is used, the original data structure (e.g., spatio-temporal information) is hidden from subsequent algorithms. To tackle this problem, we introduce the backtransformation concept, which treats the combination of algorithms as one transformation that maps the original input signal to a single value. To this end, the derivative of the final decision function is propagated back through the preceding processing steps by backward iteration and the chain rule. The resulting derivative of the composed decision function at the sample of interest represents the complete decision process, and visualizing it can improve the understanding of that process. Often, it is possible to construct a feasible processing chain from affine mappings only, which considerably simplifies both the calculation of the backtransformation and the interpretation of the result. In this case, the affine backtransformation provides the complete parameterization of the processing chain. This article introduces the theory, provides implementation guidelines, and presents three application examples.
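
As a minimal sketch of the affine case described above, the following Python snippet collapses a chain of affine maps x -> A x + b into a single weight vector w and offset c by iterating backward from the final decision function. The function name `affine_backtransformation`, the representation of the chain as a list of (A, b) pairs, and the toy matrices are assumptions made for illustration only; they are not taken from the article or from any particular library.

```python
import numpy as np


def affine_backtransformation(stages):
    """Collapse affine stages x -> A @ x + b into one pair (w, c).

    `stages` (hypothetical format) lists (A, b) pairs in processing order.
    The last stage must produce a scalar, i.e. its A has exactly one row.
    """
    A_last, b_last = stages[-1]
    W = np.asarray(A_last, dtype=float)
    c = np.asarray(b_last, dtype=float)
    # Backward iteration: W (A x + b) + c = (W A) x + (W b + c)
    for A, b in reversed(stages[:-1]):
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        c = W @ b + c
        W = W @ A
    return W.ravel(), c.item()


# Toy chain (made-up numbers): a 3 -> 2 reduction followed by a linear decision.
stages = [
    (np.array([[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]), np.array([0.5, -1.0])),
    (np.array([[3.0, 1.0]]), np.array([2.0])),
]
w, c = affine_backtransformation(stages)

# Check against running the chain stage by stage on one sample.
x = np.array([1.0, 2.0, 3.0])
y = x
for A, b in stages:
    y = A @ y + b
chain_output = y.item()
collapsed_output = float(w @ x + c)
```

Here `w` has one entry per component of the original input, so it can be reshaped to the layout of the raw signal (for example channels by time points) and displayed directly. For a chain that is not affine, the same backward iteration would be applied to the local derivatives at the sample of interest, and `w` would then be valid only near that sample.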
