Large Margin Filtering

Many signal processing problems are tackled by filtering the signal for subsequent feature classification or regression. Both steps are critical and need to be designed carefully to deal with the particular statistical characteristics of both signal and noise. The filter and the classifier are typically designed separately, which leads to suboptimal classification schemes. This paper proposes an efficient methodology to jointly learn an optimal signal filter and a support vector machine (SVM) classifier. In particular, we derive algorithms to solve the optimization problem, prove their convergence, and discuss different filter regularizers for automated scaling and selection of the feature channels. These regularizers give rise to different formulations with the appealing properties of sparseness and noise-robustness. We illustrate the performance of the method on several problems. First, linear and nonlinear toy classification examples, in the presence of both Gaussian and convolutional noise, show the robustness of the proposed methods. The approach is then evaluated on two challenging real-life datasets: BCI time series classification and multispectral image segmentation. In all the examples, large margin filtering shows competitive classification performance while offering the advantage of interpretability of the retrieved filtered channels.
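To make the idea of joint learning concrete, the following is a minimal sketch and not the algorithm derived in the paper. It assumes a linear classifier, a causal per-channel FIR filter, a squared hinge loss, a squared Frobenius-norm filter regularizer, and plain gradient descent with step halving. The function names (`apply_filter`, `objective`, `fit`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def apply_filter(X, F):
    """Causal per-channel FIR filter: out[i, j] = sum_m F[m, j] * X[i - m, j].
    Samples before the start of the signal are treated as zero."""
    n, d = X.shape
    out = np.zeros((n, d))
    for m in range(F.shape[0]):
        out[m:] += F[m] * X[:n - m]
    return out

def objective(X, y, F, w, b, C, lam):
    """Squared-hinge large-margin cost on the filtered signal,
    plus a squared Frobenius-norm penalty on the filter (assumed regularizer)."""
    slack = np.maximum(1.0 - y * (apply_filter(X, F) @ w + b), 0.0)
    return 0.5 * w @ w + C * np.mean(slack ** 2) + 0.5 * lam * np.sum(F ** 2)

def fit(X, y, f=5, C=1.0, lam=0.1, step=0.1, n_iter=200):
    """Jointly update the filter F (f taps per channel) and classifier (w, b)."""
    n, d = X.shape
    F = np.zeros((f, d))
    F[0] = 1.0                      # start from the identity filter
    w, b = np.zeros(d), 0.0
    J = objective(X, y, F, w, b, C, lam)
    for _ in range(n_iter):
        Xf = apply_filter(X, F)
        slack = np.maximum(1.0 - y * (Xf @ w + b), 0.0)
        g = -2.0 * C / n * slack * y            # dJ / d(score_i)
        grad_w = w + Xf.T @ g
        grad_b = g.sum()
        grad_F = np.empty_like(F)
        for m in range(f):
            # d(score_i) / dF[m, j] = w[j] * X[i - m, j]
            grad_F[m] = w * (g[m:] @ X[:n - m]) + lam * F[m]
        t = step
        while t > 1e-8:                         # halve the step until the cost decreases
            w_new, b_new, F_new = w - t * grad_w, b - t * grad_b, F - t * grad_F
            J_new = objective(X, y, F_new, w_new, b_new, C, lam)
            if J_new < J:
                w, b, F, J = w_new, b_new, F_new, J_new
                break
            t *= 0.5
        else:
            break                               # no descent step found; stop
    return F, w, b
```

Here `X` is an array of shape `(n_samples, n_channels)` and `y` holds one label in {-1, +1} per sample. After `F, w, b = fit(X, y)`, predictions are `np.sign(apply_filter(X, F) @ w + b)`. The columns of `F` can be inspected channel by channel, which is the kind of interpretability the abstract refers to. Swapping the Frobenius penalty for a sparsity-inducing one would be the natural place to experiment with channel selection.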
