A Supervised Feature Subset Selection Technique for Multivariate Time Series

Feature subset selection (FSS) is a well-known technique for pre-processing data before data mining tasks such as classification and clustering. FSS yields both cost-effective predictors and a better understanding of the underlying process that generated the data. We propose Corona, a simple yet effective supervised feature subset selection technique for Multivariate Time Series (MTS). Traditional FSS techniques, such as Recursive Feature Elimination (RFE) and Fisher Criterion (FC), have been applied to MTS datasets, e.g., Brain-Computer Interface (BCI) datasets. However, these techniques may lose the correlation information among MTS variables, because each MTS item is vectorized before RFE or FC is applied, so each variable is considered separately. Corona preserves the correlation information by using the correlation coefficient matrix of each MTS item as the features supplied to a Support Vector Machine (SVM). Our extensive experiments show that Corona consistently outperforms RFE and FC, by up to 100% in classification accuracy, and requires more than an order of magnitude less overall processing time than either.
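
To make the idea of correlation-based features concrete, the following is a minimal sketch, not the authors' implementation. It assumes each MTS item is an array of shape (time steps, variables), and it flattens only the upper triangle of the correlation coefficient matrix, which is an illustrative choice that the abstract does not specify. The function name `correlation_features` and the random data are hypothetical.

```python
import numpy as np


def correlation_features(mts_item):
    """Map one MTS item to a fixed-length vector of pairwise correlations.

    mts_item: array-like of shape (n_timesteps, n_variables).
    Assumption: only the strict upper triangle is kept, since the matrix
    is symmetric with a unit diagonal.
    """
    x = np.asarray(mts_item, dtype=float)
    # rowvar=False treats each column as one variable.
    corr = np.corrcoef(x, rowvar=False)
    upper = np.triu_indices(corr.shape[0], k=1)
    return corr[upper]


# Hypothetical data: three MTS items of different lengths, four variables each.
rng = np.random.default_rng(0)
items = [rng.standard_normal((length, 4)) for length in (50, 80, 65)]

# One row per item; the row length depends only on the number of variables.
features = np.vstack([correlation_features(item) for item in items])
print(features.shape)  # (3, 6)
```

Because the feature vector length depends only on the number of variables, items of different durations map to vectors of the same size, which could then be passed to any standard SVM implementation.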