Feature subset selection and feature ranking for multivariate time series

Feature subset selection (FSS) is a known technique to preprocess the data before performing any data mining tasks, e.g., classification and clustering. FSS provides both cost-effective predictors and a better understanding of the underlying process that generated the data. We propose a family of novel unsupervised methods for feature subset selection from multivariate time series (MTS) based on common principal component analysis, termed CLeVer. Traditional FSS techniques, such as recursive feature elimination (RFE) and Fisher criterion (FC), have been applied to MTS data sets, e.g., brain computer interface (BCI) data sets. However, these techniques may lose the correlation information among features, while our proposed techniques utilize the properties of the principal component analysis to retain that information. In order to evaluate the effectiveness of our selected subset of features, we employ classification as the target data mining task. Our exhaustive experiments show that CLeVer outperforms RFE, FC, and random selection by up to a factor of two in terms of the classification accuracy, while taking up to 2 orders of magnitude less processing time than RFE and FC.

[1]  Aaron F. Bobick,et al.  Performance Analysis of Time-Distance Gait Parameters under Different Speeds , 2003, AVBPA.

[2]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[3]  Cyrus Shahabi,et al.  AIMS: An Immersidata Management System , 2003, CIDR.

[4]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[5]  Bernhard Schölkopf,et al.  Use of the Zero-Norm with Linear Models and Kernel Methods , 2003, J. Mach. Learn. Res..

[6]  Mubarak Shah,et al.  Motion-Based Recognition , 1997, Computational Imaging and Vision.

[7]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[8]  Petra Perner,et al.  Data Mining - Concepts and Techniques , 2002, Künstliche Intell..

[9]  Joydeep Ghosh,et al.  HMMs and Coupled HMMs for multi-channel EEG classification , 2002, Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No.02CH37290).

[10]  Mohammed Waleed Kadous,et al.  Temporal classification: extending the classification paradigm to multivariate time series , 2002 .

[11]  D. Leibovici,et al.  A singular value decomposition of a k-way array for a principal component analysis of multiway data, PTA-k , 1998 .

[12]  H. Begleiter,et al.  Event related potentials during object recognition tasks , 1995, Brain Research Bulletin.

[13]  Zhenya He,et al.  Self-Organizing Transient Chaotic Neural Network for Cellular Channel Assignment , 2002, Neural Processing Letters.

[14]  Qi Tian,et al.  Feature selection using principal feature analysis , 2007, ACM Multimedia.

[15]  B. Flury Common Principal Components in k Groups , 1984 .

[16]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[17]  J. R. Schott Some tests for common principal component subspaces in several groups , 1991 .

[18]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[19]  C. A. Murthy,et al.  Unsupervised Feature Selection Using Feature Similarity , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  Huan Liu,et al.  Active Feature Selection Using Classes , 2003, PAKDD.

[21]  Sankar K. Pal,et al.  Pattern Recognition Algorithms for Data Mining: Scalability, Knowledge Discovery, and Soft Granular Computing , 2004 .

[22]  Robert Tibshirani,et al.  Estimating the number of clusters in a data set via the gap statistic , 2000 .

[23]  T. Moon,et al.  Mathematical Methods and Algorithms for Signal Processing , 1999 .

[24]  Cyrus Shahabi,et al.  A PCA-based similarity measure for multivariate time series , 2004, MMDB '04.

[25]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[26]  W. Krzanowski Between-Groups Comparison of Principal Components , 1979 .

[27]  Xiaohui Liu,et al.  Variable grouping in multivariate time series via correlation , 2001, IEEE Trans. Syst. Man Cybern. Part B.

[28]  Bernhard Schölkopf,et al.  Support vector channel selection in BCI , 2004, IEEE Transactions on Biomedical Engineering.