CLeVer: A Feature Subset Selection Technique for Multivariate Time Series

Feature subset selection (FSS) is a data pre-processing technique that identifies a subset of the original features in a given dataset before any data mining task is performed. We propose CLeVer, a novel FSS method for multivariate time series (MTS) based on common principal components. CLeVer uses the properties of the principal components to retain the correlation information among the original features, which traditional FSS techniques such as Recursive Feature Elimination (RFE) may lose. To evaluate the effectiveness of the selected feature subset, we use classification as the target data mining task. Our experiments show that CLeVer outperforms RFE and the Fisher Criterion by up to a factor of two in classification accuracy, while requiring up to two orders of magnitude less processing time.
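
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way a common-principal-component ranking could work, not the authors' implementation. It assumes that each MTS item is an array of shape (time steps, features), that principal components are taken from each item's correlation matrix, that the per-item components are pooled by summing their projection matrices, and that features are ranked by the magnitude of their loadings on the pooled components. The function names (`top_components`, `common_components`, `rank_features`) and the parameters `p` and `k` are hypothetical.

```python
import numpy as np


def top_components(item, p):
    """Return the first p principal directions of one MTS item.

    item: array of shape (time_steps, n_features).
    Result: array of shape (p, n_features), one direction per row.
    """
    # Assumption: PCA on the correlation matrix of the features.
    corr = np.corrcoef(item, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:p]  # largest eigenvalues first
    return eigvecs[:, order].T


def common_components(items, p):
    """Pool the per-item components into p shared directions.

    Assumption: the shared directions are the leading eigenvectors of
    the sum of the per-item projection matrices L.T @ L.
    """
    n_features = items[0].shape[1]
    pooled = np.zeros((n_features, n_features))
    for item in items:
        L = top_components(item, p)
        pooled += L.T @ L  # sign of each direction cancels out here
    eigvals, eigvecs = np.linalg.eigh(pooled)
    order = np.argsort(eigvals)[::-1][:p]
    return eigvecs[:, order].T


def rank_features(items, p, k):
    """Return indices of the k features with the largest loading norms."""
    common = common_components(items, p)
    scores = np.linalg.norm(common, axis=0)  # one score per feature
    return np.argsort(scores)[::-1][:k]
```

Because whole features are kept rather than projected away, the selected columns remain interpretable as original variables. Items may have different lengths, since only their feature-by-feature correlation matrices are compared.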
