Feature Subset Selection on Multivariate Time Series with Extremely Large Spatial Features

Spatio-temporal data collected in many applications, such as fMRI data in medical applications, can be represented as a multivariate time series (MTS) matrix with m rows (capturing the spatial features) and n columns (capturing the temporal observations). Data mining tasks such as clustering or classification on MTS datasets are usually hindered by the large size (i.e., the dimensions) of these MTS items. To reduce the dimensions without losing the useful discriminative features of the dataset, domain experts usually prefer feature selection techniques, since these preserve the relation between the selected subset of features and the originally acquired features. In this paper, we propose a new feature selection technique for MTS datasets whose spatial features (i.e., number of rows) far outnumber their temporal observations (i.e., number of columns), that is, m ≫ n. Our approach is based on principal component analysis, recursive feature elimination, and support vector machines. Our empirical results on real-world datasets show that our technique significantly outperforms the closest competitor technique.
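The text above names the three building blocks but not how they are combined. The following is a minimal sketch of one plausible arrangement, not the method proposed in this paper. It assumes that each MTS item is an m x n NumPy array, that each item is summarized by its variance-weighted absolute PCA loadings (one value per spatial row), and that scikit-learn's RFE with a linear SVM then ranks the rows. The function names row_descriptor and select_rows, and the parameters n_components, n_keep and the elimination step of 0.2, are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.svm import SVC


def row_descriptor(X, n_components=2):
    """Summarize one m x n MTS item as a length-m vector (assumed descriptor)."""
    # Time points act as samples and spatial rows as variables, so the
    # fitted components have shape (n_components, m).
    pca = PCA(n_components=n_components).fit(X.T)
    weights = pca.explained_variance_ratio_[:, None]
    # Absolute values remove the arbitrary sign of each principal component.
    return (np.abs(pca.components_) * weights).sum(axis=0)


def select_rows(items, labels, n_keep=5, n_components=2):
    """Return the indices of the n_keep spatial rows retained by SVM-based RFE."""
    # One descriptor per item: D has shape (number of items, m).
    D = np.stack([row_descriptor(X, n_components) for X in items])
    # A linear SVM exposes coef_, which RFE uses to rank and drop rows;
    # step=0.2 removes 20% of the original rows per iteration (assumed value).
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_keep, step=0.2)
    rfe.fit(D, labels)
    return np.flatnonzero(rfe.support_)
```

Because the selected indices refer to rows of the original matrices, the link between the reduced representation and the acquired spatial features is kept, which is the property the text attributes to feature selection as opposed to feature extraction.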