We present a flexible, general-purpose technique for generating time series classifiers. These classifiers are two-stage algorithms: each consists of a set of feature extraction programs, which transform the time series into a vector of descriptive scalar features, and a back-end classifier (such as a support vector machine) which uses these features to predict a label. We use grammars to constrain the set of valid feature extraction programs and to provide a mechanism for incorporating domain expertise. We test our algorithm on a variety of problems and compare its performance against conventional classifiers such as a Support Vector Machine (SVM) and a Fisher Linear Discriminant (FLD).

1 Feature Extraction

Feature extraction is a process for generating numerical descriptions of data instances. The task of choosing appropriate features is notoriously domain-specific. Automated techniques for feature extraction exist, but they often fall short of what a domain expert can suggest. A manual approach involves identifying physical characteristics of the data and deriving mathematical expressions that measure them numerically. In practice, the manual design of feature extractors is an incremental and often tedious process: features are added, removed, or otherwise tweaked until the desired performance is achieved, which can consume significant time and resources. Our aim is to let the expert give advice to the automated feature extractor while the computer does all the grunt work. Grammars provide a mechanism for incorporating domain knowledge at whatever level of detail is available, and a framework within which the automated feature extractor can search for pertinent features.

1.1 Automated Feature Extraction

An automated approach to feature extraction using Genetic Programming (GP) was taken by Harvey et al. [6] to classify pixels in multispectral images. They used GP to evolve feature extraction algorithms composed of primitive image processing operators (e.g., edge detectors, texture energy, morphological operations). The images produced by these algorithms were fed into a pixel-by-pixel linear classifier. The extracted features incorporated spatial information for each pixel to augment its spectral profile.

Previously, we developed a machine learning algorithm called ZEUS for generating feature extractors for time series classification [3, 4]. We evaluated its performance on a FORTE lightning classification task. We have since extended our approach to incorporate grammars, which guide the extraction of a richer and more systematic set of features for classification. As Figure 1 illustrates, a solution consists of two parts: a set of feature extractors (programs composed of primitive signal processing operators) which generate scalar features from the training data, and a back-end classifier which combines these features to predict a label. ZEUS iteratively refines its classifier until a stopping condition is met. Upon completion, ZEUS provides a classifier that can categorize new data of the same form as the input data.

[Figure 1: A ZEUS solution is a time series classifier. It consists of a set of feature extractors (the dashed box) and a back-end classifier (the dashed circle). When applied to a time series, the input is fed into each individual feature extractor (labeled FE), and each produces at least one numerical descriptor of the input signal. These features are then fed into the back-end classifier, and a predicted label (Y) results.]
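To make the two-stage structure of Figure 1 concrete, the following is a minimal MATLAB sketch of a ZEUS-style solution. The grammar fragment, the three feature extractors, the toy data, and the use of MATLAB's classify function (Statistics Toolbox) as the back-end linear discriminant are all illustrative assumptions on our part, not the grammar or operators ZEUS itself evolves.

    % An informal example of the kind of grammar that could constrain the
    % space of feature extraction programs (an assumed illustration, not
    % the actual ZEUS grammar):
    %   <feature>   ::= <scalar_op>(<series>)
    %   <series>    ::= x | diff(<series>) | abs(<series>)
    %   <scalar_op> ::= mean | std | max | min

    % Hypothetical feature extractors: each maps a time series to a scalar.
    fe = {@(x) mean(x), ...              % FE1: average level
          @(x) std(x), ...               % FE2: spread
          @(x) max(abs(diff(x)))};       % FE3: largest one-step change

    % Toy data: class 1 = low-noise sinusoids, class 2 = high-noise sinusoids.
    n = 40; t = linspace(0, 2*pi, 1000);
    X = cell(n, 1); Y = zeros(n, 1);
    for i = 1:n
        Y(i) = 1 + (i > n/2);                      % class label
        sigma = 0.05 + 0.45 * (Y(i) == 2);         % class-dependent noise level
        X{i} = sin(t + 2*pi*rand) + sigma*randn(size(t));
    end

    % Stage 1: transform each time series into a short feature vector.
    F = zeros(n, numel(fe));
    for i = 1:n
        F(i, :) = cellfun(@(f) f(X{i}), fe);
    end

    % Stage 2: a back-end linear discriminant predicts labels from the features.
    Yhat = classify(F, F, Y, 'linear');            % resubstitution, for brevity
    fprintf('training accuracy: %.2f\n', mean(Yhat == Y));

In ZEUS the fixed cell array fe would instead be a population of grammar-constrained programs refined iteratively; only the overall two-stage data flow is meant to match the figure.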
1.2 Human-Readable Code

The classifier (or regressor) that ZEUS produces takes the form of MATLAB code that can be integrated into a user's standalone application. The first part of this code is a set of MATLAB expressions for extracting features from the time series data. The second part takes these features and classifies them with a back-end classifier such as a linear discriminant. Human-readable algorithms provide insight into the physical and descriptive characteristics of the time series, and permit an expert to more easily incorporate domain knowledge.

1.3 Dimensionality Reduction

Extracting a set of scalar features from a time series also serves to reduce the dimensionality of the data. In our experiments, it was not uncommon to attain decent performance with only 5 scalar features generated from time series consisting of thousands of values. This is useful when computing a Fisher Linear Discriminant, as it is significantly cheaper to compute a covariance matrix for a lower-dimensional data set. Since SVMs handle high-dimensional spaces very well, the benefits of dimensionality reduction are less clear for them. However, SVMs still profit from feature extraction itself; e.g., ZEUS can produce feature extractors which are invariant to offset shifts.
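As a concrete illustration of both points (a minimal sketch of our own, not actual ZEUS output): the features below are built only from differences and deviations from the mean, so adding a constant offset to the input cannot change them, and they compress a 1000-sample series into 3 scalars, leaving the FLD a 3-by-3 rather than a 1000-by-1000 covariance matrix to estimate.

    % Three offset-invariant features: each ignores the DC level of x,
    % because constants cancel in diff(x), in x - mean(x) (inside std),
    % and in max(x) - min(x).
    fe = @(x) [std(x), mean(abs(diff(x))), max(x) - min(x)];

    x  = randn(1, 1000);        % a time series with a thousand samples
    f1 = fe(x);
    f2 = fe(x + 7.3);           % the same series shifted by a constant offset
    disp(max(abs(f1 - f2)));    % zero, up to floating-point rounding

A back-end classifier trained on such features necessarily makes identical predictions for x and x + c, which is exactly the offset-shift invariance described above.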
References

[1] Conor Ryan et al., "Grammatical Evolution," GECCO '07, 2007.
[2] Katharina Morik et al., "Automatic Feature Extraction for Classifying Audio Data," Machine Learning, 2005.
[3] Georgios Dounias et al., "An evolutionary system for neural logic networks using genetic programming and indirect encoding," J. Appl. Log., 2004.
[4] James Theiler et al., "Two realizations of a general feature extraction framework," Pattern Recognit., 2004.
[5] Eamonn J. Keogh et al., "Everything you know about Dynamic Time Warping is Wrong," 2004.
[6] Neal R. Harvey et al., "Multimodal approach to feature extraction for image and signal learning problems," SPIE Optics + Photonics, 2004.
[7] Eric R. Ziegel et al., "The Elements of Statistical Learning," Technometrics, 2003.
[8] Simon J. Perkins et al., "Genetic Algorithms and Support Vector Machines for Time Series Classification," Optics + Photonics, 2002.
[9] Neal R. Harvey et al., "Comparison of GENIE and conventional supervised classifiers for multispectral image feature extraction," IEEE Trans. Geosci. Remote Sens., 2002.
[10] Shigeo Abe, "Pattern Classification," Springer London, 2001.
[11] Kwong-Sak Leung et al., "Data Mining Using Grammar Based Genetic Programming and Applications," 2000.
[12] Christopher J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," 1998.
[13] Michael O'Neill et al., "Grammatical Evolution: Evolving Programs for an Arbitrary Language," EuroGP, 1998.
[14] David J. Montana et al., "Strongly Typed Genetic Programming," Evolutionary Computation, 1995.
[15] Kurt R. Moore et al., "Classification of RF transients in space using digital signal processing and neural network techniques," SPIE Defense + Commercial Sensing, 1995.
[16] Ron Kohavi et al., "Irrelevant Features and the Subset Selection Problem," ICML, 1994.
[17] Melanie Mitchell et al., "Relative Building-Block Fitness and the Building Block Hypothesis," FOGA, 1992.
[18] Eamonn J. Keogh et al., "UCR Time Series Data Mining Archive," 2002.