INSIGHT: Efficient and Effective Instance Selection for Time-Series Classification

Time-series classification is a widely studied data mining task with various scientific and industrial applications. Recent research in this domain has shown that the simple nearest-neighbor classifier using Dynamic Time Warping (DTW) as the distance measure performs exceptionally well, in most cases outperforming more advanced classification algorithms. Instance selection is a commonly applied approach for reducing the classification time of the nearest-neighbor classifier. It shrinks the training set by selecting the most representative instances and using only those instances to classify new ones. In this paper, we introduce a novel instance selection method that exploits the hubness phenomenon in time-series data, according to which a few instances tend to be nearest neighbors far more frequently than the remaining instances. Based on hubness, we propose a framework for score-based instance selection, which is combined with a principled approach of selecting instances that optimize the coverage of the training data. We discuss the theoretical considerations of casting the instance selection problem as a graph-coverage problem and analyze the resulting complexity. We experimentally compare the proposed method, denoted INSIGHT, against FastAWARD, a state-of-the-art instance selection method for time series. Our results indicate substantial improvements in classification accuracy and a drastic reduction (by orders of magnitude) in execution time.
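The Python sketch below illustrates the general idea of hubness-based, score-based instance selection for a 1-NN DTW classifier. It is a minimal illustration under stated assumptions, not the INSIGHT implementation: the score (called `good_one_occurrence` here, a hypothetical name) is assumed to count how often an instance is the nearest neighbor of a training instance with the same label; instances are simply ranked by that score; and the coverage-based selection and complexity analysis described in the abstract are not reproduced. DTW is computed without a warping window, and scoring costs O(n²) DTW evaluations.

```python
import math
from typing import List, Sequence


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained DTW with squared pointwise cost; returns sqrt of the total."""
    n, m = len(a), len(b)
    prev = [0.0] + [math.inf] * m  # row i = 0 of the cumulative-cost table
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return math.sqrt(prev[m])


def good_one_occurrence(X: List[Sequence[float]], y: List[str]) -> List[int]:
    """For every training instance, count how often it is the DTW 1-NN of
    another training instance with the same label.
    Hypothetical scoring rule, assumed for illustration; not taken from the paper."""
    n = len(X)
    scores = [0] * n
    for i in range(n):
        # Leave-one-out nearest neighbor of instance i (first one wins on ties).
        nn = min((j for j in range(n) if j != i), key=lambda j: dtw(X[i], X[j]))
        if y[nn] == y[i]:
            scores[nn] += 1
    return scores


def select_instances(X: List[Sequence[float]], y: List[str], budget: int) -> List[int]:
    """Keep the `budget` highest-scoring instances (ties broken by index)."""
    scores = good_one_occurrence(X, y)
    ranked = sorted(range(len(X)), key=lambda i: (-scores[i], i))
    return ranked[:budget]


def classify_1nn(query: Sequence[float], X: List[Sequence[float]],
                 y: List[str], selected: List[int]) -> str:
    """1-NN classification under DTW, searching only the selected instances."""
    best = min(selected, key=lambda i: dtw(query, X[i]))
    return y[best]
```

Ranking by a score and cutting at a fixed budget is the simplest form of score-based selection; a coverage-oriented strategy such as the one the abstract describes would replace the plain top-`budget` cut while reusing the same nearest-neighbor statistics.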
