On Reduction of Data Series Dimensionality

In this paper we introduce a multi-step procedure for reducing the dimensionality of multidimensional data series. Each step produces a new representation of the data series and further reduces its dimension. The approach is based on the concept of aggregated envelopes of data series and on principal components, called here ‘essential attributes’, generated by a multilayer neural network; the essential attributes are the outputs of the hidden-layer neurons. Next, all differences between the essential attributes are treated as new attributes. The real values of these new attributes are then nominalized, which yields a nominal representation of the original data series with a considerably reduced dimension. The proposed approach was verified in practice on time series classification and clustering problems, and the results are presented in other papers by the authors. The brief summary given here confirms the usefulness of the time series dimension reduction procedure.
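The abstract describes the later stages of the procedure only in words. The following is a minimal Python sketch of those stages, built on assumptions made for illustration and not taken from the authors' method: the input `x` is assumed to be one series already in its aggregated-envelope form (neither the envelope computation nor the network training is shown); the hidden layer is assumed to use a tanh activation with weights trained elsewhere; "all differences" is read as every difference a_i - a_j with i < j; and nominalization is assumed to use fixed breakpoints. The function names `essential_attributes`, `attribute_differences`, `nominalize` and `reduce_series` are hypothetical.

```python
import bisect
import itertools
import math
from typing import List, Sequence


def essential_attributes(x: Sequence[float],
                         weights: Sequence[Sequence[float]],
                         biases: Sequence[float]) -> List[float]:
    """Return the hidden-layer outputs (the 'essential attributes') for x.

    Assumption: a single hidden layer with tanh activation whose weights and
    biases were trained elsewhere; training itself is not shown here.
    """
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def attribute_differences(attrs: Sequence[float]) -> List[float]:
    """Return a_i - a_j for every pair i < j, in lexicographic pair order."""
    return [attrs[i] - attrs[j]
            for i, j in itertools.combinations(range(len(attrs)), 2)]


def nominalize(values: Sequence[float], breakpoints: Sequence[float],
               symbols: str) -> str:
    """Replace each real value by the symbol of the interval it falls into.

    Assumption: sorted fixed breakpoints, with one more symbol than there are
    breakpoints; a value equal to a breakpoint goes to the upper interval.
    """
    if len(symbols) != len(breakpoints) + 1:
        raise ValueError("need exactly len(breakpoints) + 1 symbols")
    return "".join(symbols[bisect.bisect_right(breakpoints, v)]
                   for v in values)


def reduce_series(x: Sequence[float], weights: Sequence[Sequence[float]],
                  biases: Sequence[float], breakpoints: Sequence[float],
                  symbols: str) -> str:
    """Chain the three steps: hidden-layer outputs, differences, symbols."""
    attrs = essential_attributes(x, weights, biases)
    return nominalize(attribute_differences(attrs), breakpoints, symbols)


if __name__ == "__main__":
    # Hypothetical input: an 8-point series (or its envelope representation)
    # mapped onto 3 hidden neurons with illustrative, untrained weights.
    series = [0.1, 0.4, 0.9, 0.7, 0.3, -0.2, -0.5, -0.1]
    w = [[0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5],
         [0.25, -0.25, 0.25, -0.25, 0.25, -0.25, 0.25, -0.25]]
    b = [0.0, 0.0, 0.0]
    word = reduce_series(series, w, b, breakpoints=[-0.25, 0.25], symbols="abc")
    print(word, len(word))  # 3 hidden neurons give 3 * 2 / 2 = 3 symbols
```

In this sketch, k hidden neurons yield k(k - 1)/2 nominal attributes regardless of the series length, so the output is much shorter than the input whenever k is small relative to the number of time points. Fixed breakpoints are used only for simplicity; any other discretization into nominal values would fit the same `nominalize` interface.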
