An ensemble of shapelet-based classifiers on inter-class and intra-class imbalanced multivariate time series at the early stage

Classifying a time series early, before it has been fully observed, typically costs some accuracy. When the data are also imbalanced, minority-class examples become harder to identify accurately. These two problems have each been studied intensively on univariate time series, but their combination has received little attention. Multivariate time series (MTS) are more complex than univariate ones: they contain multiple variables whose interconnections are hidden, which makes the combined problem even more challenging. In this paper, we propose an adaptive ensemble classification method, called early prediction on imbalanced MTS, that handles early classification on MTS data with both inter-class and intra-class imbalance. First, an adaptive ensemble framework is designed to learn an early classification model on imbalanced MTS data. A multiple under-sampling approach combined with a dynamic subspace generation method yields diverse base classifiers while ensuring that all majority-class examples are used. Second, to address the implicit intra-class imbalance in the training data, a cluster-based shapelet selection method is introduced to obtain an optimal set of stable and robust shapelets. Finally, an associate-pattern mining approach is designed to learn base classifiers efficiently and to enhance the interpretability of the classification. Experimental results show that the proposed method achieves effective early prediction on MTS data with inter-class and intra-class imbalance.
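The idea of under-sampling several times while still using every majority-class example can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `balanced_subsets`, the round-robin split, and the choice of subset count are assumptions made here for illustration, and the subspace generation and shapelet steps are left out entirely.

```python
import random


def balanced_subsets(labels, minority_label, seed=0):
    """Split example indices into roughly balanced training subsets.

    Hypothetical helper: every subset holds all minority-class indices plus
    one disjoint slice of the majority class, so each majority example is
    used by exactly one base classifier.
    """
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")

    # Shuffle so that each slice is a random sample of the majority class.
    rng = random.Random(seed)
    rng.shuffle(majority)

    # Assumption: use about |majority| / |minority| subsets so that each
    # slice is close in size to the minority class.
    n_subsets = max(1, round(len(majority) / len(minority)))

    # Deal the majority indices round-robin into disjoint slices.
    slices = [majority[k::n_subsets] for k in range(n_subsets)]
    return [sorted(minority + part) for part in slices]


if __name__ == "__main__":
    # Toy labels: 3 minority examples (1) and 9 majority examples (0).
    toy_labels = [1, 1, 1] + [0] * 9
    for subset in balanced_subsets(toy_labels, minority_label=1):
        print(subset)
```

Each returned index list would train one base classifier of the ensemble. Keeping the majority slices disjoint is one simple way to guarantee that no majority example is discarded.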
