Integrated Oversampling for Imbalanced Time Series Classification

This paper proposes a novel Integrated Oversampling (INOS) method for highly imbalanced time series classification. We introduce an enhanced structure-preserving oversampling (ESPO) technique and combine it synergistically with interpolation-based oversampling. ESPO generates the large majority of the synthetic minority samples from a multivariate Gaussian distribution, estimating the covariance structure of the minority-class samples and regularizing the unreliable eigenspectrum. To protect the key original minority samples, an interpolation-based technique generates the remaining small percentage of the synthetic population. By preserving the main covariance structure and intelligently creating protective variance in the trivial eigendimensions, ESPO expands the synthetic samples into the void regions of the data space without tying them too closely to the existing minority-class samples. Preserving the main covariance structure also addresses a key challenge in oversampling imbalanced time series: maintaining the correlation between consecutive values. Extensive experiments on seven public time series data sets demonstrate that our INOS approach, used with support vector machines (SVM), achieves better performance than existing oversampling methods as well as state-of-the-art time series classification methods.
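
As a rough illustration of the two components described above, the sketch below draws most synthetic minority samples from a regularized multivariate Gaussian (the ESPO component) and the remainder by SMOTE-style interpolation between minority-sample pairs. The function names, the eigenvalue-flooring scheme controlled by `reg`, and the 80/20 split are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def espo_oversample(X_min, n_new, reg=0.01, rng=None):
    """Sketch of structure-preserving Gaussian oversampling (ESPO-style).

    Estimates the minority-class covariance, regularizes the unreliable
    (small) tail of the eigenspectrum, and draws synthetic samples from
    the resulting multivariate Gaussian. Flooring small eigenvalues at a
    fraction of the largest one is an assumed stand-in for the paper's
    eigenspectrum regularization model.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = X_min.mean(axis=0)
    cov = np.cov(X_min, rowvar=False)
    # Eigendecomposition of the symmetric covariance matrix.
    vals, vecs = np.linalg.eigh(cov)
    # Regularize the trivial eigendimensions: floor near-zero eigenvalues
    # so synthetic samples gain protective variance there instead of
    # collapsing onto the few observed minority samples.
    vals = np.maximum(vals, reg * vals.max())
    # Sample in the eigenspace and map back to the original space.
    Z = rng.standard_normal((n_new, len(vals)))
    return mu + (Z * np.sqrt(vals)) @ vecs.T

def interpolation_oversample(X_min, n_new, rng=None):
    """SMOTE-style interpolation between random minority-sample pairs,
    standing in for the interpolation-based component of INOS."""
    rng = np.random.default_rng() if rng is None else rng
    i = rng.integers(0, len(X_min), n_new)
    j = rng.integers(0, len(X_min), n_new)
    lam = rng.random((n_new, 1))
    return X_min[i] + lam * (X_min[j] - X_min[i])

# INOS-style split (illustrative): most synthetic samples from the
# Gaussian component, a small share from interpolation.
X_min = np.random.randn(30, 100)       # 30 minority series of length 100
n_total, espo_share = 300, 0.8
n_espo = int(n_total * espo_share)
X_syn = np.vstack([
    espo_oversample(X_min, n_espo),
    interpolation_oversample(X_min, n_total - n_espo),
])
```

Flooring the small eigenvalues injects variance along the otherwise near-degenerate eigendimensions, which is what lets the Gaussian samples spread into unoccupied regions of the data space while the dominant eigenvectors, and hence the correlation between consecutive time series values, are left intact.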
