A SMOTE Extension for Balancing Multivariate Epilepsy-Related Time Series Datasets

In some cases, big data comes in the form of Time Series (TS) in which complex TS events occur only rarely. In this scenario, learning algorithms need to cope with the TS data balancing problem, which has barely been studied for TS datasets. This research addresses the issue by describing a very simple TS extension of the well-known SMOTE algorithm for balancing datasets. To validate the proposal, it is applied to a realistic, publicly available dataset containing epilepsy-related TS. The characteristics of the dataset are studied before and after applying the TS balancing algorithm, and the results point to requirements for further research on this topic, among them the energy efficiency of the algorithm and the TS generation process.
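The abstract does not spell out how the TS extension works, so the following is only a minimal sketch of the general idea behind SMOTE applied to whole multivariate series, not the authors' algorithm. It assumes that every minority sample is a fixed-length window of shape (timesteps, channels), that neighbours are found with Euclidean distance on the flattened windows, and that synthetic series are built by linear interpolation between a sample and one of its nearest neighbours. The function name `smote_time_series` and its parameters are hypothetical.

```python
import numpy as np


def smote_time_series(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class series by SMOTE-style interpolation.

    minority: array of shape (n_samples, n_timesteps, n_channels).
    Returns an array of shape (n_synthetic, n_timesteps, n_channels).
    Assumption: equal-length windows and Euclidean neighbours; the paper's
    actual extension may differ.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = minority.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Flatten each series so that distances compare whole windows.
    flat = minority.reshape(n, -1)
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)  # squared Euclidean
    np.fill_diagonal(dist, np.inf)               # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its k nearest neighbours for each new series.
    base = rng.integers(0, n, size=n_synthetic)
    pick = rng.integers(0, k, size=n_synthetic)
    mate = neighbours[base, pick]

    # One interpolation factor per synthetic series, shared by all
    # timesteps and channels so the series keeps a coherent shape.
    gap = rng.random((n_synthetic, 1, 1))
    return minority[base] + gap * (minority[mate] - minority[base])
```

Using a single interpolation factor per series, rather than one per timestep, is a design choice in this sketch: it keeps each synthetic series on the straight line between two real series instead of adding independent noise at every point.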
