On the stopping criteria for k-Nearest Neighbor in positive unlabeled time series classification problems

Positive unlabeled time series classification has become an important research area over the last decade, because vast amounts of unlabeled time series data are often available while the corresponding labels are difficult to obtain. In this situation, positive unlabeled learning is a suitable way to mitigate the lack of labeled examples. In particular, self-training is widely used because of its simplicity and adaptability. Within this technique, the stopping criterion, i.e., the decision of when to stop labeling, is critical, especially in the positive unlabeled context. We propose a self-training method that follows the positive unlabeled approach for time series classification, together with a family of parameter-free stopping criteria for this method. Our proposal applies a graphical analysis to the minimum distances obtained by the k-Nearest Neighbor base learner in order to estimate the class boundary. The proposed method is evaluated in an experimental study involving various time series classification datasets. The results show that the transductive results of our method improve on those obtained by previous models.
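To make the role of the minimum distances concrete, the following is a minimal sketch of positive unlabeled self-training with a 1-Nearest Neighbor base learner. It is an illustration under stated assumptions, not the method proposed in the paper: the Euclidean distance, the function names `self_train_order` and `stop_at_largest_jump`, and in particular the "largest jump" stopping rule are hypothetical stand-ins chosen for simplicity. The sketch only shows where a stopping criterion acts: on the sequence of minimum distances recorded as instances are labeled one at a time.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_train_order(positives, unlabeled, dist=euclidean):
    """Greedy self-training with a 1-NN base learner.

    At each step the unlabeled series closest to the current positive set
    is moved into that set. Returns the order in which unlabeled indices
    were added and the minimum distance recorded at each step.
    """
    labeled = list(positives)
    remaining = list(range(len(unlabeled)))
    order, min_dists = [], []
    while remaining:
        best_idx, best_d = None, math.inf
        for i in remaining:
            d = min(dist(unlabeled[i], p) for p in labeled)
            if d < best_d:
                best_idx, best_d = i, d
        order.append(best_idx)
        min_dists.append(best_d)
        labeled.append(unlabeled[best_idx])
        remaining.remove(best_idx)
    return order, min_dists


def stop_at_largest_jump(min_dists):
    """Hypothetical stopping rule, NOT the paper's criterion.

    Returns how many instances to keep as positive: labeling stops just
    before the largest increase between consecutive minimum distances.
    """
    if len(min_dists) < 2:
        return len(min_dists)
    jumps = [min_dists[i + 1] - min_dists[i] for i in range(len(min_dists) - 1)]
    return jumps.index(max(jumps)) + 1


# Toy data: one labeled positive, two nearby series, two distant series.
positives = [[0.0, 0.0]]
unlabeled = [[0.1, 0.0], [0.2, 0.0], [5.0, 5.0], [5.1, 5.0]]

order, min_dists = self_train_order(positives, unlabeled)
stop = stop_at_largest_jump(min_dists)
predicted_positive = order[:stop]
```

In this toy example the minimum distance stays small while the two nearby series are absorbed and rises sharply once only distant series remain, so the rule keeps the first two instances as positive. Like the criteria described in the abstract, this rule needs no user-set parameter, but it is only one simple way to read the distance curve.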
