Matrix Profile XVI: Efficient and Effective Labeling of Massive Time Series Archives

In domains as diverse as entomology and sports medicine, analysts are routinely required to label large amounts of time series data. In rare cases, this can be done automatically with a classification algorithm. In many domains, however, complex, noisy, and polymorphic data can defeat state-of-the-art classifiers, yet yield easily to human inspection and annotation, especially if the human can access auxiliary information and previous annotations. This labeling task can be a significant bottleneck to scientific progress: for example, an entomology or sports physiology lab may produce several days' worth of time series each day. In this work, we introduce an algorithm that greatly reduces the human effort required. Our interactive algorithm groups subsequences and invites the user to label each group's prototype, then brushes that label to all members of the group. Our task thus reduces to optimizing the grouping(s) so that the system asks the user as few questions as possible. As we shall show, on diverse domains we can reduce the human effort by at least an order of magnitude, with no decrease in accuracy.
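The group-then-brush idea can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes non-overlapping windows, z-normalized Euclidean distance, and a simple greedy "leader" grouping with a fixed radius, all chosen here only for illustration. The names `group_and_label`, `radius`, and `ask_user` are hypothetical, and the data is synthetic.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array; constant segments map to all zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def group_and_label(series, m, radius, ask_user):
    """Greedily group non-overlapping length-m windows, then brush labels.

    ask_user(window_index) stands in for the human annotator (assumption).
    Returns (labels, n_questions).
    """
    windows = [znorm(series[i:i + m]) for i in range(0, len(series) - m + 1, m)]
    prototypes = []  # one prototype (first member seen) per group
    members = []     # window indices belonging to each group
    for idx, w in enumerate(windows):
        for g, p in enumerate(prototypes):
            if np.linalg.norm(w - p) <= radius:
                members[g].append(idx)
                break
        else:
            # No existing group is close enough: start a new one.
            prototypes.append(w)
            members.append([idx])
    labels = [None] * len(windows)
    for idxs in members:
        label = ask_user(idxs[0])  # one question per group prototype
        for idx in idxs:
            labels[idx] = label    # brush the label to every member
    return labels, len(prototypes)

# Synthetic example: alternating sine-like and square-like behaviors.
m = 50
t = np.linspace(0, 2 * np.pi, m, endpoint=False)
shapes = {"sine": np.sin(t), "square": np.sign(np.sin(t))}
truth = ["sine", "square"] * 10
rng = np.random.default_rng(0)
series = np.concatenate([shapes[k] + rng.normal(0, 0.02, m) for k in truth])

labels, n_questions = group_and_label(
    series, m, radius=1.5, ask_user=lambda i: truth[i]
)
print(f"{n_questions} questions for {len(labels)} windows")
```

In this toy setting the number of questions equals the number of groups rather than the number of windows, which is the saving the abstract describes. How well it works depends entirely on the grouping, which is why the paper frames the problem as optimizing the grouping.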
