EXTRACT: Strong Examples from Weakly-Labeled Sensor Data

Thanks to the rise of wearable and connected devices, sensor-generated time series comprise a large and growing fraction of the world's data. Unfortunately, extracting value from this data can be challenging, since sensors report low-level signals (e.g., acceleration), not the high-level events that are typically of interest (e.g., gestures). We introduce a technique to bridge this gap by automatically extracting examples of real-world events in low-level data, given only a rough estimate of when these events have taken place. By identifying sets of features that repeat in the same temporal arrangement, we isolate examples of such diverse events as human actions, power consumption patterns, and spoken words with up to 96% precision and recall. Our method is fast enough to run in real time and assumes only minimal knowledge of which variables are relevant or the lengths of events. Our evaluation uses numerous publicly available datasets and over 1 million samples of manually labeled sensor data.
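To make the idea of repeated structure inside a weakly labeled region concrete, the sketch below finds the two most similar non-overlapping windows in a one-dimensional signal. This is a brute-force closest-pair search under z-normalized Euclidean distance, not the feature-based method the abstract describes. The function names (`znorm`, `closest_pair`), the fixed window length, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D window; near-constant windows map to all zeros."""
    std = x.std()
    return (x - x.mean()) / std if std > 1e-8 else np.zeros_like(x)

def closest_pair(series, length):
    """Return (i, j, dist) for the two most similar non-overlapping
    windows of the given length. Brute force, O(n^2) comparisons."""
    n = len(series) - length + 1
    windows = np.array([znorm(series[i:i + length]) for i in range(n)])
    best = (None, None, np.inf)
    for i in range(n):
        for j in range(i + length, n):  # skip windows that overlap window i
            d = np.linalg.norm(windows[i] - windows[j])
            if d < best[2]:
                best = (i, j, d)
    return best

# Hypothetical weakly labeled region: noise with one pattern planted twice.
rng = np.random.default_rng(0)
pattern = np.sin(np.linspace(0, 2 * np.pi, 20))
signal = rng.normal(0, 0.1, 200)
signal[30:50] += 3 * pattern     # first occurrence of the event
signal[120:140] += 3 * pattern   # second occurrence of the event

i, j, dist = closest_pair(signal, 20)
print(i, j, round(float(dist), 3))
```

Unlike this sketch, the method summarized above is said to need only minimal knowledge of event lengths and relevant variables, whereas the brute-force search here requires the window length up front and handles a single variable.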
