Putting the Human in the Time Series Analytics Loop

Time series are one of the most common data types in nature. Given this fact, dozens of query-by-sketching, query-by-example, and query-algebra systems have been proposed to allow users to search large time series collections. However, none of these systems has seen widespread adoption. We argue that there are two reasons for this. The first is that these systems are often complex and unintuitive, requiring the user to master intricate syntax or interfaces to construct high-quality queries. The second reason is less well appreciated: the expressiveness of most query-by-content systems is surprisingly limited. There are well-defined, simple queries that cannot be answered by any current query-by-content system, even if it uses a state-of-the-art distance measure such as Dynamic Time Warping. In this work, we propose a natural language search mechanism for searching time series. We show that our system is expressive and intuitive, and that it requires little space and time overhead. Because our system is text-based, it can leverage decades of research in text retrieval, including ideas such as relevance feedback. Moreover, we show that our system subsumes both motif/discord discovery and most existing query-by-content systems in the literature. We demonstrate the utility of our system with case studies in domains as diverse as animal motion studies, medicine, and industry.
