Interactive Event-driven Knowledge Discovery from Data Streams

Author(s): Jalali, Laleh | Advisor(s): Jain, Ramesh

Abstract: With the proliferation of sensor data, a critical challenge is to interpret and extract knowledge from large-scale, heterogeneous observational data. Most knowledge discovery frameworks rely on data mining techniques to extract interesting patterns. The problem of finding such patterns is NP-complete, and interestingness is not a monotone property: a pattern may be interesting even if its subpatterns are not. This dissertation presents a framework for interactive knowledge discovery from heterogeneous, high-dimensional temporal data. First, it introduces a high-level pattern formulation language consisting of an event model for fusing and abstracting data streams, a semi-interval time model for effectively representing temporal relations, and a set of expressive operators. Building on these operators, it proposes a visual, interactive framework that combines data-driven (bottom-up) and hypothesis-driven (top-down) analyses. The framework uses data-driven operators for pattern mining and for investigating unknown unknowns, producing a basic model and preliminary knowledge. It also uses domain-expert knowledge to guide the process of revealing known unknowns: an expert can seed a hypothesis, based on prior knowledge or on knowledge derived from the data-driven analysis, and grow it interactively using hypothesis-driven operators. For the pattern mining component, the dissertation introduces novel time-efficient algorithms for discovering hidden event co-occurrences across multiple event streams. A prototype of the framework is implemented as a web-based system intended as a tool for explanation and decision making across a wide range of disciplines. The framework's applicability is evaluated in a healthcare application for asthma risk management and in a human behavior understanding application called Objective Self. These applications and experiments highlight the actionable knowledge that the framework can help uncover.
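The abstract names a semi-interval time model and algorithms for finding event co-occurrences across multiple streams but does not describe them. The following is a minimal Python sketch of the general idea only, not the dissertation's algorithm. It assumes a hypothetical `Event` record (`stream`, `label`, half-open `[start, end)` interval) and defines co-occurrence, as an assumption, as temporal overlap between events from different streams. Each interval is split into its start and end semi-intervals, and the endpoints are swept in time order while the set of open events is tracked.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical interval event observed on one stream."""
    stream: str   # source stream, e.g. "environment" or "symptoms"
    label: str    # abstracted event type, e.g. "high_pollen"
    start: float  # start semi-interval (inclusive)
    end: float    # end semi-interval (exclusive)

    def __post_init__(self):
        # Zero-length or inverted intervals would break the sweep below.
        if self.end <= self.start:
            raise ValueError("an event must satisfy start < end")


def semi_intervals(events):
    """Split each interval into (time, kind, index) endpoints, sorted by time.

    kind 0 marks an end and kind 1 a start, so at equal timestamps ends are
    processed first: intervals that merely touch do not count as overlapping.
    """
    points = []
    for idx, ev in enumerate(events):
        points.append((ev.start, 1, idx))
        points.append((ev.end, 0, idx))
    points.sort()
    return points


def cooccurrence_counts(events):
    """Count overlapping label pairs drawn from different streams."""
    counts = Counter()
    active = set()  # indices of events whose interval is currently open
    for _, kind, idx in semi_intervals(events):
        if kind == 0:
            active.discard(idx)
            continue
        ev = events[idx]
        for other_idx in active:
            other = events[other_idx]
            if other.stream != ev.stream:
                # Sort the pair so (a, b) and (b, a) share one counter key.
                key = tuple(sorted([(other.stream, other.label),
                                    (ev.stream, ev.label)]))
                counts[key] += 1
        active.add(idx)
    return counts


if __name__ == "__main__":
    # Illustrative toy data, not taken from the dissertation's experiments.
    demo = [
        Event("environment", "high_pollen", 0, 10),
        Event("environment", "high_pm25", 20, 30),
        Event("symptoms", "wheezing", 5, 8),
        Event("symptoms", "wheezing", 25, 27),
        Event("symptoms", "cough", 10, 12),
    ]
    for (a, b), n in cooccurrence_counts(demo).items():
        print(f"{a[1]} ({a[0]}) with {b[1]} ({b[0]}): {n}")
```

Ordering end points before start points at the same timestamp is a design choice: in the toy data, `cough` starts exactly when `high_pollen` ends and is therefore not reported. The sweep costs O(n log n) for sorting plus time proportional to the open events inspected at each start point.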
