Discovering metric temporal constraint networks on temporal databases

OBJECTIVE In this paper, we propose the ASTPminer algorithm for mining collections of time-stamped sequences to discover frequent temporal patterns represented in the simple temporal problem (STP) formalism, which expresses temporal knowledge as a set of event types and a set of metric temporal constraints among them. To focus the mining process, the user can supply initial knowledge, also expressed as an STP, that acts as a seed pattern for the search. The algorithm then searches only for frequent temporal patterns that are consistent with this initial knowledge.

BACKGROUND Health organisations in many areas of activity need new computational tools that can extract knowledge from very large collections of data. Temporal data mining has emerged as an active research field that provides algorithms for discovering new temporal knowledge. An important point of difference between proposals is the expressiveness of the temporal knowledge they produce, which in the literature is most often qualitative.

METHODOLOGY ASTPminer follows an Apriori-like strategy: it is an iterative algorithm in which iteration i yields the set of frequent temporal patterns of size i. It incorporates three distinctive mechanisms: (1) a clustering procedure over the distributions of temporal distances between events, used to recognise similar occurrences as temporal patterns; (2) consistency checking of every combination of temporal patterns, which ensures the soundness of the resulting patterns; and (3) seed patterns, which allow the user to drive the mining process.

RESULTS To validate our proposal, we conducted several experiments on a database of time-stamped sequences obtained from polysomnography tests in patients with sleep apnea-hypopnea syndrome. ASTPminer was able to extract well-known temporal patterns corresponding to different manifestations of the syndrome. Furthermore, the use of seed patterns reduced the size of the search space: the number of possible patterns fell from 2.1×10⁷ to 1219, and the number of frequent patterns found fell from 1167 to 340, thereby increasing the efficiency of the mining algorithm.

CONCLUSIONS We present a temporal data mining technique for discovering frequent temporal patterns in collections of time-stamped event sequences. The resulting patterns describe distinct temporal arrangements among sets of event types, characterised by how often they recur and by how similar the temporal dispositions of the same events are across occurrences. ASTPminer allows users to participate in the mining process by introducing domain knowledge as a temporal pattern in the STP formalism. This knowledge restricts the search to patterns consistent with the provided pattern and improves the performance of the procedure.
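The consistency check in mechanism (2) can be illustrated with the classical result for STPs: a set of constraints lo <= t_j - t_i <= hi is consistent if and only if its distance graph (an edge i -> j weighted hi and an edge j -> i weighted -lo for each constraint) has no negative cycle, and the shortest-path distances give the tightest (minimal) constraints. The Python sketch below applies Floyd-Warshall to that graph. It is an illustration under our own assumptions, not ASTPminer's implementation: event types are numbered 0..n-1, and the names `minimal_network` and `is_consistent` are hypothetical. One plausible way to use a seed pattern with such a check is to test each candidate together with the seed's constraints and discard it when the combined network is inconsistent.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical encoding: (i, j, lo, hi) means lo <= t_j - t_i <= hi.
Constraint = Tuple[int, int, float, float]


def minimal_network(n: int, constraints: List[Constraint]) -> Optional[List[List[float]]]:
    """Return the all-pairs shortest-path matrix of the STP distance graph,
    or None if the STP is inconsistent (illustrative sketch only)."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)   # t_j - t_i <= hi   -> edge i -> j, weight hi
        d[j][i] = min(d[j][i], -lo)  # t_i - t_j <= -lo  -> edge j -> i, weight -lo
    # Floyd-Warshall: relax every pair through each intermediate node k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative diagonal entry reveals a negative cycle: no solution exists.
    if any(d[i][i] < 0 for i in range(n)):
        return None
    return d


def is_consistent(n: int, constraints: List[Constraint]) -> bool:
    """True if some assignment of times satisfies all constraints."""
    return minimal_network(n, constraints) is not None


if __name__ == "__main__":
    # Hypothetical event types A=0, B=1, C=2.
    base = [(0, 1, 10, 20), (1, 2, 5, 10)]
    print(is_consistent(3, base + [(0, 2, 0, 10)]))
    m = minimal_network(3, base + [(0, 2, 20, 40)])
    if m is not None:
        # Minimal interval for t_C - t_A is [-d[C][A], d[A][C]].
        print(-m[2][0], m[0][2])
```

With B - A in [10, 20] and C - B in [5, 10], the path forces C - A into [15, 30]; adding C - A in [0, 10] therefore makes the network inconsistent, whereas C - A in [20, 40] is tightened to [20, 30] in the minimal network.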
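Mechanism (1) groups the observed temporal distances between two event types into clusters, each of which can then be read as a metric constraint [lo, hi]. The abstract does not say which clustering procedure ASTPminer uses, so the Python sketch below substitutes a deliberately simple one: single-linkage clustering of the one-dimensional distances with a gap threshold, which amounts to sorting the distances and splitting wherever consecutive values differ by more than `max_gap`. Support is counted as the number of distinct sequences contributing to a cluster. The function name `cluster_distances` and the parameters `max_gap` and `min_support` are hypothetical.

```python
from typing import List, Tuple


def cluster_distances(
    obs: List[Tuple[int, float]], max_gap: float, min_support: int
) -> List[Tuple[float, float, int]]:
    """Cluster (sequence_id, distance) observations for one ordered pair of
    event types; return (lo, hi, support) for each sufficiently supported cluster.

    Stand-in technique (not necessarily ASTPminer's): 1-D single-linkage
    clustering, i.e. split the sorted distances at gaps larger than max_gap.
    """
    if not obs:
        return []
    ordered = sorted(obs, key=lambda item: item[1])
    clusters = [[ordered[0]]]
    for item in ordered[1:]:
        # Last element of the current cluster holds its largest distance.
        if item[1] - clusters[-1][-1][1] > max_gap:
            clusters.append([item])
        else:
            clusters[-1].append(item)
    result = []
    for cluster in clusters:
        # Support = number of distinct sequences in which the arrangement occurs.
        support = len({seq_id for seq_id, _ in cluster})
        if support >= min_support:
            result.append((cluster[0][1], cluster[-1][1], support))
    return result


if __name__ == "__main__":
    # Hypothetical distances (e.g. seconds) between two event types.
    observations = [(0, 12.0), (1, 14.0), (2, 13.0), (0, 60.0), (3, 15.0), (1, 58.0)]
    print(cluster_distances(observations, max_gap=5.0, min_support=3))
```

For instance, distances 12, 13, 14 and 15 observed in four different sequences, plus 58 and 60 in two of them, yield the interval [12, 15] with support 4 when `min_support` is 3; the second group is dropped for lack of support.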