Discovering Evolving Temporal Information: Theory and Application to Clinical Databases

Functional dependencies (FDs) allow us to represent database constraints corresponding to requirements such as “patients having the same symptoms undergo the same medical tests.” Some research efforts have focused on extending such dependencies to also cover temporal constraints such as “patients having the same symptoms undergo the same medical tests in the next period.” Temporal functional dependencies can represent this kind of temporal constraint in relational databases. Another extension of FDs allows one to represent approximate functional dependencies (AFDs), such as “patients with the same symptoms generally undergo the same medical tests.” An AFD allows data to deviate from the defined constraint up to a user-defined percentage. Approximate temporal functional dependencies (ATFDs) merge the concepts of temporal functional dependency and approximate functional dependency. Among the different kinds of ATFDs, the Approximate Pure Temporally Evolving Functional Dependencies (APE-FDs for short) allow one to detect patterns in the evolution of data in the database and to discover dependencies such as “For most patients with the same initial diagnosis, the same medical test is prescribed after the occurrence of the same symptom.” Mining ATFDs from large databases may be computationally expensive. In this paper, we focus on APE-FDs and prove that, unfortunately, verifying a single APE-FD over a given database instance is in general NP-complete. To cope with this problem, we propose a framework for mining complex APE-FDs in real-world data collections. Within the framework, we designed and applied sound and advanced model-checking techniques. To prove the feasibility of our proposal, we tested the running prototype we developed on real-world databases from two medical domains, namely psychiatry and pharmacovigilance.
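As an illustration of the "user-defined percentage" in an approximate functional dependency, the following minimal Python sketch computes the fraction of tuples that would have to be removed for a plain (non-temporal) dependency to hold exactly, a measure commonly called g3 in the AFD literature. It is only a sketch under assumptions: the function names, the attribute names `symptom` and `test`, and the sample rows are hypothetical, and the abstract does not state which error measure the paper itself uses. It also does not cover the temporal or evolving (APE-FD) case.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Fraction of rows to delete so that lhs -> rhs holds exactly.

    rows: list of dicts (one per tuple); lhs, rhs: lists of attribute names.
    """
    if not rows:
        return 0.0
    # Group rows by their lhs values and count each rhs value per group.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        groups[key][val] += 1
    # In each group we can keep only the rows sharing the most frequent rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return (len(rows) - kept) / len(rows)


def holds_approximately(rows, lhs, rhs, epsilon):
    """True if the dependency holds after discarding at most epsilon of the rows."""
    return g3_error(rows, lhs, rhs) <= epsilon


# Hypothetical sample data: three 'fever' patients, one of whom deviates.
patients = [
    {"symptom": "fever", "test": "blood"},
    {"symptom": "fever", "test": "blood"},
    {"symptom": "fever", "test": "xray"},
    {"symptom": "cough", "test": "xray"},
]

error = g3_error(patients, ["symptom"], ["test"])
print(error)
print(holds_approximately(patients, ["symptom"], ["test"], epsilon=0.25))
```

Here one of the four rows violates "same symptom, same test", so the dependency is accepted with a tolerance of 25% and rejected with a stricter one.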
