Efficient discovery of risk patterns in medical data

OBJECTIVE: This paper studies the problem of efficiently discovering risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which is widely used in epidemiological research.

METHODS: To avoid fruitless search in the complete exploration of risk patterns, we define the optimal risk pattern set, which excludes superfluous patterns, i.e. complicated patterns with lower relative risk than their corresponding simpler patterns. We prove that mining optimal risk pattern sets conforms to an anti-monotone property, and we propose an efficient mining algorithm based on this property. We also propose a hierarchical structure for presenting the discovered patterns so that domain experts can peruse them easily.

RESULTS: The proposed approach is compared with two well-known rule discovery methods, decision tree and association rule mining, on benchmark data sets, and is applied to a real-world application. The proposed method discovers more and better-quality risk patterns than the decision tree approach, which is not designed for such applications and is inadequate for pattern exploration. Unlike the association rule mining approach, the proposed method does not produce a large number of uninteresting superfluous patterns, and it is also more efficient. A real-world case study shows that the method reveals risk patterns that medical practitioners find interesting.

CONCLUSION: The proposed method is an efficient approach to exploring risk patterns. It quickly identifies cohorts of patients in a large data set who are vulnerable to a risk outcome. The method is useful for exploratory studies on large medical data sets to generate and refine hypotheses, and for designing medical surveillance systems.
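To make the two central notions concrete, the following is a minimal Python sketch of how relative risk could be computed for a pattern, and how a superfluous pattern could be detected by comparing it against its simpler sub-patterns. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the record layout (one dictionary per patient), the toy data, and the names `matches`, `relative_risk`, and `is_superfluous` are all hypothetical, and the check enumerates sub-patterns by brute force rather than using the anti-monotone pruning the paper describes.

```python
from itertools import combinations
from typing import Dict, List

Record = Dict[str, str]   # hypothetical layout: attribute name -> value
Pattern = Dict[str, str]  # a pattern is a set of attribute-value pairs


def matches(record: Record, pattern: Pattern) -> bool:
    """True if the record contains every attribute-value pair of the pattern."""
    return all(record.get(k) == v for k, v in pattern.items())


def relative_risk(records: List[Record], pattern: Pattern,
                  outcome_key: str = "outcome", risk_value: str = "yes") -> float:
    """Risk of the outcome among records matching the pattern, divided by
    the risk among records that do not match it."""
    a = b = c = d = 0
    for r in records:
        has_outcome = r[outcome_key] == risk_value
        if matches(r, pattern):
            a, b = (a + 1, b) if has_outcome else (a, b + 1)
        else:
            c, d = (c + 1, d) if has_outcome else (c, d + 1)
    if a + b == 0 or c + d == 0:
        raise ValueError("pattern must split the records into two non-empty groups")
    if c == 0:
        return float("inf")  # no outcome in the comparison group
    return (a / (a + b)) / (c / (c + d))


def is_superfluous(records: List[Record], pattern: Pattern) -> bool:
    """Assumption: a pattern is superfluous if some proper, non-empty
    sub-pattern has a higher relative risk (brute-force check)."""
    rr = relative_risk(records, pattern)
    items = list(pattern.items())
    for size in range(1, len(items)):
        for subset in combinations(items, size):
            if relative_risk(records, dict(subset)) > rr:
                return True
    return False


def make(smoker: str, age: str, outcome: str, n: int) -> List[Record]:
    """Build n identical toy records."""
    return [{"smoker": smoker, "age": age, "outcome": outcome} for _ in range(n)]


# Toy data, invented purely for illustration.
records = (
    make("yes", "old", "yes", 3) + make("yes", "old", "no", 1)
    + make("yes", "young", "yes", 3) + make("yes", "young", "no", 1)
    + make("no", "old", "yes", 1) + make("no", "old", "no", 3)
    + make("no", "young", "yes", 1) + make("no", "young", "no", 3)
)

rr_simple = relative_risk(records, {"smoker": "yes"})
rr_complex = relative_risk(records, {"smoker": "yes", "age": "old"})
```

In this toy data, adding `age = old` to the pattern `smoker = yes` lowers the relative risk, so `is_superfluous(records, {"smoker": "yes", "age": "old"})` flags the longer pattern, while the single-attribute pattern has no simpler form to be compared against and is kept.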
