Implementing an Integrated Time-Series Data Mining Environment Based on Temporal Pattern Extraction Methods: A Case Study of an Interferon Therapy Risk Mining for Chronic Hepatitis

In this paper, we present the implementation of an integrated time-series data mining environment. Time-series data mining is one of the key issues in obtaining useful knowledge from databases. With mined time-series patterns, users can become aware not only of positive results but also of negative results, called risks, after their observation period. However, as in other data mining processes, users often face difficulties during the time-series data mining process: selecting or constructing data pre-processing methods, selecting mining algorithms, and post-processing to refine the mining process. A time-series data mining environment based on a systematic analysis of this process is therefore needed. To obtain rules that are more valuable to domain experts, we have designed an environment that integrates time-series pattern extraction methods, rule induction methods, and rule evaluation methods with active human-system interaction. After implementing this environment, we conducted a case study to mine time-series rules from a database of blood and urine biochemical tests on chronic hepatitis patients. A physician then evaluated and refined his hypothesis using this environment. We discuss how well the environment supports an expert in mining interesting knowledge.
