Data Mining Techniques in Health Informatics: A Case Study from Breast Cancer Research

This paper presents a case study of applying data mining techniques to the analysis of diagnosis and treatment events related to breast cancer. Data from over 16,000 patients were pre-processed, and several data mining techniques were applied using Weka (Waikato Environment for Knowledge Analysis). In particular, Generalized Sequential Pattern (GSP) mining was used to discover frequent patterns in disease event sequence profiles, with patients grouped into living and deceased. In addition, five classification models were evaluated on their ability to classify patients based on selected attributes. This research showcases the data mining process and the techniques used to transform large amounts of patient data into useful information and potentially valuable patterns that help in understanding cancer outcomes.
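To make the sequential pattern mining step more concrete, the following is a minimal Python sketch of level-wise frequent sequence mining in the spirit of GSP. It is not the Weka implementation used in the study: it assumes each sequence element is a single event (no itemsets, time constraints, or taxonomies), and the event names, the toy sequences, the function names, and the 0.6 support threshold are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the events of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `event in remaining` consumes the iterator, so order is enforced.
    return all(event in remaining for event in pattern)


def support(pattern: Sequence[str], sequences: List[Sequence[str]]) -> float:
    """Fraction of sequences that contain `pattern` as a subsequence."""
    if not sequences:
        raise ValueError("need at least one sequence")
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def frequent_sequences(
    sequences: List[Sequence[str]], min_support: float, max_length: int = 4
) -> Dict[Pattern, float]:
    """Level-wise search: grow frequent patterns one event at a time."""
    events = sorted({e for s in sequences for e in s})
    level: Dict[Pattern, float] = {}
    for e in events:
        sup = support((e,), sequences)
        if sup >= min_support:
            level[(e,)] = sup
    result = dict(level)
    length = 1
    while level and length < max_length:
        # GSP-style join: p and q combine when p minus its first event
        # equals q minus its last event.
        candidates = {
            p + (q[-1],) for p in level for q in level if p[1:] == q[:-1]
        }
        level = {}
        for candidate in sorted(candidates):
            sup = support(candidate, sequences)
            if sup >= min_support:
                level[candidate] = sup
        result.update(level)
        length += 1
    return result


# Hypothetical toy event sequences, one list per patient (not study data).
living = [
    ["diagnosis", "surgery", "radiotherapy"],
    ["diagnosis", "surgery", "hormone_therapy"],
    ["diagnosis", "chemotherapy", "surgery"],
]
deceased = [
    ["diagnosis", "chemotherapy", "recurrence"],
    ["diagnosis", "surgery", "recurrence"],
]

living_patterns = frequent_sequences(living, min_support=0.6)
deceased_patterns = frequent_sequences(deceased, min_support=0.6)
for name, patterns in (("living", living_patterns), ("deceased", deceased_patterns)):
    for pattern, sup in sorted(patterns.items()):
        print(name, " -> ".join(pattern), round(sup, 2))
```

Mining each group separately, as in this sketch, allows the frequent patterns of the two groups to be compared side by side. A full GSP implementation would additionally prune candidates that have an infrequent subsequence and would support itemset elements and time-gap constraints.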
