Advancements in Data Management and Data Mining Approaches

Health systems face growing challenges in delivering care cost-effectively as populations age and diseases such as obesity, cancer, and diabetes increase in prevalence. At the same time, the life sciences industry faces historically low productivity and a dearth of new drugs to replace medicines reaching loss of exclusivity. Translational medicine has emerged as a science that can help tackle these challenges. The move toward electronic medical records in health systems has provided a rich source of new data for research into the pathophysiology of disease. It is increasingly understood that not all drugs work the same way in all patients, and that giving the right drug to the right patient at the right time will help improve medical outcomes while reducing the costs associated with mistreatment or overtreatment. Key to achieving this is the use of new molecular diagnostic techniques such as next-generation sequencing, which can help scientists and clinicians understand the pathophysiology of disease and identify which drugs will work in which patients. In this chapter we outline a data management framework for integrating and analyzing clinical data from medical records or clinical trials together with molecular data from new sequencing technologies. We discuss different data integration platforms and how they can serve as a backbone for data mining. We then describe best practices in data mining and introduce common techniques used in biomedical research, with use case examples.
