Role and Challenges of Unstructured Big Data in Healthcare

The unprecedented growth in the volume of unstructured healthcare data offers immense potential for extracting valuable insights, improving healthcare services, delivering quality patient care, and managing data securely. However, realizing these benefits requires technological advancements that keep pace with that growth. The heterogeneity, diversity of sources, variable quality, and varied representations of unstructured healthcare data pose more challenges than structured data does. This systematic literature review identifies the challenges and problems that the unstructured nature of data creates for data-driven healthcare. The review was carried out using five major scientific databases: ACM, Springer, ScienceDirect, PubMed, and IEEE Xplore. At the initial stage, articles were included if they were written in English and published between 2010 and 2018. A total of 103 articles were selected according to the inclusion criteria. Based on the review, the types of unstructured healthcare data found in different healthcare domains are discussed. Potential challenges associated with unstructured big data in healthcare are also identified as future research directions for the technological advancement of healthcare services and quality patient care.
