Semantic preservation of standardized healthcare documents in big data

BACKGROUND Standardized healthcare documents are widely adopted in today's hospitals. Processing them at large scale strains the infrastructure, and their complexity compounds the difficulty of handling them, which is why big data techniques are necessary. However, big data techniques can cause accuracy or semantic loss when health documents are partitioned for processing. This semantic loss is critical for clinical use as well as for insurance and medical education.
METHODS In this paper we propose a novel technique that avoids the semantic loss arising during conventional partitioning of healthcare documents in big data. It uses a constraint model based on conformance to the clinical document standard and on user-based use cases. We used clinical document architecture (CDAR) datasets on the Hadoop Distributed File System (HDFS) through a uniquely configured setup. After partitioning, we identified the documents affected by semantic loss and separated the documents into two sets: conflict-free documents and conflicted documents. Conflicted documents were resolved using different resolution strategies mapped according to the CDAR specification. The first part of the technique focuses on identifying the type of conflict that arises in the blocks after partitioning. The second part focuses on mapping conflicts to resolutions based on the constraints applied, which depend on the validation and user scenario.
RESULTS We used a publicly available dataset of CDAR documents, identified all conflicted documents, and resolved all of them successfully to avoid any semantic loss. In our experiment we tested up to 87,000 CDAR documents and successfully identified the conflicts and resolved the semantic issues.
CONCLUSION We have presented a novel study that focuses on the semantics of big data, does not compromise performance, and resolves the semantic issues that arise during the processing of clinical documents.
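The two-part idea described under METHODS (detect which documents are affected by partitioning, then resolve them) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes documents are stored back to back in one byte stream, that partitioning cuts the stream into fixed-size blocks, and that a "conflict" simply means a document straddles a block boundary. The names `DocSpan`, `layout`, `classify`, and `resolve`, and the single resolution strategy shown (assign each document whole to the block where it starts), are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DocSpan:
    doc_id: str
    start: int  # offset of the first byte of the document
    end: int    # offset one past the last byte of the document


def layout(docs: dict[str, bytes]) -> list[DocSpan]:
    """Place documents back to back and record their byte ranges."""
    spans, offset = [], 0
    for doc_id, payload in docs.items():
        spans.append(DocSpan(doc_id, offset, offset + len(payload)))
        offset += len(payload)
    return spans


def classify(spans: list[DocSpan], block_size: int):
    """Split spans into (conflict_free, conflicted).

    Assumption: a document is conflicted when its first and last
    bytes fall into different fixed-size blocks.
    """
    conflict_free, conflicted = [], []
    for s in spans:
        first_block = s.start // block_size
        last_block = max(s.start, s.end - 1) // block_size
        target = conflict_free if first_block == last_block else conflicted
        target.append(s)
    return conflict_free, conflicted


def resolve(spans: list[DocSpan], block_size: int) -> dict[int, list[str]]:
    """Hypothetical resolution: keep every document whole by assigning
    it to the block in which it starts."""
    assignment: dict[int, list[str]] = {}
    for s in spans:
        assignment.setdefault(s.start // block_size, []).append(s.doc_id)
    return assignment
```

With three 10-byte documents and a 16-byte block size, the second document crosses the first block boundary and is the only one reported as conflicted. After resolution, every document belongs to exactly one block. A real setup would also need to validate each reassembled document against the document standard, which this sketch does not attempt.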
