Storing and Querying DICOM Data with HYTORMO

In the health care industry, the DICOM (Digital Imaging and Communications in Medicine) standard has become very popular for the storage and transmission of digital medical images and reports. The ever-increasing size, high velocity, and variety of DICOM data collections make it increasingly inefficient to store and query them using a single data storage technique, e.g., a row store or a column store. In this study, we first highlight challenges in DICOM data management. We then describe HYTORMO, a new model to store and query DICOM data. HYTORMO uses a hybrid data storage strategy that aims not only to leverage the advantages of both row and column stores, but also to strike a trade-off among reducing disk I/O cost, reducing tuple construction cost, and reducing storage space. In addition, Bloom filters are applied to reduce network I/O cost during query processing. We prototyped our model on top of Spark. Our preliminary experiments on real DICOM datasets validate the proposed model and show the effectiveness of our method.
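
The abstract does not say how the Bloom filters are applied, so the following is only a minimal sketch of one common way a Bloom filter can reduce the data moved during a join. It assumes a filter is built over the join keys of the smaller table and used to discard non-matching rows of the larger table before they would be sent over the network. The class `BloomFilter`, the function `bloom_filtered_join`, the tables `patients` and `studies`, and the filter parameters are all hypothetical and are not taken from HYTORMO.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using double hashing (illustrative parameters)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive two 64-bit hashes from one SHA-256 digest, then combine them.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def bloom_filtered_join(small_rows, large_rows, key):
    """Join two lists of dicts on `key`, pre-filtering the large side."""
    bloom = BloomFilter()
    for row in small_rows:
        bloom.add(row[key])

    # Only these candidate rows would need to be shipped to the join site.
    candidates = [row for row in large_rows if bloom.might_contain(row[key])]

    # Exact hash join on the candidates removes any false positives.
    index = {}
    for row in small_rows:
        index.setdefault(row[key], []).append(row)
    joined = [{**s, **c} for c in candidates for s in index.get(c[key], [])]
    return joined, len(candidates)


# Hypothetical sample data: 2 patients of interest, 100 studies.
patients = [{"patient_id": "P1", "sex": "F"}, {"patient_id": "P3", "sex": "M"}]
studies = [{"patient_id": f"P{i}", "modality": "CT"} for i in range(1, 101)]

joined, shipped = bloom_filtered_join(patients, studies, "patient_id")
print(f"rows shipped after filtering: {shipped} of {len(studies)}")
print(f"joined rows: {len(joined)}")
```

Because a Bloom filter can return false positives but never false negatives, the pre-filter may let a few extra rows through, which the exact join then discards; no matching row is lost. The filter size and number of hash functions control that false-positive rate and would need tuning for real data volumes.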
