Generating Big Data repositories for research in medical imaging

The production of medical imaging data has grown tremendously over the last decades. The proliferation of digital acquisition equipment has enabled even small institutions to produce considerable numbers of studies. Furthermore, the general trend is for new imaging modalities to produce more data per examination. As a result, the design and implementation of tomorrow's storage and communication systems must deal with Big Data issues. Moreover, new realities such as the outsourcing of medical imaging infrastructures to the Cloud put additional pressure on these systems. Research on technologies for coping with Big Data issues in large-scale medical imaging environments is still in its early stages. This is mostly due to the difficulty of implementing and validating new technological approaches in real environments without interfering with clinical practice. Therefore, it is crucial to create test bed environments for research purposes. This article proposes a methodology for creating simulated Big Data repositories. The system uses a real-world repository to collect data from a representative time window and expands it according to research needs. In addition, the solution provides anonymization tools and allows exporting two types of simulated data: a structured file repository containing DICOM objects, and a statistical summary of the archive content based on characteristics such as the number of studies produced per day, their modality, and their associated data volumes.
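The two ideas in the abstract, summarizing an observed time window and expanding it into a larger simulated archive, can be illustrated with a minimal Python sketch. It is not the article's implementation: the `Study` record, the `summarize` and `expand` functions, and the strategy of resampling whole observed days are all assumptions made for illustration, and the sample figures are invented.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Study:
    """Hypothetical, already anonymized study record (no patient identifiers)."""
    day: date
    modality: str
    size_mb: float


def summarize(studies):
    """Return, per (day, modality), the number of studies and their total volume in MB."""
    summary = defaultdict(lambda: {"studies": 0, "volume_mb": 0.0})
    for s in studies:
        entry = summary[(s.day, s.modality)]
        entry["studies"] += 1
        entry["volume_mb"] += s.size_mb
    return dict(summary)


def expand(studies, target_days, seed=0):
    """Simulate target_days new days by resampling whole observed days.

    Assumption: copying the workload of a randomly chosen observed day keeps
    the per-day modality mix of the source window.
    """
    rng = random.Random(seed)
    by_day = defaultdict(list)
    for s in studies:
        by_day[s.day].append(s)
    observed_days = sorted(by_day)
    start = observed_days[-1] + timedelta(days=1)  # continue after the window
    simulated = []
    for offset in range(target_days):
        source_day = rng.choice(observed_days)
        new_day = start + timedelta(days=offset)
        simulated.extend(
            Study(new_day, s.modality, s.size_mb) for s in by_day[source_day]
        )
    return simulated


# Invented two-day window, expanded to a 30-day simulated archive.
window = [
    Study(date(2016, 1, 4), "CT", 350.0),
    Study(date(2016, 1, 4), "MR", 120.0),
    Study(date(2016, 1, 5), "CT", 410.0),
]
simulated = expand(window, target_days=30)
stats = summarize(simulated)
```

Resampling whole days is only one possible expansion strategy. A statistical model fitted to the per-day counts and volumes would serve the same purpose and might suit windows with strong weekly patterns better.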
