A data colocation grid framework for big data medical image processing: backend design

When processing large medical imaging studies, adopting high-performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet been validated for the variety of multi-level analyses performed in medical imaging. Our target design criteria are (1) improving the framework’s performance in a heterogeneous cluster, (2) performing population-based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL queries. In this paper, we present a heuristic backend application programming interface (API) design for Hadoop and HBase for Medical Image Processing (HadoopBase-MIP). The API includes Upload, Retrieve, Remove, Load balancer (for heterogeneous clusters), and MapReduce templates. A dataset summary statistic model is discussed and implemented using the MapReduce paradigm. We introduce an HBase table scheme for fast data queries to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university secure, shared web database and used to empirically assess an in-house grid with 224 heterogeneous CPU cores.
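The summary statistic model suits MapReduce because per-voxel mean and variance can be assembled from partial summaries that merge in any order. The following is a minimal pure-Python sketch of that idea, not the HadoopBase-MIP implementation. It assumes each image has already been registered to a common space and flattened into an equal-length list of voxel intensities. The hypothetical `map_image` plays the mapper, and the hypothetical `combine` plays the combiner/reducer, using the pairwise update of Chan et al. for count, mean, and sum of squared deviations.

```python
from functools import reduce
from typing import List, Tuple

# A partial summary: (count, per-voxel mean, per-voxel sum of squared deviations M2).
# Assumption: every image is registered and flattened to the same voxel count.
Partial = Tuple[int, List[float], List[float]]


def map_image(voxels: List[float]) -> Partial:
    """Map step (hypothetical): one image becomes a partial summary with count 1."""
    return (1, list(voxels), [0.0] * len(voxels))


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step (hypothetical): merge two partials with Chan et al.'s update.

    The merge is associative, so partials can be grouped in any order,
    which is what allows it to run as both a combiner and a reducer.
    """
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    mean: List[float] = []
    m2: List[float] = []
    for ma, mb, sa, sb in zip(mean_a, mean_b, m2_a, m2_b):
        delta = mb - ma
        mean.append(ma + delta * n_b / n)
        m2.append(sa + sb + delta * delta * n_a * n_b / n)
    return (n, mean, m2)


def finalize(p: Partial) -> Tuple[List[float], List[float]]:
    """Convert the final partial into per-voxel mean and population variance."""
    n, mean, m2 = p
    return mean, [s / n for s in m2]


if __name__ == "__main__":
    # Three toy "images", each flattened to four voxels.
    images = [[1.0, 2.0, 3.0, 4.0],
              [3.0, 2.0, 1.0, 0.0],
              [2.0, 2.0, 2.0, 2.0]]
    mean, var = finalize(reduce(combine, map(map_image, images)))
    print(mean, var)
```

Because `combine` is associative, a Hadoop job could apply it in a combiner on each node and again in the final reducer, so only one small partial per node needs to cross the network. The pairwise update also avoids the cancellation error of accumulating raw sums of squares.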
Results of three empirical experiments are presented and discussed: (1) the load balancer improves wall time 1.5-fold compared with a framework that uses a built-in data allocation strategy; (2) the summary statistic model is empirically verified on the grid framework and compared with the same cluster deployed with a standard Sun Grid Engine (SGE), reducing wall-clock time 8-fold and resource time 14-fold; and (3) the proposed HBase table scheme improves MapReduce computation, with a 7-fold reduction in wall time compared with a naïve scheme when datasets are relatively small. The source code and interfaces have been made publicly available.
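The abstract does not spell out the load balancer's heuristic. Purely as an illustration of why speed-aware allocation helps on a heterogeneous cluster, here is a greedy earliest-finish-time assignment, a common list-scheduling heuristic that is not necessarily the one HadoopBase-MIP uses. It assumes each image job has an estimated cost and each node a known relative speed; `assign_jobs`, `job_costs`, and `node_speeds` are hypothetical names.

```python
from typing import Dict, List, Tuple


def assign_jobs(job_costs: List[float],
                node_speeds: Dict[str, float]) -> Tuple[Dict[str, List[int]], Dict[str, float]]:
    """Greedily place each job on the node that would finish it earliest.

    job_costs: hypothetical per-image work estimates (e.g. proportional to file size).
    node_speeds: hypothetical relative throughput of each node (higher is faster).
    Returns the job indices assigned to each node and each node's estimated finish time.
    """
    if not node_speeds:
        raise ValueError("at least one node is required")
    finish = {name: 0.0 for name in node_speeds}
    plan: Dict[str, List[int]] = {name: [] for name in node_speeds}
    # Largest jobs first (LPT order) so small jobs fill the gaps at the end.
    order = sorted(range(len(job_costs)), key=lambda i: job_costs[i], reverse=True)
    for job in order:
        # Estimated finish time on node n = current load + job cost / node speed.
        # Ties are broken by node name to keep the plan deterministic.
        best = min(node_speeds,
                   key=lambda n: (finish[n] + job_costs[job] / node_speeds[n], n))
        finish[best] += job_costs[job] / node_speeds[best]
        plan[best].append(job)
    return plan, finish


if __name__ == "__main__":
    plan, finish = assign_jobs([4.0] * 6, {"fast": 2.0, "slow": 1.0})
    print(plan, max(finish.values()))
```

For example, with six equal jobs of cost 4.0 on one node of speed 2.0 and one of speed 1.0, the greedy plan sends four jobs to the faster node and two to the slower one, for an estimated makespan of 8.0, whereas an even three-and-three split by job count would finish at 12.0.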
