Scalable analysis of Big pathology image data cohorts using efficient methods and high-performance computing strategies

Background: We describe a suite of tools and methods that give researchers and clinical investigators a core set of capabilities for evaluating multiple analytical pipelines and for quantifying the sensitivity and variability of their results in large-scale studies in investigative pathology and oncology. The overarching objective of this investigation is to address the challenges posed by large data sizes and high computational demands.

Results: The proposed tools and methods take advantage of state-of-the-art parallel machines and efficient content-based image searching strategies. The content-based image retrieval (CBIR) algorithms use a hierarchical analysis approach to quickly detect and retrieve image patches similar to a query patch. The analysis component, built on high-performance computing, can carry out consensus clustering on 500,000 data points using a large shared-memory system.

Conclusions: Our work demonstrates that efficient CBIR algorithms and high-performance computing can be leveraged to analyze large microscopy images and to meet the challenges of clinically salient applications in pathology. These technologies enable researchers and clinical investigators to make more effective use of the rich informational content of digitized microscopy specimens.
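As a rough illustration of the consensus clustering mentioned in the results, the sketch below builds a consensus (co-association) matrix by repeatedly clustering random subsamples of the data. It is not the authors' implementation and does not attempt their 500,000-point scale: the choice of plain k-means as the base clusterer, the function names `kmeans` and `consensus_matrix`, the parameters `runs` and `frac`, and the toy feature vectors are all assumptions made for this example.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def consensus_matrix(points, k, runs=30, frac=0.8, seed=0):
    """Entry [i][j] is the fraction of runs that sampled both i and j
    in which the two points landed in the same cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # co-clustered counts
    sampled = [[0] * n for _ in range(n)]   # co-sampled counts
    for _ in range(runs):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]

# Toy feature vectors (hypothetical): two well-separated groups.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
M = consensus_matrix(points, k=2)
print(round(M[0][1], 2), round(M[0][3], 2))
```

Entries near 1 mark pairs that are grouped together across nearly all resampled runs, and entries near 0 mark pairs that are almost never grouped together. The double loop over sampled pairs is quadratic in the sample size, which gives some sense of why memory and compute become the bottleneck as the number of data points grows.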
