Parallel Automatic Registration of Large Scale Microscopic Images on Multiprocessor CPUs and GPUs

During the present decade, emerging architectures such as multicore CPUs and graphics processing units (GPUs) have steadily gained popularity for their ability to deliver high computational power at low cost. In this paper, we combine parallelization techniques on a cooperative cluster of multicore CPUs and multisocket GPUs to apply their joint computational power to an automatic image registration algorithm intended for the analysis of high-resolution microscope images. Registration methods pose a computational challenge in the biomedical field because microscope image data sets are large, typically extending to the terabyte scale. We analyze this application to identify the parts best suited to the CPU execution model and those best suited to the GPU execution model, and we decompose the process accordingly. Performance results are presented for two sets of images: mouse placenta (16K × 16K pixels) and mouse mammary tumor (23K × 62K pixels). Execution times are reported for different multinode, multisocket, and multicore configurations to show which approach is most effective.
