SuperDragon: A Heterogeneous Parallel System for Accelerating 3D Reconstruction of Cryo-Electron Microscopy Images

The data deluge in medical image processing demands faster and more efficient systems. Recent advances in heterogeneous architectures have led to a resurgence of research on domain-specific accelerators. In this article, we develop an experimental system, SuperDragon, to evaluate the acceleration of EMAN, a single-particle cryo-electron microscopy (Cryo-EM) 3D reconstruction package, on a hybrid parallel architecture of CPUs, GPUs, and FPGAs. Based on a comprehensive workload characterization, we exploit multigrained parallelism in the Cryo-EM 3D reconstruction algorithm and investigate a suitable mapping of the computation onto the underlying heterogeneous architecture. The package is restructured with task-level (MPI), thread-level (OpenMP), and data-level (GPU and FPGA) parallelism. In particular, the proposed FPGA accelerator is a stream architecture that emphasizes optimizing the data access patterns that dominate computation. Its configurable computing streams are constructed by arranging hardware modules and bypass channels into a linear deep pipeline. Compared with the multicore (six-core) program, the GPU and FPGA implementations achieve speedups of 8.4 and 2.25 times in execution time while improving power efficiency by factors of 7.2 and 14.2, respectively.
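
The idea of a configurable computing stream, in which modules are chained into a linear pipeline and unused modules are skipped through bypass channels, can be illustrated with a small software analogy. The sketch below is not the SuperDragon hardware design or any part of EMAN; the stage names (`scale`, `offset`, `clip`) and the helper `build_stream` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Set, Tuple

# A stage is a (name, per-sample function) pair; names are hypothetical.
Stage = Tuple[str, Callable[[float], float]]


def build_stream(stages: List[Stage],
                 enabled: Set[str]) -> Callable[[Iterable[float]], Iterator[float]]:
    """Chain the enabled stages, in order, into one linear pipeline.

    A stage whose name is not in `enabled` is bypassed: samples pass
    through to the next stage unchanged.
    """
    def run(samples: Iterable[float]) -> Iterator[float]:
        stream: Iterator[float] = iter(samples)
        for name, fn in stages:
            if name in enabled:
                # Lazily apply this stage to every sample flowing through.
                stream = map(fn, stream)
        return stream
    return run


# Illustrative modules only; they stand in for hardware processing units.
STAGES: List[Stage] = [
    ("scale", lambda x: x * 2),
    ("offset", lambda x: x + 1),
    ("clip", lambda x: min(x, 10)),
]

full = build_stream(STAGES, {"scale", "offset", "clip"})
no_offset = build_stream(STAGES, {"scale", "clip"})  # "offset" is bypassed

print(list(full([1, 4, 7])))       # [3, 9, 10]
print(list(no_offset([1, 4, 7])))  # [2, 8, 10]
```

Because each stage consumes the previous stage's output one sample at a time, the same fixed set of modules can serve different computations simply by changing which ones are enabled, which is the property the bypass channels provide in the pipeline described above.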
