Parallel Colt

The last nine years have seen major breakthroughs in chip and software design. In October 2001, IBM released the world’s first multicore processor, POWER4. In February 2007, NVIDIA publicly released the CUDA SDK, a set of development tools for writing algorithms that execute on Graphics Processing Units (GPUs). Although software vendors have started parallelizing their products, the vast majority of existing code is still sequential and does not effectively utilize modern multicore CPUs and manycore GPUs. This article describes Parallel Colt, a multithreaded Java library for scientific computing and image processing. In addition to describing the design and functionality of Parallel Colt, it presents a comparison to MATLAB. Two ImageJ plugins, one for iterative image deblurring and one for motion correction of PET brain images, are described as typical applications of the library. Performance comparisons with MATLAB, including GPU computations via AccelerEyes’ Jacket toolbox, are also given.
