ISP: An Optimal Out-of-Core Image-Set Processing Streaming Architecture for Parallel Heterogeneous Systems

Image population analysis is a class of statistical methods that plays a central role in understanding the development, evolution, and disease of a population. However, these techniques often require substantial computational power and memory, demands that are compounded by a large number of volumetric inputs. Restricted access to supercomputing resources limits the use of these methods in general research and practical applications. In this paper we introduce ISP, an Image-Set Processing streaming framework that harnesses the processing power of commodity heterogeneous CPU/GPU systems to address this computational problem. ISP provides specially designed streaming algorithms and data structures that offer an optimal solution for out-of-core multi-image processing problems in terms of both memory usage and computational efficiency. ISP uses the asynchronous execution mechanism supported by parallel heterogeneous systems to hide the inherent latency of the out-of-core processing pipeline. Consequently, for computationally intensive problems, the ISP out-of-core solution can achieve the same performance as the in-core solution. We demonstrate the efficiency of the ISP framework on synthetic and real datasets.
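The latency-hiding idea can be illustrated with a minimal sketch. It is not the ISP implementation: it uses a plain Python thread and a bounded queue in place of asynchronous CPU/GPU transfers, and the names `stream_mean`, `load_image`, and `buffer_slots` are hypothetical. A background loader fetches the next image while the current one is being accumulated, and the bounded queue caps how many images are resident at once.

```python
import queue
import threading


def stream_mean(load_image, n_images, buffer_slots=2):
    """Average an image set while keeping only a few images in memory.

    load_image(i) is a hypothetical callback returning image i as a flat
    list of floats (for example, read from disk).
    """
    # Bounded queue: the loader blocks once buffer_slots images are waiting,
    # so memory use does not grow with the size of the image set.
    slots = queue.Queue(maxsize=buffer_slots)

    def loader():
        for i in range(n_images):
            slots.put(load_image(i))
        slots.put(None)  # sentinel: no more images

    # Loading runs concurrently with the accumulation loop below.
    threading.Thread(target=loader, daemon=True).start()

    total = None
    count = 0
    while True:
        img = slots.get()
        if img is None:
            break
        if total is None:
            total = list(img)
        else:
            total = [t + v for t, v in zip(total, img)]
        count += 1

    if count == 0:
        raise ValueError("empty image set")
    return [t / count for t in total]
```

When the per-image computation takes at least as long as loading, the loader stays ahead of the consumer and the loading cost is largely overlapped, which is the regime in which the abstract says out-of-core performance can match in-core performance.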
