User Transparent Data and Task Parallel Multimedia Computing with Pyxis-DT

The research area of Multimedia Content Analysis (MMCA) covers all aspects of automatically extracting knowledge from multimedia archives and data streams. To meet the growing computational demands of emerging MMCA problems, there is an urgent need to apply High Performance Computing (HPC) techniques. However, because most MMCA researchers are not HPC experts, the field needs programming models and tools that are both efficient and easy to use. Several user transparent, library-based parallelization tools exist today that aim to satisfy both requirements. Such tools generally take a data parallel approach, in which data structures (e.g., video frames) are scattered among the available nodes of a compute cluster. For certain MMCA applications, however, a data parallel approach induces intensive communication, which significantly decreases performance. In such cases, alternative approaches can be beneficial. This paper presents Pyxis-DT, a user transparent parallel programming model for MMCA applications that employs both data and task parallelism. Hybrid parallel execution is achieved by constructing and executing, at run time, a task graph of strictly defined building block operations, each of which can itself be executed in data parallel fashion. Results show that for realistic MMCA applications, the concurrent use of data and task parallelism can significantly improve performance compared to using either approach in isolation.
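The abstract describes hybrid execution only at a high level. The Python sketch below illustrates the general idea under our own assumptions; it is not Pyxis-DT's API. The names `TaskGraph`, `Node`, `pointwise`, and `blocks`, as well as the row-block decomposition of images, are hypothetical. Operations written in sequential style are only recorded into a graph. `run()` then executes all nodes whose inputs are available concurrently (task parallelism) and splits each of those nodes into row blocks that are also processed concurrently (data parallelism). Threads stand in for cluster nodes, so in CPython this shows the scheduling structure rather than delivering real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """One deferred building-block operation in the task graph (hypothetical)."""

    def __init__(self, op=None, inputs=(), value=None):
        self.op = op                # pointwise function; None for a source image
        self.inputs = list(inputs)  # predecessor nodes in the graph
        self.value = value          # image as a list of rows, once computed


class TaskGraph:
    """Records operations lazily, then runs them with task + data parallelism."""

    def __init__(self, blocks=2):
        self.blocks = blocks        # row blocks per image (assumed decomposition)
        self.nodes = []

    def source(self, image):
        node = Node(value=image)
        self.nodes.append(node)
        return node

    def pointwise(self, fn, *inputs):
        # Only records the operation; nothing is computed until run().
        node = Node(op=fn, inputs=inputs)
        self.nodes.append(node)
        return node

    def _row_blocks(self, height):
        step = max(1, -(-height // self.blocks))  # ceiling division
        return [(lo, min(lo + step, height)) for lo in range(0, height, step)]

    @staticmethod
    def _apply(node, lo, hi):
        # Apply the pixel function to rows lo..hi-1 of all inputs
        # (assumes every input image has the same shape).
        images = [n.value for n in node.inputs]
        return [[node.op(*px) for px in zip(*(img[r] for img in images))]
                for r in range(lo, hi)]

    def run(self, workers=4):
        pending = [n for n in self.nodes if n.value is None]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            while pending:
                # Task parallelism: all nodes whose inputs are ready form one level.
                ready = [n for n in pending
                         if all(p.value is not None for p in n.inputs)]
                # Data parallelism: each ready node is split into row blocks,
                # and every (node, block) pair is submitted to the same pool.
                futures = {n: [pool.submit(self._apply, n, lo, hi)
                               for lo, hi in self._row_blocks(len(n.inputs[0].value))]
                           for n in ready}
                # Gather the blocks back into whole images, in row order.
                for n, parts in futures.items():
                    n.value = [row for f in parts for row in f.result()]
                pending = [n for n in pending if n.value is None]


g = TaskGraph(blocks=2)
img = g.source([[1, 2], [3, 4], [5, 6]])
a = g.pointwise(lambda p: p * 2, img)      # a and b are independent tasks
b = g.pointwise(lambda p: p + 1, img)
c = g.pointwise(lambda x, y: x - y, a, b)  # c must wait for both
g.run()
print(c.value)  # [[0, 1], [2, 3], [4, 5]]
```

Submitting every (node, block) pair of one graph level to a single pool avoids nested pools while still mixing both forms of parallelism. A real cluster implementation would also have to decide where each block lives and when data must be redistributed between nodes, which is exactly the communication cost the abstract warns about and which this sketch ignores.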

Jason Maassen et al. User transparent data and task parallel multimedia computing with Pyxis-DT. Future Generation Computer Systems, 2013.