Adaptive Parallel Householder Bidiagonalization

With the increasing use of large image and video archives and high-resolution multimedia data streams in many of today's research and application areas, there is a growing need for multimedia-oriented high-performance computing. Consequently, a need is rapidly emerging for algorithms, methodologies, and tools that support the (automatic) parallelization of multimedia applications. This paper discusses the parallelization of Householder bidiagonalization, a matrix factorization method that is an integral part of the full Singular Value Decomposition (SVD), an important algorithm for many multimedia problems. Householder bidiagonalization is hard to parallelize efficiently because the number of matrix elements taking part in the calculations shrinks as the computation proceeds. To counter the growing performance penalty of load imbalance and overprovisioned compute resources, we apply two adaptive runtime techniques: periodic matrix remapping and process reduction. Results show that our adaptive parallel execution approach significantly improves efficiency, even when the initial set of compute resources is very large.
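To make the shrinking workload concrete, the following is a minimal sequential sketch of Householder bidiagonalization in Python with NumPy. It is an illustration under stated assumptions, not the parallel implementation discussed in this paper: the function names `householder_vector` and `bidiagonalize`, the dense in-place update, and the requirement that the matrix has at least as many rows as columns are all choices made for this example. The returned `active_sizes` list records how many matrix elements are still being updated at each step, which is the quantity that causes load imbalance when the matrix is distributed statically over processes.

```python
import numpy as np


def householder_vector(x):
    """Return a unit vector v so that (I - 2 v v^T) x is a multiple of e_1."""
    v = np.array(x, dtype=float)
    norm_x = np.linalg.norm(v)
    if norm_x == 0.0:
        return v  # zero input: the reflection degenerates to the identity
    # Add the norm with the sign of the leading entry to avoid cancellation.
    v[0] += np.copysign(norm_x, v[0])
    return v / np.linalg.norm(v)


def bidiagonalize(a):
    """Reduce an m x n matrix (m >= n assumed) to upper bidiagonal form.

    Returns the bidiagonal matrix and the number of active elements
    touched in each step (hypothetical bookkeeping for illustration).
    """
    b = np.array(a, dtype=float)
    m, n = b.shape
    active_sizes = []
    for k in range(n):
        # Only the trailing (m - k) x (n - k) block still takes part.
        active_sizes.append((m - k) * (n - k))

        # Left reflection: zero column k below the diagonal.
        v = householder_vector(b[k:, k])
        b[k:, k:] -= 2.0 * np.outer(v, v @ b[k:, k:])

        if k < n - 2:
            # Right reflection: zero row k to the right of the superdiagonal.
            w = householder_vector(b[k, k + 1:])
            b[k:, k + 1:] -= 2.0 * np.outer(b[k:, k + 1:] @ w, w)
    return b, active_sizes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((6, 4))
    b, sizes = bidiagonalize(a)
    print(sizes)
    print(np.round(b, 3))
```

For a 6 x 4 input the active block shrinks from 24 elements in the first step to 3 in the last, so a data distribution that is balanced at the start leaves most processes with little or no work near the end. This is the effect that periodic remapping and process reduction are meant to counter.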
