Workload Characterization of a Parallel Video Mining Application on a 16-Way Shared-Memory Multiprocessor System

As video data become increasingly pervasive, mining information from multimedia sources, e.g., automatically extracting highlights from soccer game video, becomes increasingly important. However, the heavy computational cost of mining the data of interest limits its wide use in practice. As computer architecture shifts from uniprocessors to multi-core processors, exploiting the thread-level parallelism in multimedia mining applications is critical to utilizing the hardware resources and accelerating the complex processing of highlight event detection. In this paper we analyze view type and playfield detection, a widely used application in sports video mining systems, and present several schemes for parallelizing it: task-level, data-slicing-level, and hybrid, along with variations of the hybrid scheme. The hybrid scheme, which exploits both task-level and data-slicing-level parallelism, outperforms the basic task-level and data-slicing-level schemes in both execution time and speedup. On a 16-way shared-memory multiprocessor system with hardware prefetch enabled, the hybrid scheme achieves a speedup of 10.6x. Detailed performance analysis shows that, because of its large working set, the workload often requires data from off-chip memory, so saturated bus bandwidth is the likely bottleneck preventing perfect scalability. With hardware prefetch enabled, bus utilization on the 16-processor system is about 76% for the hybrid scheme, and the projected bus bandwidth requirement for perfect scalability is about 3.1 GB/s for 16 processors and 6.2 GB/s for 32 processors.
In addition, our experiments reveal no other obvious scaling-limiting factors, e.g., synchronization overhead and load imbalance remain very low even with up to 16 processors.
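
The figures quoted above can be related with a little arithmetic. The following is a minimal sketch, not the authors' analysis: it assumes the projected bandwidth requirement grows linearly with processor count (which is consistent with the two quoted values but is not stated as the model), and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / processors


def projected_bandwidth_gbs(processors: int,
                            ref_bandwidth_gbs: float = 3.1,
                            ref_processors: int = 16) -> float:
    """Bus bandwidth needed for perfect scalability.

    Assumption: the requirement scales linearly from the quoted
    reference point (3.1 GB/s at 16 processors).
    """
    return ref_bandwidth_gbs * processors / ref_processors


if __name__ == "__main__":
    # Hybrid scheme: 10.6x speedup on 16 processors.
    print(f"efficiency at 16 processors: {parallel_efficiency(10.6, 16):.2f}")
    # Linear extrapolation of the quoted bandwidth requirement.
    for p in (16, 32):
        print(f"{p} processors: {projected_bandwidth_gbs(p):.1f} GB/s")
```

Under this assumption, the 10.6x speedup corresponds to roughly two thirds of ideal scaling on 16 processors, and doubling the processor count doubles the projected bandwidth requirement, matching the 6.2 GB/s figure quoted for 32 processors.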
