Algorithm/Architecture Co-Exploration of Visual Computing on Emergent Platforms: Overview and Future Prospects

Concurrently exploring both algorithmic and architectural optimizations is a new design paradigm. This survey paper addresses the latest research and future perspectives on the simultaneous development of video coding, processing, and computing algorithms with emerging platforms that have multiple cores and reconfigurable architectures. As the algorithms in forthcoming visual systems become increasingly complex, many applications must support multiple profiles, each with a different level of performance. Because the visual experience is expected to keep improving, it is critical that advanced platforms provide higher performance, better flexibility, and lower power consumption. To achieve these goals, algorithm/architecture co-design is essential for characterizing the algorithmic complexity that is used to optimize the targeted architecture. This paper shows that seamlessly weaving together the development of previously autonomous visual computing algorithms and multicore or reconfigurable architectures will inevitably become the leading trend in video technology.
