VLIW architecture optimization for an efficient computation of stereoscopic video applications

This paper presents two new architecture optimizations that improve the processing performance of video applications with a high degree of data parallelism on VLIW processors. First, a new register file access mechanism, called X4 operation mode, allows access to wide operands made up of several consecutive registers in the register file while preserving its normal functionality (i.e. single-register read/write access). Second, a new functional unit is proposed to efficiently process a typical stereoscopic video application based on a rank transformation and a semi-global matching algorithm. These enhancements are evaluated using a generic VLIW architecture, and the resulting VLIW processor is compared with CPU, GPU, and FPGA implementations. The proposed architecture provides the full flexibility of a programmable processor while processing 640×480 stereo video sequences under real-time conditions, which is not possible with the compared CPUs or GPUs.
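For readers unfamiliar with the rank transformation mentioned above, the following is a minimal software sketch of the general technique: each pixel is replaced by the number of pixels in a surrounding window that are darker than it. It is not the paper's functional unit or its VLIW implementation. The function name `rank_transform`, the `radius` parameter, and the policy of skipping neighbours outside the image are assumptions made for illustration only.

```python
def rank_transform(image, radius=1):
    """Return the rank transform of a 2D list of intensities.

    Each output value counts the neighbours inside a
    (2*radius + 1) x (2*radius + 1) window that are strictly
    darker than the centre pixel.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            count = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    # Assumption: neighbours outside the image are skipped.
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] < center:
                        count += 1
            out[y][x] = count
    return out


if __name__ == "__main__":
    sample = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
    ]
    for row in rank_transform(sample):
        print(row)
```

Because every output pixel depends only on a small local window, the per-pixel comparisons are independent of one another, which is the kind of data parallelism the abstract refers to.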
