A two-level real-time vision machine combining coarse- and fine-grained parallelism

In this paper, we describe a real-time vision machine that takes a stereo camera as input and generates visual information on two different levels of abstraction. The system provides low-level and mid-level visual information in the form of dense stereo, optical flow, and egomotion, an indication of areas containing independently moving objects, and a condensed geometric description of the scene. The system operates at more than 20 Hz using a hybrid architecture consisting of one dual-GPU card and one quad-core CPU. The different processing stages have rather different characteristics, which in some cases makes fine-grained parallelization on a GPU less applicable. However, most of the stages that cannot be implemented efficiently on a GPU are amenable to coarse-grained parallelization across multiple CPU cores. We show that with such hybrid parallelism we achieve a speedup of approximately a factor of 90 and a reduction in latency by a factor of 26 compared to processing on a single CPU core. Since the vision machine provides generic visual information, it can be used in many contexts. Currently, it is used in a driver assistance context as well as in two robotic applications.
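
The speedup and the latency reduction quoted above are different quantities, and a pipelined design is one way they can diverge. The following Python sketch illustrates this under loudly stated assumptions: the stages run concurrently on separate resources, throughput is limited by the slowest stage, and latency is the sum of the stage times. The stage timings are hypothetical values chosen for illustration, not measurements from the paper, and the abstract does not say that the reported figures were obtained this way.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    single_core_ms: float  # hypothetical time per frame on one CPU core
    parallel_ms: float     # hypothetical time per frame after parallelization


def baseline_ms(stages):
    """Per-frame time when all stages run back to back on a single core."""
    return sum(s.single_core_ms for s in stages)


def throughput_speedup(stages):
    """Speedup in frame rate, assuming stages overlap as a pipeline.

    In a pipeline a new frame can start as soon as the slowest stage is
    free, so the frame period equals the bottleneck stage time.
    """
    bottleneck = max(s.parallel_ms for s in stages)
    return baseline_ms(stages) / bottleneck


def latency_reduction(stages):
    """Reduction in input-to-output delay for a single frame.

    A frame still has to pass through every stage, so its latency is the
    sum of the parallelized stage times.
    """
    pipelined = sum(s.parallel_ms for s in stages)
    return baseline_ms(stages) / pipelined


# Illustrative numbers only; the stage names follow the abstract.
stages = [
    Stage("dense_stereo", 400.0, 10.0),
    Stage("optical_flow", 500.0, 10.0),
    Stage("egomotion", 60.0, 8.0),
    Stage("scene_description", 40.0, 12.0),
]

if __name__ == "__main__":
    print(f"throughput speedup: {throughput_speedup(stages):.1f}x")
    print(f"latency reduction:  {latency_reduction(stages):.1f}x")
```

Under these assumptions the throughput speedup is never smaller than the latency reduction, because the bottleneck stage time cannot exceed the sum of all stage times.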
