Compression of visual data into symbol-like descriptors in terms of a cognitive real-time vision system

Humans have five main senses: sight, hearing, touch, smell, and taste. Most of them combine several aspects. For example, vision addresses at least three perceptual modalities: motion, color, and luminance. Extraction of these modalities begins in the retinal network of the human eye, and the preprocessed signals enter the brain as streams of spatio-temporal patterns. Vision is our main sense, particularly for perceiving the three-dimensional structure of the world around us. Major efforts have therefore been made to understand and simulate the visual system based on the knowledge collected to date. Research over the last decades in the fields of image processing and computer vision, coupled with a tremendous step forward in hardware for parallel computing, has opened the door to building so-called cognitive vision systems and incorporating them into robots. The goal of any cognitive vision system is to transform visual input into representations that are more descriptive than color, motion, or luminance alone. Furthermore, most robotic systems require "live" interactions with the environment, which greatly increases the demands on the system. In such systems, all pre-computations on the visual data must be performed in real time so that the output can be used in the perception-action loop. Thus, a central goal of this thesis is to provide techniques that are strictly compatible with real-time computation.

In the first part of this thesis we investigate possibilities for strongly compressing the initial visual input data into symbol-like descriptors to which abstract logic or learning schemes can be applied. We introduce a new real-time video segmentation framework that automatically decomposes monocular and stereo video streams. It uses no prior knowledge about the data and considers only preceding information. All entities in the scene, representing objects or their parts, are uniquely identified.
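The requirement that every entity keeps a unique identity over time can be illustrated with a simple label-propagation step. This step uses only the preceding frame. The following is a minimal sketch, not the method of this thesis. It assumes that each frame has already been segmented into an integer label map. It uses a hypothetical greedy rule: each current segment inherits the ID of the previous-frame segment it overlaps most, provided that overlap covers at least `min_overlap` of the segment. Otherwise the segment receives a fresh ID. The function name `propagate_ids` and its parameters are illustrative only.

```python
from __future__ import annotations

from collections import Counter
from typing import Dict, List, Set, Tuple

Labels = List[List[int]]  # 2-D grid: one segment label per pixel


def propagate_ids(prev: Labels, curr: Labels, next_id: int,
                  min_overlap: float = 0.5) -> Tuple[Labels, int]:
    """Give segments of `curr` persistent IDs taken from `prev` (hypothetical helper).

    `prev` holds the already-propagated IDs of the preceding frame, `curr` the raw
    labels of the current frame. `next_id` must exceed every ID issued so far.
    Returns the relabeled frame and the updated `next_id`.
    """
    # Count, for every current segment, how many of its pixels fall on each previous ID.
    overlap: Dict[int, Counter] = {}
    for prev_row, curr_row in zip(prev, curr):
        for p, c in zip(prev_row, curr_row):
            overlap.setdefault(c, Counter())[p] += 1

    mapping: Dict[int, int] = {}
    used: Set[int] = set()
    # Largest segments first, so they win when two segments compete for one ID.
    for c in sorted(overlap, key=lambda k: -sum(overlap[k].values())):
        size = sum(overlap[c].values())
        for p, n in overlap[c].most_common():
            if p not in used and n / size >= min_overlap:
                mapping[c] = p  # inherit the identity of the best-overlapping segment
                used.add(p)
                break
        else:
            mapping[c] = next_id  # no sufficient match: treat as a new entity
            next_id += 1

    return [[mapping[c] for c in row] for row in curr], next_id
```

Processing larger segments first lets them claim contested IDs. A practical system would also have to handle segment splits, merges, and camera motion, which this sketch ignores.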
In the second part of the thesis we make additional use of stereoscopic visual information. We address the problem of establishing correspondences between two views of the scene, such as the images acquired by the left and right eye. The human visual system solves this problem with apparent ease. We exploit these correspondences in stereo image pairs to estimate depth (distance) by proposing a novel disparity measurement technique based on extracted stereo-segments. This technique approximates shape and computes depth information for all entities found in the scene. The most important and novel achievement of this approach is that it produces reliable depth information for weakly textured objects, where traditional stereo techniques perform very poorly.

In the third part of this thesis we employ an active sensor. Indoors, it produces depth information, encoded as range data, that is much more precise than that of any passive stereo technique. Fusing image and range data improves the results of video segmentation. With this fusion we can now even handle fast-moving objects, which was not possible before. To meet the real-time constraint, the proposed segmentation framework was accelerated on a Graphics Processing Unit (GPU) using the parallel programming model of the Compute Unified Device Architecture (CUDA). All introduced methods run in real time for medium-size images and close to real time for higher resolutions. These are segmentation of single images, segmentation of monocular and stereo video streams, depth-supported video segmentation, and disparity computation from stereo-segment correspondences.

In summary, the main result of this thesis is a framework that produces a compact representation of any visual scene. In this representation all meaningful entities are uniquely identified and tracked, and important descriptors, such as shape and depth information, are extracted.
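For a rectified stereo pair with focal length f (in pixels) and baseline B, a disparity of d pixels corresponds to depth Z = f·B/d. Depth per entity therefore follows directly once segment correspondences are known. The sketch below is a deliberate simplification and is not the shape-approximating technique described above. It assumes left and right label maps in which corresponding stereo-segments share the same integer ID and label 0 marks unlabeled pixels. It estimates each segment's disparity from the horizontal shift of its centroid. The names `segment_depths`, `centroid_x`, `focal_px`, and `baseline_m` are hypothetical.

```python
from __future__ import annotations

from typing import Dict, List

Labels = List[List[int]]  # 2-D grid: one segment label per pixel, 0 = unlabeled


def centroid_x(labels: Labels) -> Dict[int, float]:
    """Mean column index of every labeled segment."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for row in labels:
        for x, lab in enumerate(row):
            if lab == 0:
                continue  # skip unlabeled pixels
            sums[lab] = sums.get(lab, 0.0) + x
            counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}


def segment_depths(left: Labels, right: Labels,
                   focal_px: float, baseline_m: float) -> Dict[int, float]:
    """Depth Z = f * B / d per segment, with d the centroid shift in pixels."""
    cl, cr = centroid_x(left), centroid_x(right)
    depths: Dict[int, float] = {}
    for lab in cl.keys() & cr.keys():  # only segments visible in both views
        d = cl[lab] - cr[lab]  # rectified pair: a point shifts left in the right image
        if d > 0:  # zero or negative disparity gives no valid finite depth
            depths[lab] = focal_px * baseline_m / d
    return depths
```

A single centroid shift assigns one depth per segment, which is only a rough stand-in for a surface fit. It does illustrate why segment-level matching can cope with weak texture: the segment is matched as a whole rather than through individual pixel neighborhoods.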
The capabilities of the framework were successfully demonstrated in the context of several European projects (PACO-PLUS, Garnics, IntellAct, and Xperience). The developed real-time system is now employed as a robust visual front-end in various real-time robotic systems.