AWEAR 2.0 system: Omni-directional audio-visual data acquisition and processing

We present a wearable audio-visual capture system, termed AWEAR 2.0, along with its underlying vision components that allow robust self-localization, multi-body pedestrian tracking, and dense scene reconstruction. Designed as a backpack, the system is aimed at supporting the cognitive abilities of the wearer. In this paper, we focus on the design issues of the hardware platform and on the performance of current state-of-the-art computer vision methods on the acquired sequences. We describe the calibration procedure for the two omni-directional cameras present in the system, as well as a structure-from-motion pipeline that enables stable multi-body tracking even from shaky video sequences thanks to ground-plane stabilization. Furthermore, we show how a dense scene reconstruction can be obtained from the data acquired with the platform.
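The ground-plane stabilization mentioned above can be illustrated with a minimal geometric sketch (not the authors' implementation; the intrinsics, pose, and plane below are hypothetical): once structure-from-motion yields a camera pose, a tracked image point can be back-projected and its viewing ray intersected with the estimated ground plane, giving a world-frame footprint that is insensitive to camera shake.

```python
# Hypothetical sketch of ground-plane stabilization: back-project a pixel
# through a calibrated camera and intersect the ray with the ground plane.
# All numeric values (K, R, t) are illustrative, not from the AWEAR system.
import numpy as np

def ray_plane_intersection(K, R, t, uv, n=np.array([0.0, 0.0, 1.0]), d=0.0):
    """Intersect the viewing ray of pixel uv with the plane n.X + d = 0
    (default: ground plane z = 0) for a camera with intrinsics K and
    world-to-camera pose (R, t)."""
    # Camera center in world coordinates: C = -R^T t
    C = -R.T @ t
    # Ray direction of the pixel, rotated into world coordinates
    ray = R.T @ np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    # Solve n.(C + s * ray) + d = 0 for the ray parameter s
    s = -(n @ C + d) / (n @ ray)
    return C + s * ray

# Toy camera: looking along +z from 5 units "behind" the ground plane
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

foot = ray_plane_intersection(K, R, t, (400.0, 240.0))
```

Applied per frame with per-frame SfM poses, the same pedestrian detection maps to a consistent ground-plane trajectory regardless of camera motion, which is the essence of the stabilization described in the abstract.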
