SMOR: A Semantic Multi-View Object Representation System in 2D Image Sequences

In this work, we propose a framework for segmenting and tracking objects in a static scene using 2D image sequences taken from different viewpoints. The system combines a set of simple algorithms to segment, identify, group, and track semantic objects, each modeled as a set of regions with compatible surface features. The main contribution of our work is the integration of these algorithms into a single system. Previous work has addressed this problem for sequences of static scenes containing a single object, or for dynamic scenes with multiple objects captured in a single shot; we study it for multiple objects captured across different shots. We present a hierarchical segmentation of image sequences into shots, regions, and objects. Shots are detected using inter-frame information related to camera motion, regions are identified using the color information in each frame, and objects are formed from adjacent regions with compatible shape and form. We then track the segmented regions and objects throughout the sequence using motion information and color constancy. The output of the system is a multi-view representation of the objects in a static scene. Comparisons demonstrating the performance of the system are provided and discussed.
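
As an illustration of the color-based tracking step only, the following is a minimal sketch, not the system's actual method. It assumes each region is available as a list of RGB pixels and matches regions between two frames by histogram intersection, a standard color-similarity measure swapped in here for the unspecified color-constancy test. The function names, the bin count, and the threshold are hypothetical.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Normalized coarse RGB histogram of (r, g, b) pixels with values 0..255."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell: n / total for cell, n in counts.items()}


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 for identical normalized histograms."""
    return sum(min(v, h2.get(cell, 0.0)) for cell, v in h1.items())


def match_regions(prev_regions, curr_regions, threshold=0.5):
    """Map each previous-frame region id to its most similar current-frame id.

    Both arguments are dicts of region id -> list of RGB pixels (an assumed
    input format). A region with no candidate scoring at least `threshold`
    is mapped to None.
    """
    curr_hists = {rid: color_histogram(px) for rid, px in curr_regions.items()}
    matches = {}
    for pid, pixels in prev_regions.items():
        hist = color_histogram(pixels)
        best_id, best_score = None, threshold
        for cid, chist in curr_hists.items():
            score = histogram_intersection(hist, chist)
            if score >= best_score:
                best_id, best_score = cid, score
        matches[pid] = best_id
    return matches
```

A full tracker would combine this color score with the motion information mentioned above; the sketch covers the color cue alone and allows two previous regions to match the same current region.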
