Image composition for object pop-out

We propose a new data-driven framework for novel object detection and segmentation, or “object pop-out”. Traditionally, this task is approached via background subtraction, which requires continuous observation from a stationary camera. Instead, we treat it as an image matching problem: we detect novel objects in a scene using an unordered, sparse database of previously captured images of the same general environment. The problem is formulated in a new image composition framework: 1) given an input image, we find a small set of similar images (matches) in the database; 2) each match is aligned with the input by proposing a set of homography transformations; 3) regions from the different transformed matches are stitched together into a single composite image that best matches the input; 4) the difference between the input and the composite is used to “pop out” new or changed objects.
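Steps 3 and 4 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the matches have already been retrieved and warped into the input's frame (steps 1 and 2), it works on small grayscale images stored as nested lists, and it replaces the stitching step with an independent per-pixel choice of the closest match, with no smoothness term between neighboring pixels. The function name `compose_and_pop_out` and the `threshold` parameter are hypothetical.

```python
def compose_and_pop_out(input_img, aligned_matches, threshold=0.2):
    """Build a composite from pre-aligned matches and flag changed pixels.

    input_img and every entry of aligned_matches are grayscale images of
    identical size, stored as lists of rows with values in [0, 1].
    Simplifying assumption: each pixel picks its source match on its own,
    whereas a real stitching step would favor spatially coherent regions.
    """
    height, width = len(input_img), len(input_img[0])
    composite = [[0.0] * width for _ in range(height)]
    pop_out_mask = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            target = input_img[y][x]
            # Step 3 (simplified): take the candidate value closest to the input.
            best = min((match[y][x] for match in aligned_matches),
                       key=lambda value: abs(value - target))
            composite[y][x] = best
            # Step 4: a pixel "pops out" if no match can explain it.
            pop_out_mask[y][x] = abs(best - target) > threshold

    return composite, pop_out_mask


if __name__ == "__main__":
    # Toy scene: the left half is dark (0.1), the right half is bright (0.9).
    # Match A captures only the left half correctly, match B only the right.
    match_a = [[0.1, 0.1, 0.4, 0.4],
               [0.1, 0.1, 0.4, 0.4]]
    match_b = [[0.6, 0.6, 0.9, 0.9],
               [0.6, 0.6, 0.9, 0.9]]
    # The input shows the same scene plus one novel bright pixel at (1, 1).
    query = [[0.1, 0.1, 0.9, 0.9],
             [0.1, 1.0, 0.9, 0.9]]

    composite, mask = compose_and_pop_out(query, [match_a, match_b])
    for row in mask:
        print(row)
```

In the toy scene neither match explains the whole input, but the composite assembled from both does, so only the novel pixel exceeds the threshold. Choosing sources per pixel is the crudest possible stand-in for stitching; it is used here only to keep the sketch short.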
