Swipe Mosaics from Video

A panoramic image mosaic is an attractive visualization for viewing many overlapping photos, but its images must be both captured and processed correctly to produce an acceptable composite. We propose Swipe Mosaics, an interactive visualization that places the individual video frames on a 2D planar map that represents the layout of the physical scene. Compared to traditional panoramic mosaics, our capture is easier because the user can both translate the camera center and film moving subjects. Processing and display degrade gracefully if the footage lacks distinct, overlapping, non-repeating texture. Our proposed visual odometry algorithm produces a distribution over (x,y) translations for image pairs. Inferring a distribution of possible camera motions allows us to better cope with parallax, lack of texture, dynamic scenes, and other phenomena that hurt deterministic reconstruction techniques. Robustness is obtained by training on synthetic scenes with known camera motions. We show that Swipe Mosaics are easy to generate, support a wide range of difficult scenes, and are useful for documenting a scene for closer inspection.
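As a rough illustration of what a distribution over (x,y) translations makes possible, the following minimal sketch summarizes a discrete distribution over candidate offsets by its expected translation and its entropy, where high entropy indicates an ambiguous image pair (for example, one lacking texture). This is not the algorithm proposed here: the grid representation, the function name `summarize_translation_distribution`, and the `offsets` parameter are all assumptions made for illustration.

```python
import math


def summarize_translation_distribution(prob, offsets):
    """Summarize a discrete distribution over (dx, dy) translations.

    prob    -- 2D list of non-negative weights; prob[iy][ix] is the weight of
               the candidate translation (offsets[ix], offsets[iy]).
               (Hypothetical representation, not taken from the paper.)
    offsets -- candidate offset values shared by both axes.

    Returns ((mean_dx, mean_dy), entropy_in_nats).
    """
    total = sum(sum(row) for row in prob)
    if total <= 0:
        raise ValueError("distribution has no probability mass")

    mean_dx = 0.0
    mean_dy = 0.0
    entropy = 0.0
    for iy, row in enumerate(prob):
        for ix, weight in enumerate(row):
            p = weight / total  # normalize so the weights sum to one
            if p > 0:
                mean_dx += p * offsets[ix]
                mean_dy += p * offsets[iy]
                entropy -= p * math.log(p)
    return (mean_dx, mean_dy), entropy


# A sharply peaked distribution: the pair is confidently shifted by (+1, 0).
peaked = [[0, 0, 0],
          [0, 0, 1],
          [0, 0, 0]]

# A flat distribution: every candidate translation is equally plausible.
flat = [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]]

candidate_offsets = [-1, 0, 1]
peaked_mean, peaked_entropy = summarize_translation_distribution(peaked, candidate_offsets)
flat_mean, flat_entropy = summarize_translation_distribution(flat, candidate_offsets)
```

In a layout step, a pair with a flat distribution could be given little weight instead of contributing a single, possibly wrong, translation estimate.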
