Photo-based Multimedia Applications using Image Features Detection

This paper proposes a framework for creating interactive multimedia applications that use features detected in user-captured photos. The goal is to build games and architectural and space-planning applications that interact with visual elements in the images, such as walls, floors and empty spaces. The framework relies on a semi-automatic algorithm that detects scene elements and camera parameters by combining graph-cut segmentation, vanishing point detection and line analysis. Using the detected features, virtual objects can be inserted into the scene. Several example applications are presented and discussed, and the reliability of the detection algorithm is compared with that of other systems. The main advantage of the proposed framework is the semi-automatic creation of the three-dimensional model used in mixed reality applications. This enables scenarios in which the user supplies the input scene without much prior knowledge or experience. The currently implemented examples include a furniture positioning system and a snake game played in a maze that the user builds in the real world. The proposed system is well suited to mobile multimedia applications in which interaction is combined with the rear camera of the device.
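As an illustration of the vanishing point step named in the abstract, the following minimal sketch estimates a vanishing point as the intersection of two image line segments in homogeneous coordinates. It is not the paper's detection algorithm, which the abstract does not specify in detail; the function names (`cross`, `line_through`, `vanishing_point`) and the tolerance `eps` are assumptions introduced for this example only.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def line_through(seg: Segment) -> Vec3:
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through both endpoints."""
    (x1, y1), (x2, y2) = seg
    return cross((x1, y1, 1.0), (x2, y2, 1.0))


def vanishing_point(seg_a: Segment, seg_b: Segment,
                    eps: float = 1e-9) -> Optional[Point]:
    """Intersect the lines supporting two segments (hypothetical helper).

    Returns None when the lines are parallel in the image, i.e. the
    vanishing point lies at infinity.
    """
    x, y, w = cross(line_through(seg_a), line_through(seg_b))
    if abs(w) < eps:
        return None
    return (x / w, y / w)


if __name__ == "__main__":
    # Two floor edges that converge towards the right of the image.
    edge_a = ((0.0, 0.0), (2.0, 1.0))
    edge_b = ((0.0, 4.0), (2.0, 3.0))
    print(vanishing_point(edge_a, edge_b))
```

In practice many noisy segments vote for each vanishing point, so a robust estimator would be needed instead of a single pairwise intersection; the two-segment case is shown only to make the geometry concrete.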
