An Automatic Three-Dimensional Scene Reconstruction System Using Crowdsourced Geo-Tagged Videos

Automatic 3-D scene reconstruction is a useful technique in modern intelligent systems. Scene reconstruction from video sequences requires selecting representative video frames. Most previous works use content-based techniques to extract key frames automatically. These methods do not take the geographic information of frames into account and can be computationally expensive. In this paper, we propose a new key frame selection scheme based on the geographic cues of videos. An increasing number of user-generated videos are being collected, a trend driven by the popularity of smartphones. In addition, various sensor data (e.g., geospatial metadata) can conveniently be acquired and fused to create geo-tagged mobile videos. Large repositories of media content are now automatically geo-tagged. Our technique uses this underlying geo-metadata to select the most representative frames. We first eliminate irrelevant frames in which the candidate 3-D object does not appear. Then, a fixed number of key frames is selected such that the selected frames maximally cover the candidate 3-D object or scene. Comprehensive experiments demonstrate the high quality of the reconstructed 3-D objects, and the execution time is reduced by 90%.
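The two selection steps above (discard frames that do not show the object, then pick a fixed number of frames that maximally cover it) can be illustrated with a small sketch. It does not reproduce the authors' exact criterion; it assumes a simplified setting. Each frame's geo-metadata is modeled as a pie-shaped field of view given by a camera position in a local planar metric frame, a compass heading, a horizontal viewable angle, and a visible distance. The target object is approximated by a circular footprint with a known centre and radius. Step one keeps frames whose field of view contains the object centre. Step two greedily picks up to `k` frames that maximise the number of sampled surface points that face the camera and fall inside its field of view. Greedy maximum coverage is a swapped-in standard heuristic. `Frame`, `in_fov`, `surface_samples`, `covered_samples`, `select_key_frames` and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical per-frame geo-metadata (planar metres, compass degrees)."""
    frame_id: int        # assumed unique per frame
    x: float             # camera easting in a local metric frame
    y: float             # camera northing in a local metric frame
    heading: float       # compass heading: degrees clockwise from north (+y)
    view_angle: float    # full horizontal viewable angle in degrees
    max_dist: float      # visible distance in metres


def in_fov(frame: Frame, px: float, py: float) -> bool:
    """True if point (px, py) lies inside the frame's pie-shaped field of view."""
    dx, dy = px - frame.x, py - frame.y
    dist = math.hypot(dx, dy)
    if dist > frame.max_dist:
        return False
    if dist == 0.0:
        return True
    # Bearing of the point measured like a compass: clockwise from north.
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    # Smallest absolute angle between bearing and heading, in [0, 180].
    diff = abs((bearing - frame.heading + 180.0) % 360.0 - 180.0)
    return diff <= frame.view_angle / 2.0


def surface_samples(cx: float, cy: float, radius: float, n: int):
    """Sample n points, with outward unit normals, on a circular object footprint."""
    pts = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        nx, ny = math.cos(t), math.sin(t)
        pts.append((cx + radius * nx, cy + radius * ny, nx, ny))
    return pts


def covered_samples(frame: Frame, samples) -> set:
    """Indices of surface samples that face the camera and lie inside its FOV."""
    covered = set()
    for i, (px, py, nx, ny) in enumerate(samples):
        facing = nx * (frame.x - px) + ny * (frame.y - py) > 0.0
        if facing and in_fov(frame, px, py):
            covered.add(i)
    return covered


def select_key_frames(frames, obj_x, obj_y, obj_radius, k, n_samples=72):
    """Step 1: drop frames whose FOV misses the object centre.
    Step 2: greedily pick up to k frames maximising covered surface samples.
    Returns (chosen frame ids in pick order, fraction of samples covered)."""
    samples = surface_samples(obj_x, obj_y, obj_radius, n_samples)
    relevant = [f for f in frames if in_fov(f, obj_x, obj_y)]
    coverage = {f.frame_id: covered_samples(f, samples) for f in relevant}
    chosen, covered = [], set()
    remaining = list(relevant)
    while remaining and len(chosen) < k:
        # Frame adding the most not-yet-covered samples (first one wins ties).
        best = max(remaining, key=lambda f: len(coverage[f.frame_id] - covered))
        gain = coverage[best.frame_id] - covered
        if not gain:
            break  # no remaining frame adds new coverage
        chosen.append(best.frame_id)
        covered |= gain
        remaining.remove(best)
    return chosen, len(covered) / n_samples


if __name__ == "__main__":
    cams = [
        Frame(0, 0.0, 20.0, 180.0, 60.0, 50.0),   # north of object, looking south
        Frame(1, 1.0, 20.0, 180.0, 60.0, 50.0),   # near-duplicate of frame 0
        Frame(2, 0.0, 20.0, 0.0, 60.0, 50.0),     # looking away: filtered out
        Frame(3, 20.0, 0.0, 270.0, 60.0, 50.0),   # east, looking west
        Frame(4, 0.0, -20.0, 0.0, 60.0, 50.0),    # south, looking north
        Frame(5, -20.0, 0.0, 90.0, 60.0, 50.0),   # west, looking east
    ]
    print(select_key_frames(cams, 0.0, 0.0, 5.0, k=4))  # → ([0, 4, 3, 5], 1.0)
```

In this toy setup the frame pointing away from the object is discarded in step one, and the near-duplicate frame is never chosen because it adds no new coverage once frame 0 is selected. Greedy selection is a common choice for this kind of objective because it is simple and, for maximum coverage, is known to achieve at least a (1 - 1/e) fraction of the optimal coverage.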
