On the Role of Representations for Reasoning in Large-Scale Urban Scenes

The advent of widely available photo collections covering broad geographic areas has spurred significant advances in large-scale urban scene modeling. While much emphasis has been placed on reconstruction and visualization, the utility of such models extends well beyond these tasks. In particular, these models should support a wide variety of reasoning tasks (or queries), and thereby enable in-depth study of the scene. Motivated by this, we analyze 3D representations in terms of their utility for answering queries. Since both representations and queries are highly heterogeneous, we build on a categorization that serves as an interface coupling the two domains. Equipped with this taxonomy and the notion of uncertainty in the representation, we quantify the utility of representations for solving three archetypal reasoning tasks in terms of accuracy, uncertainty, and computational complexity. We provide an empirical analysis of these intertwined realms on challenging real and synthetic urban scenes.
