A Perceptual Measure for Deep Single Image Camera Calibration

Most current single-image camera calibration methods rely on specific image features or user input, and cannot be applied to natural images captured in uncontrolled settings. We propose directly inferring camera calibration parameters from a single image using a deep convolutional neural network. This network is trained using automatically generated samples from a large-scale panorama dataset, and considerably outperforms other methods, including recent deep-learning-based approaches, in terms of standard L2 error. However, we argue that in many cases it is more important to consider how humans perceive errors in camera estimation. To this end, we conduct a large-scale human perception study where we ask users to judge the realism of 3D objects composited with and without ground-truth camera calibration. Based on this study, we develop a new perceptual measure for camera calibration, and demonstrate that our deep calibration network outperforms other methods on this measure. Finally, we demonstrate the use of our calibration network for a number of applications, including virtual object insertion, image retrieval, and compositing.
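The abstract says only that training samples are generated automatically from panoramas. One plausible way to do this is to render pinhole-camera crops from equirectangular panoramas. Because the crop's parameters are chosen at render time, they can act as ground-truth labels for free. The sketch below shows that idea under stated assumptions. The parameterization (vertical field of view, pitch, roll, yaw), the axis and sign conventions, the nearest-neighbor sampling, and the function names `rotation`, `pano_sample_coords`, and `perspective_crop` are all hypothetical illustrations, not the authors' pipeline.

```python
import numpy as np


def rotation(pitch, roll, yaw):
    """Camera-to-world rotation (radians). Hypothetical convention:
    world x right, y up, z forward; roll is applied first, then pitch, then yaw."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    cy, sy = np.cos(yaw), np.sin(yaw)
    r_pitch = np.array([[1, 0, 0], [0, cp, sp], [0, -sp, cp]])  # positive pitch tilts +z toward +y (up)
    r_roll = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # rotation about the optical axis
    r_yaw = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])    # positive yaw turns +z toward +x
    return r_yaw @ r_pitch @ r_roll


def pano_sample_coords(pano_h, pano_w, out_h, out_w, vfov_deg,
                       pitch_deg=0.0, roll_deg=0.0, yaw_deg=0.0):
    """Float (row, col) panorama coordinates seen by each pixel of a pinhole crop."""
    # Focal length in pixels from the (assumed) vertical field of view.
    f = 0.5 * out_h / np.tan(np.radians(vfov_deg) / 2)
    # Pixel centers; meshgrid's default 'xy' indexing gives (out_h, out_w) arrays.
    u, v = np.meshgrid(np.arange(out_w) + 0.5, np.arange(out_h) + 0.5)
    x = u - out_w / 2
    y = out_h / 2 - v  # image rows grow downward, camera y points up
    rays = np.stack([x, y, np.full_like(x, f)], axis=-1)
    R = rotation(np.radians(pitch_deg), np.radians(roll_deg), np.radians(yaw_deg))
    world = rays @ R.T  # rotate every ray into world coordinates
    X, Y, Z = world[..., 0], world[..., 1], world[..., 2]
    lon = np.arctan2(X, Z)               # longitude in (-pi, pi]
    lat = np.arctan2(Y, np.hypot(X, Z))  # latitude in [-pi/2, pi/2]
    # Equirectangular mapping: longitude 0 at the middle column, horizon at the middle row.
    cols = (lon / (2 * np.pi) + 0.5) * pano_w
    rows = (0.5 - lat / np.pi) * pano_h
    return rows, cols


def perspective_crop(pano, out_h, out_w, vfov_deg,
                     pitch_deg=0.0, roll_deg=0.0, yaw_deg=0.0):
    """Nearest-neighbor pinhole crop from an (H, W) or (H, W, C) equirectangular image."""
    rows, cols = pano_sample_coords(pano.shape[0], pano.shape[1], out_h, out_w,
                                    vfov_deg, pitch_deg, roll_deg, yaw_deg)
    r = np.clip(np.floor(rows).astype(int), 0, pano.shape[0] - 1)
    c = np.floor(cols).astype(int) % pano.shape[1]  # longitude wraps around
    return pano[r, c]
```

In a data-generation loop, one would draw the angles at random for each crop and keep them as regression targets. The sampling ranges, the output resolution, and whether to use bilinear rather than nearest-neighbor interpolation are design choices that the abstract does not specify.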
