Probabilistic Multiview Depth Image Enhancement Using Variational Inference

This paper introduces and investigates an inference-based algorithm for multiview depth image enhancement. Multiview depth imagery plays a pivotal role in free-viewpoint television, which requires high-quality virtual view synthesis so that viewers can move freely through a dynamic real-world scene. Depth imagery from different viewpoints is used to synthesize an arbitrary number of novel views. The depth imagery is usually estimated for each viewpoint individually by stereo-matching algorithms and therefore exhibits inter-view inconsistency, which degrades the quality of view synthesis. We enhance the depth imagery at multiple viewpoints by assigning a probabilistic weight to each depth pixel. First, our approach classifies the color pixels in the multiview color imagery. Second, using the resulting color clusters, we classify the corresponding depth values in the multiview depth imagery. Each clustered depth image is then subclustered. Clustering based on generative models assigns a probabilistic weight to each depth pixel. Finally, these weights are used to enhance the depth imagery at multiple viewpoints. Experiments show that our approach consistently improves the quality of virtual views by 0.2 dB to 1.6 dB, depending on the quality of the input multiview depth imagery.
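
The subclustering and weighting steps can be illustrated with a small sketch. It is not the method of the paper: plain expectation-maximization on a one-dimensional Gaussian mixture stands in for the variational inference named in the title, and the function names (e_step, fit_mixture, enhance), the unit starting variance, and the sample depth values are all hypothetical. The sketch takes the depth values that fall into one color cluster, fits two depth subclusters, and uses the posterior probabilities as per-pixel weights.

```python
import math

def e_step(values, mix, means, variances):
    """Posterior probability of each subcluster for each depth value."""
    out = []
    for v in values:
        # Work in the log domain so distant subclusters do not cause errors.
        logp = [math.log(m) - 0.5 * math.log(2 * math.pi * s)
                - 0.5 * (v - mu) ** 2 / s
                for m, mu, s in zip(mix, means, variances)]
        top = max(logp)
        p = [math.exp(l - top) for l in logp]
        total = sum(p)
        out.append([x / total for x in p])
    return out

def fit_mixture(values, init_means, iters=30):
    """Fit a 1-D Gaussian mixture with plain EM (stand-in for variational inference)."""
    k = len(init_means)
    mix = [1.0 / k] * k
    means = list(init_means)
    variances = [1.0] * k  # assumption: unit starting variance
    n = len(values)
    for _ in range(iters):
        resp = e_step(values, mix, means, variances)
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty subcluster unchanged
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2
                      for r, v in zip(resp, values)) / nj
            variances[j] = max(var, 1e-3)  # floor avoids a collapsed subcluster
            mix[j] = max(nj / n, 1e-12)
    return mix, means, variances

def enhance(values, mix, means, variances):
    """Replace each depth value by the weight-averaged subcluster mean."""
    weights = e_step(values, mix, means, variances)
    enhanced = [sum(w * mu for w, mu in zip(r, means)) for r in weights]
    return enhanced, weights

# Hypothetical depth values of one color cluster, pooled over several views.
depths = [49, 50, 51, 50, 52, 48, 120, 121, 119]
mix, means, variances = fit_mixture(depths, [min(depths), max(depths)])
enhanced, weights = enhance(depths, mix, means, variances)
print(means, enhanced)
```

In this toy setting the weights separate the two depth levels almost perfectly, so each enhanced value moves to the mean of its own subcluster. How the paper turns the weights into enhanced depth values at each viewpoint is not specified in the abstract, so the weighted average here is only one plausible choice.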
