Paper information - From Information Scaling of Natural Images to Regimes of Statistical Models

Computer vision can be considered a highly specialized data collection and data analysis problem. We need to understand the special properties of image data in order to construct statistical models for representing the wide variety of image patterns. One property that distinguishes vision from other sensory data, such as speech, is that distance or scale plays a profound role in image data. More specifically, visual objects and patterns can appear at a wide range of distances or scales, and the same visual pattern appearing at different distances or scales produces different image data with different statistical properties, and thus entails different regimes of statistical models. In particular, we show that the entropy rate of the image data changes with the viewing distance (as well as the camera resolution). Moreover, the inferential uncertainty also changes with viewing distance. We call these changes information scaling. From this perspective, we examine, both empirically and theoretically, two prominent and yet largely isolated research themes in the image modeling literature, namely wavelet sparse coding and Markov random fields. Our results indicate that the two models are appropriate for two different entropy regimes: sparse coding targets the low-entropy regime, whereas random fields are suitable for the high-entropy regime. Because of information scaling, both models are necessary for representing and interpreting image intensity patterns over the whole entropy range, and information scaling triggers transitions between these two regimes of models. This motivates us to propose a full-zoom primal sketch model that integrates both sparse coding and Markov random fields. In this model, local image intensity patterns are classified into a “sketchable regime” and a “non-sketchable regime” by a sketchability criterion. In the sketchable regime, the image data are represented deterministically by highly parametrized sketch primitives. In the non-sketchable regime, the image data are characterized by Markov random fields whose sufficient statistics summarize computational results from failed attempts at sparse coding. The contribution of our work is twofold. First, information scaling provides a dimension along which to chart the space of natural images. Second, the full-zoom modeling scheme provides a natural integration of sparse coding and Markov random fields, and thus enables us to develop a new and richer class of statistical models.
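
The abstract's claim that image statistics depend on viewing distance can be probed with a small numerical experiment. The sketch below is an illustration only, not the authors' estimator: it assumes a synthetic "dead leaves" image (opaque random discs), mimics increasing viewing distance by block averaging, and uses the Shannon entropy of quantized horizontal pixel differences as a crude stand-in for the entropy rate. The function names, disc radii, bin count, and scale factors are all hypothetical choices made for this example.

```python
import numpy as np

def dead_leaves(size=256, n_discs=400, seed=0):
    """Synthetic image: opaque discs with random gray level and radius (assumed test data)."""
    rng = np.random.default_rng(seed)
    img = np.zeros((size, size))
    yy, xx = np.mgrid[0:size, 0:size]
    for _ in range(n_discs):
        cy, cx = rng.uniform(0, size, 2)
        r = rng.uniform(2, 12)
        # Later discs overwrite earlier ones, which models occlusion.
        img[(yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2] = rng.uniform(0, 1)
    return img

def downsample(img, k):
    """Block-average by factor k, a simple stand-in for viewing from k times farther away."""
    h, w = img.shape
    h, w = h - h % k, w - w % k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def gradient_entropy(img, bins=16):
    """Shannon entropy in bits of quantized horizontal pixel differences (a crude proxy)."""
    d = np.diff(img, axis=1).ravel()
    hist, _ = np.histogram(d, bins=bins, range=(-1.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def entropy_vs_scale(img, factors=(1, 2, 4, 8)):
    """Return (factor, entropy proxy) pairs for each downsampling factor."""
    return [(k, gradient_entropy(downsample(img, k))) for k in factors]

if __name__ == "__main__":
    for k, h in entropy_vs_scale(dead_leaves()):
        print(f"downsampling factor {k}: {h:.3f} bits per difference")
```

Whether and how quickly the proxy changes with the factor depends on the image content and on the bin width, so the printed values should be read as a qualitative probe of information scaling rather than as a measurement of the entropy rate discussed in the abstract.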

Song-Chun Zhu | Y. Wu | Cheng-en Guo
