Of bits and wows: A Bayesian theory of surprise with applications to attention

The amount of information contained in a piece of data can be measured by the effect this data has on its observer. Fundamentally, this effect is to transform the observer's prior beliefs into posterior beliefs, according to Bayes' theorem. Thus the amount of information can be measured in a natural way by the distance (relative entropy) between the prior and posterior distributions of the observer over the available space of hypotheses. This facet of information, termed "surprise", is important in dynamic situations where beliefs change, in particular during learning and adaptation. Surprise can often be computed analytically, for instance in the case of distributions from the exponential family, or it can be numerically approximated. During sequential Bayesian learning, surprise decreases as the inverse of the number of training examples. Theoretical properties of surprise are discussed, in particular how it differs from and complements Shannon's definition of information. A computer vision neural network architecture capable of computing surprise over images and video stimuli is then presented. Hypothesizing that surprising data ought to attract natural or artificial attention systems, the output of this architecture is used in a psychophysical experiment to analyze human eye movements in the presence of natural video stimuli. Surprise is found to yield robust performance at predicting human gaze (ROC-like ordinal dominance score of approximately 0.7, compared to approximately 0.8 for human inter-observer repeatability, approximately 0.6 for a simpler intensity contrast-based predictor, and 0.5 for chance). The resulting theory of surprise is applicable across different spatio-temporal scales, modalities, and levels of abstraction.
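
As an illustration of the analytic case, the following minimal sketch computes surprise for a Gaussian belief about an unknown mean, updated with observations of known noise variance. It is not the architecture described above. The Gaussian model, the function names, and the direction of the relative entropy (posterior relative to prior, in nats) are all assumptions made for this example.

```python
import math


def gaussian_update(prior_mean, prior_var, x, noise_var):
    """Conjugate Bayesian update of a Gaussian belief over an unknown mean.

    Assumes a single observation x drawn with known noise variance.
    """
    post_precision = 1.0 / prior_var + 1.0 / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + x / noise_var)
    return post_mean, post_var


def gaussian_kl(mean_p, var_p, mean_q, var_q):
    """Relative entropy KL(p || q) between two 1-D Gaussians, in nats."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mean_p - mean_q) ** 2) / var_q
        - 1.0
    )


def surprise(prior_mean, prior_var, x, noise_var=1.0):
    """Surprise of observation x: KL(posterior || prior).

    The direction of the divergence is an assumption of this sketch.
    Returns the surprise and the updated (mean, variance) belief.
    """
    post_mean, post_var = gaussian_update(prior_mean, prior_var, x, noise_var)
    value = gaussian_kl(post_mean, post_var, prior_mean, prior_var)
    return value, (post_mean, post_var)


def sequential_surprise(data, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Surprise of each observation when the posterior becomes the next prior."""
    values = []
    mean, var = prior_mean, prior_var
    for x in data:
        value, (mean, var) = surprise(mean, var, x, noise_var)
        values.append(value)
    return values


if __name__ == "__main__":
    # A repeated, unchanging observation becomes less surprising over time.
    for step, value in enumerate(sequential_surprise([1.0] * 5), start=1):
        print(f"observation {step}: surprise = {value:.4f} nats")
```

In this sketch an observation far from the prior mean yields a larger surprise than one close to it, and a stream of identical observations yields a surprise that shrinks at every step as the belief settles.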
