Predictive Encoding of Contextual Relationships for Perceptual Inference, Interpolation and Prediction

We propose a new neurally inspired model that learns to encode the global relationship context of visual events across time and space, and uses this contextual information to modulate the analysis-by-synthesis process in a predictive coding framework. The model learns latent contextual representations by maximizing the predictability of visual events from local and global contextual information through both top-down and bottom-up processes. In contrast to standard predictive coding models, the prediction error in this model is used to update the contextual representation but does not alter the feedforward input to the next layer, which is more consistent with neurophysiological observations. We establish the computational feasibility of the model by demonstrating its abilities in several respects. First, it outperforms state-of-the-art gated Boltzmann machines (GBMs) in estimating contextual information. Second, it can interpolate missing events or predict future events in image sequences while simultaneously estimating contextual information. It achieves state-of-the-art prediction accuracy on a variety of tasks and can interpolate missing frames, a capability that GBMs lack.
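The distinction drawn above, where prediction error updates the contextual representation while the feedforward input is left untouched, can be illustrated with a toy sketch. This is not the model described here: the linear predictors `W`, the function `infer_context`, and the gradient-descent update are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

def infer_context(x_prev, x_next, W, steps=200):
    """Toy context inference (hypothetical, not the paper's model).

    W has shape (K, D, D): one assumed linear predictor per context unit.
    The context z is refined by gradient descent on the squared prediction
    error; x_prev and x_next are only read, never modified.
    """
    basis = W @ x_prev                      # (K, D): each unit's prediction
    # Step size from the largest curvature keeps the error non-increasing.
    lr = 1.0 / (np.linalg.norm(basis, 2) ** 2 + 1e-12)
    z = np.zeros(W.shape[0])
    errors = []
    for _ in range(steps):
        pred = z @ basis                    # top-down prediction, shape (D,)
        err = x_next - pred                 # prediction error
        errors.append(float(err @ err))
        z = z + lr * (basis @ err)          # error updates the context only
    return z, errors
```

In this sketch the frames passed in stay fixed throughout inference, and only `z` changes in response to the error, which mirrors the qualitative behavior described above under the stated simplifying assumptions.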
