Mental Sampling in Multimodal Representations

Both resources in the natural environment and concepts in a semantic space are distributed "patchily", with large gaps between patches. Various random walk models have been proposed to describe people's internal and external foraging behavior. In particular, internal foraging has been modeled as sampling: to gather the information relevant to a decision, people draw samples from a mental representation using random-walk algorithms such as Markov chain Monte Carlo (MCMC). However, two common empirical observations argue against simple sampling algorithms such as MCMC. First, the spatial structure of search is often best described by a Lévy flight distribution: the probability of a given distance between two successive locations follows a power law in that distance. Second, the temporal structure of the sampling that humans and other animals produce has long-range, slowly decaying serial correlations characterized as $1/f$-like fluctuations. We propose that mental sampling is not done by simple MCMC, but is instead adapted to multimodal representations and is implemented by Metropolis-coupled Markov chain Monte Carlo (MC$^3$), one of the first algorithms developed for sampling from multimodal distributions. MC$^3$ runs multiple Markov chains in parallel, each targeting the distribution at a different temperature, and proposes swaps between the states of the chains; a swap is always accepted when a hotter chain has found a better location than a colder one. Heated chains more readily traverse valleys in the probability landscape and propose moves to far-away peaks, while the colder chains make the local steps that explore the current peak or patch. We show that MC$^3$ generates distances between successive samples that follow a Lévy flight distribution and $1/f$-like serial correlations, providing a single mechanistic account of these two puzzling empirical phenomena.
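The coupling mechanism can be illustrated with a generic, textbook-style MC$^3$ sampler. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy target `log_target` (three well-separated 1-D Gaussian "patches"), the temperature ladder, the per-chain step sizes, and the function names are all hypothetical choices. Each chain $i$ targets $\pi(x)^{\beta_i}$ with $\beta_i = 1/T_i$, and a swap between adjacent chains $j$ and $j+1$ (with $\beta_j > \beta_{j+1}$) is accepted with probability $\min\{1, [\pi(x_{j+1})/\pi(x_j)]^{\beta_j - \beta_{j+1}}\}$, which equals 1 whenever the hotter chain holds the higher-density state.

```python
import numpy as np


def log_target(x):
    """Hypothetical multimodal 'mental representation': unnormalized log density
    of an equal-weight mixture of three unit-variance 1-D Gaussians."""
    means = np.array([-20.0, 0.0, 20.0])
    comps = -0.5 * (x - means) ** 2
    m = comps.max()
    return m + np.log(np.exp(comps - m).sum())  # log-sum-exp for stability


def mc3(log_p, n_samples, temps=(1.0, 4.0, 16.0, 64.0), step=1.0, x0=0.0, seed=0):
    """Metropolis-coupled MCMC (MC^3 / parallel tempering) sketch.

    Chain i targets p(x) ** (1 / temps[i]). temps should be increasing with
    temps[0] == 1.0; the states of that cold chain are returned as the samples.
    Each iteration performs one random-walk Metropolis update per chain, then
    proposes one swap between a randomly chosen pair of adjacent chains."""
    rng = np.random.default_rng(seed)
    temps = np.asarray(temps, dtype=float)
    betas = 1.0 / temps
    steps = step * np.sqrt(temps)  # assumption: hotter chains take larger steps
    k = len(betas)
    xs = np.full(k, float(x0))
    lps = np.array([log_p(x) for x in xs])
    samples = np.empty(n_samples)
    for t in range(n_samples):
        # Within-chain Metropolis step on the tempered target p ** beta.
        for i in range(k):
            prop = xs[i] + steps[i] * rng.standard_normal()
            lp_prop = log_p(prop)
            if np.log(rng.random()) < betas[i] * (lp_prop - lps[i]):
                xs[i], lps[i] = prop, lp_prop
        # Swap proposal between chains j and j + 1. The log acceptance ratio is
        # (beta_j - beta_{j+1}) * (lp_{j+1} - lp_j), which is >= 0 (always
        # accepted) whenever the hotter chain sits at a higher-density point.
        if k > 1:
            j = rng.integers(k - 1)
            log_accept = (betas[j] - betas[j + 1]) * (lps[j + 1] - lps[j])
            if np.log(rng.random()) < log_accept:
                xs[[j, j + 1]] = xs[[j + 1, j]]
                lps[[j, j + 1]] = lps[[j + 1, j]]
        samples[t] = xs[0]
    return samples
```

Calling `mc3(log_target, 10_000, temps=(1.0,))` reduces this to plain random-walk Metropolis, which in this toy target stays trapped in the patch where it starts. With the tempered ladder, the successive-sample distances `np.abs(np.diff(s))` of the cold chain are mostly small local steps punctuated by rare jumps on the scale of the distance between patches.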
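Whether a set of distances between successive samples looks Lévy-like can be checked by fitting the exponent $\mu$ of a power law $p(d) \propto d^{-\mu}$ for $d \ge d_{\min}$; Lévy flights are usually taken to have $1 < \mu \le 3$. A minimal sketch using the standard continuous maximum-likelihood estimator $\hat\mu = 1 + n \left[\sum_i \ln(d_i/d_{\min})\right]^{-1}$ follows. The function name and the caller-chosen $d_{\min}$ are assumptions, and this is not necessarily the fitting procedure used in the paper.

```python
import numpy as np


def powerlaw_exponent_mle(d, d_min):
    """Continuous power-law MLE for mu in p(d) ~ d ** (-mu), d >= d_min.

    Only distances >= d_min enter the fit; choosing d_min is left to the caller.
    Returns (mu_hat, standard_error)."""
    d = np.asarray(d, dtype=float)
    tail = d[d >= d_min]
    log_ratios = np.log(tail / d_min)
    if tail.size == 0 or log_ratios.sum() == 0:
        raise ValueError("need distances strictly above d_min")
    n = tail.size
    mu_hat = 1.0 + n / log_ratios.sum()
    se = (mu_hat - 1.0) / np.sqrt(n)  # asymptotic standard error
    return mu_hat, se


# Usage on synthetic data: classical Pareto samples with d_min = 1 and
# density proportional to d ** -2 (mu = 2). NumPy's Generator.pareto draws
# Lomax (Pareto II) samples, so adding 1 gives the classical Pareto.
rng = np.random.default_rng(1)
d = 1.0 + rng.pareto(1.0, size=20_000)
mu_hat, se = powerlaw_exponent_mle(d, d_min=1.0)
print(f"mu_hat = {mu_hat:.3f} +/- {se:.3f}")
```

For a sequence of samples `x`, the distances would be `np.abs(np.diff(x))` in one dimension, or the norms of the successive differences in more dimensions.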
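$1/f$-like fluctuations are usually diagnosed from the power spectrum of a series, $S(f) \propto 1/f^{\alpha}$: $\alpha \approx 0$ for uncorrelated (white) noise, $\alpha \approx 1$ for $1/f$ noise, and $\alpha \approx 2$ for a random walk. Below is a minimal sketch of one such diagnostic, a least-squares line fit to the log-log periodogram. The function name, the low-frequency cutoff `f_max`, and the use of a raw (unsmoothed) periodogram are assumptions; published analyses often average or bin the spectrum first.

```python
import numpy as np


def spectral_slope(x, f_max=0.1):
    """Estimate alpha in S(f) ~ 1 / f**alpha for a 1-D series.

    Fits a least-squares line to the log-log periodogram over the frequencies
    0 < f <= f_max (cycles per sample) and returns minus its slope.
    alpha ~ 0: white noise; alpha ~ 1: 1/f ("pink") noise; alpha ~ 2: random walk."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size)  # unit sample spacing
    keep = (freqs > 0) & (freqs <= f_max)
    slope, _intercept = np.polyfit(np.log10(freqs[keep]), np.log10(power[keep]), 1)
    return -slope


rng = np.random.default_rng(2)
white = rng.standard_normal(4096)
walk = np.cumsum(rng.standard_normal(4096))
print(spectral_slope(white), spectral_slope(walk))  # expected: near 0 and near 2
```

Restricting the fit to low frequencies matters because the spectrum of a discrete random walk flattens toward the Nyquist frequency (0.5 cycles per sample), which would bias a fit over the full range toward smaller $\alpha$.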
