Hierarchical structure is employed by humans during visual motion perception

Significance
The structured organization of motion in visual scenes is highly informative for our everyday perception: We recognize people by the way they walk, track objects through occlusion, or predict hazardous situations from the flow of traffic. It is unclear, however, how our minds tame the overwhelmingly complex stream of dynamic information received by the retina to form such stable percepts. We argue that an observer can exploit a “divide-and-conquer” strategy in which complex motion relations are broken down into compositions of simpler motions. Evidence for such hierarchical decomposition comes from multiple object tracking and motion prediction experiments in which humans exploit knowledge of motion structure to improve their performance. Our results can guide neuroscience experiments on the neural representation of structure.

In the real world, complex dynamic scenes often arise from the composition of simpler parts. The visual system exploits this structure by hierarchically decomposing dynamic scenes: When we see a person walking on a train or an animal running in a herd, we recognize the individual’s movement as nested within a reference frame that is, itself, moving. Despite the ubiquity of such structure, surprisingly little is known about the computations underlying hierarchical motion perception. To address this gap, we developed a class of stimuli that grant tight control over the statistical relations among object velocities in dynamic scenes. We first demonstrate that structured motion stimuli improve human multiple object tracking performance. Computational analysis revealed that this performance gain is best explained by participants making use of motion relations during tracking. A second experiment, using a motion prediction task, reinforced this conclusion and provided fine-grained information about how the visual system flexibly exploits motion structure.
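The idea of nested motion, where an individual's velocity is its own motion plus that of a moving reference frame, can be made concrete with a small sketch. It assumes that each object's (1-D) velocity is a sum of independent Gaussian latent motion sources, chosen by a binary structure matrix. The names `C`, `LAMBDA`, and the helper functions are hypothetical illustrations, not the stimulus generator used in these experiments. In this setup, two objects that share a "group" source (two passengers on the same train) have strongly correlated velocities, while their relative velocity is small.

```python
import random
import statistics

# Hypothetical structure matrix: rows are objects, columns are latent motion
# sources. C[k][m] = 1 means object k inherits the velocity of source m.
# Objects 0 and 1 share a "group" source (like two passengers on one train);
# each object also has its own individual source.
C = [
    [1, 1, 0, 0],  # object 0: group source + own source
    [1, 0, 1, 0],  # object 1: group source + own source
    [0, 0, 0, 1],  # object 2: moves independently
]
# Hypothetical source strengths (standard deviations), one per column of C.
LAMBDA = [2.0, 0.5, 0.5, 1.5]


def sample_velocities(C, lam, rng):
    """Draw one 1-D velocity per object as the sum of its scaled Gaussian sources."""
    sources = [s * rng.gauss(0.0, 1.0) for s in lam]
    return [sum(c * src for c, src in zip(row, sources)) for row in C]


def implied_covariance(C, lam):
    """Cov(v_k, v_j) = sum_m C[k][m] * C[j][m] * lam[m]**2 for independent unit-variance sources."""
    n, n_sources = len(C), len(lam)
    return [
        [sum(C[k][m] * C[j][m] * lam[m] ** 2 for m in range(n_sources)) for j in range(n)]
        for k in range(n)
    ]


def combo_variance(a, cov):
    """Variance of the linear combination sum_k a[k] * v_k, i.e. a^T Cov a."""
    n = len(a)
    return sum(a[k] * a[j] * cov[k][j] for k in range(n) for j in range(n))


cov = implied_covariance(C, LAMBDA)
print(cov)  # [[4.25, 4.0, 0.0], [4.0, 4.25, 0.0], [0.0, 0.0, 2.25]]

# Relative velocity of the two grouped objects: the shared group source
# cancels, leaving only their two individual sources.
print(combo_variance([1, -1, 0], cov))  # 0.5, versus 4.25 for v_0 alone

# Monte Carlo check of the same quantity.
rng = random.Random(0)
samples = [sample_velocities(C, LAMBDA, rng) for _ in range(20000)]
diffs = [v[0] - v[1] for v in samples]
print(statistics.variance(diffs))  # close to 0.5
```

Under these assumptions, describing the two grouped objects as "shared frame plus small individual motions" leaves far less unexplained motion variance (0.5) than treating each object independently (4.25 each). This is one way to see how knowledge of motion structure could help an observer keep track of objects within a group.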
