Analysis by Synthesis: A (Re-)Emerging Program of Research for Language and Vision

This contribution reviews some of the history of analysis by synthesis, an approach to perception and comprehension articulated in the 1950s. Whereas much research has focused on bottom-up, feed-forward, inductive mechanisms, analysis by synthesis as a heuristic model emphasizes a balance between bottom-up steps and knowledge-driven, top-down, predictive steps in speech perception and language comprehension. This idea aligns well with contemporary Bayesian approaches to perception (in language and other domains), which are illustrated with examples from different aspects of perception and comprehension. Results from psycholinguistics, the cognitive neuroscience of language, and visual object recognition suggest that analysis by synthesis can provide a productive way of structuring biolinguistic research. Current evidence suggests that such a model is theoretically well motivated and biologically sensible, and that it becomes computationally tractable when it borrows from Bayesian formalizations.
