You cannot speak and listen at the same time: a probabilistic model of turn-taking

Turn-taking is a preverbal skill whose mastery is an important precondition for many social interactions and joint actions. However, the cognitive mechanisms supporting turn-taking abilities are still poorly understood. Here, we propose a computational analysis of turn-taking in terms of two general mechanisms supporting joint actions: action prediction (e.g., recognizing the interlocutor’s message and predicting the end of turn) and signaling (e.g., modifying one’s own speech to make it more predictable and discriminable). We test the hypothesis that, in a simulated conversational scenario, dyads using these two mechanisms can recognize the utterances of their co-actors faster, which in turn permits them to give and take turns more efficiently. Furthermore, we discuss how turn-taking dynamics depend on the fact that agents cannot simultaneously use their internal models for both predicting and producing actions (or messages), as these two uses have different requirements; in other words, they cannot speak and listen at the same time with the same level of accuracy. Our results provide a computational-level characterization of turn-taking in terms of cognitive mechanisms of action prediction and signaling that are shared across various interaction and joint action domains.
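The interplay of action prediction and signaling described above can be illustrated with a toy sketch. This is not the model used in the paper: the two candidate messages, the Gaussian observation model, the noise-free observation streams, the 0.95 recognition threshold, and the names `gaussian_loglik` and `steps_to_recognize` are all assumptions made purely for illustration. A listener updates a posterior over candidate messages one observation at a time, and a speaker who "signals" shifts the produced feature away from the competing message so that the listener's posterior sharpens sooner.

```python
import math


def gaussian_loglik(x, mean, sigma=1.0):
    """Log-density of observation x under a Gaussian with the given mean."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def steps_to_recognize(observations, means, target, threshold=0.95):
    """Sequential Bayesian recognition over candidate messages.

    Returns the number of observations needed before the posterior of
    `target` exceeds `threshold`, or None if it never does.
    """
    # Uniform prior over the candidate messages (an assumption).
    log_post = [math.log(1.0 / len(means))] * len(means)
    for step, x in enumerate(observations, start=1):
        # Bayes update: add the log-likelihood of x under each message.
        log_post = [lp + gaussian_loglik(x, m) for lp, m in zip(log_post, means)]
        # Renormalize so the posterior sums to one.
        norm = math.log(sum(math.exp(lp) for lp in log_post))
        log_post = [lp - norm for lp in log_post]
        if math.exp(log_post[target]) > threshold:
            return step
    return None


# Hypothetical setup: message A has feature mean 0.0, message B has mean 1.0.
MEANS = [0.0, 1.0]

# Speaker intends message B (index 1).
# Without signaling: produces the prototypical value of B at every step.
plain = steps_to_recognize([1.0] * 20, MEANS, target=1)

# With signaling: exaggerates the feature away from the competitor A.
signaled = steps_to_recognize([2.0] * 20, MEANS, target=1)

print("steps without signaling:", plain)
print("steps with signaling:", signaled)
```

In this toy setting the exaggerated stream crosses the recognition threshold in fewer steps than the prototypical one, which mirrors the hypothesis that signaling lets a co-actor recognize an utterance sooner and so prepare an earlier turn transition. The sketch leaves out the production cost of signaling and the constraint that an agent cannot predict and produce with full accuracy at the same time.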
