Cognitive interaction with virtual assistants: From philosophical foundations to illustrative examples in aeronautics

Abstract: Why do we perceive virtual assistants as something radically new? Our hypothesis is that virtual assistants today raise an expectation of natural interaction with humans. Human interaction is by nature cognitive and collaborative, and the human sciences help to flesh out the ingredients of such cognitive interaction. Uttering a sentence is, at the same time, producing sound; forming a well-formed sentence; conveying meaning; enriching a common background of beliefs and intentions; and doing something together. In this paper, we review the basics of human cognitive communication as developed by the human sciences, particularly philosophy of mind. We define this way of interacting with a computer as ‘cognitive interaction’, and we summarize the main characteristics of this interaction mode in a layered model. Finally, we develop case studies to illustrate the concepts concretely. In light of our theoretical model, we analyze three approaches to conversational systems in AI, to illustrate the options available for implementing the pragmatic dimension of cognitive interaction. We first analyze the seminal approach of Grosz and Sidner [20], and then describe how the now classical approach to discourse structure developed by Asher and Lascarides [5] could capture the pragmatic dimension of interaction with an intelligent virtual assistant. Finally, we ask whether a state-of-the-art chatbot framework actually implements the needed level of cognitive interaction. The contributions of this paper are: to recall and summarize essential ideas from other disciplines that are relevant to understanding what interaction with virtual assistants should be; to explain why cooperation with virtual assistants is something special; and to delineate the challenges we must solve if we are to develop truly collaborative virtual assistants.

[1] Tara N. Sainath, et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2] D. Sperber, et al. Meaning and Relevance, 2012.

[3] Eric Horvitz, et al. The Lumière Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users, 1998, UAI.

[4] J. Searle. Intentionality: An Essay in the Philosophy of Mind, 1983.

[5] William J. Clancey, et al. Shared Awareness, Autonomy and Trust in Human-Robot Teamwork, 2014, AAAI Fall Symposia.

[6] Zhou Yu, et al. Incremental Coordination: Attention-Centric Speech Production in a Physically Situated Conversational Agent, 2015, SIGDIAL Conference.

[7] Sarit Kraus, et al. Collaborative Plans for Complex Group Action, 1996, Artif. Intell.

[8] Quoc V. Le, et al. Distributed Representations of Sentences and Documents, 2014, ICML.

[9] Yoav Shoham, et al. Logical Theories of Intention and the Database Perspective, 2009, J. Philos. Log.

[10] Dan Jurafsky, et al. Pragmatics and Computational Linguistics, 2008.

[11] H. Grice. Logic and conversation, 1975.

[12] A. Koller, et al. Speech Acts: An Essay in the Philosophy of Language, 1969.

[13] Rachid Alami, et al. A management of mutual belief for human-robot interaction, 2007, 2007 IEEE International Conference on Systems, Man and Cybernetics.

[14] Pascal Denis, et al. Une approche sémantique et rhétorique du dialogue. Un cas d'étude: l'explication d'un itinéraire, 2002.

[15] M. Tomasello. Origins of human communication, 2008.

[16] Nathanael Chambers, et al. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories, 2016, NAACL.

[17] Andreas Stolcke, et al. Dialogue act modeling for automatic tagging and recognition of conversational speech, 2000, CL.

[18] Alex Lascarides, et al. Agreement, Disputes and Commitments in Dialogue, 2009, J. Semant.

[19] James F. Allen, et al. A Cognitive Model for Collaborative Agents, 2011, AAAI Fall Symposium: Advances in Cognitive Systems.

[20] Craige Roberts, et al. Information Structure: Towards an integrated formal theory of pragmatics, 2012.

[21] Alois Knoll, et al. Human-Robot dialogue for joint construction tasks, 2006, ICMI '06.

[22] Michael E. Bratman. Précis of Shared agency: a planning theory of acting together, 2013.

[23] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.

[24] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[25] Craige Roberts, et al. Context in Dynamic Interpretation, 2008.

[26] J. Searle. Mind: A Brief Introduction, 2004.

[27] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[28] Matt Moss, et al. Speech Acts, 2018, Oxford Scholarship Online.

[29] Nima Mesgarani, et al. Speaker-Independent Speech Separation With Deep Attractor Network, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[30] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, ArXiv.

[31] François Recanati, et al. Truth-Conditional Pragmatics, 2011.

[32] Candace L. Sidner, et al. Attention, Intentions, and the Structure of Discourse, 1986, CL.

[33] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.

[34] F. Récanati, et al. Philosophie du langage (et de l'esprit), 2008.

[35] Chong Wang, et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, 2015, ICML.

[36] Andreas Stolcke, et al. The Microsoft 2017 Conversational Speech Recognition System, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[37] Jonathon Shlens, et al. A Tutorial on Independent Component Analysis, 2014, ArXiv.

[38] James F. Allen. Natural language understanding, 1987, Benjamin/Cummings series in computer science.

[39] Denys Bernard, et al. Cognitive interaction: Towards "cognitivity" requirements for the design of virtual assistants, 2017, 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC).

[40] Bilge Mutlu, et al. Cognitive Human-Robot Interaction, 2016, Springer Handbook of Robotics, 2nd Ed.

[41] Nicholas Asher, et al. La SDRT: une approche de la cohérence du discours dans la tradition de la sémantique dynamique, 2001.

[42] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.

[43] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[44] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.