ROILA: RObot Interaction LAnguage

The number of robots in our society is increasing rapidly, and service robots that interact with everyday people already outnumber industrial robots. The easiest way to communicate with these service robots, such as Roomba or Nao, would be natural speech. However, the limitations of current speech recognition technology for natural language are a major obstacle to the widespread acceptance of speech interaction with robots. Current speech recognition is at times not good enough to be deployed in natural environments, where ambient conditions affect its performance. Moreover, state-of-the-art automatic speech recognition has not advanced far enough for most applications, partly because of inherent properties of natural languages that make them difficult for a machine to recognize, such as contextual ambiguity and homophones (words that sound the same but have different meanings). As a consequence, miscommunication between the user and the robot sometimes occurs, and the mismatch between humans' expectations and the abilities of interactive robots often frustrates the user. Palm Inc. faced a similar problem with handwriting recognition on its handheld computers. It invented Graffiti, an artificial alphabet that was easy to learn and easy for the computer to recognize. Our Robot Interaction Language (ROILA) takes a similar approach: it is a speech-recognition-friendly artificial language that is easy for humans to learn and easy for robots to understand, with the ultimate goal of outperforming natural language in speech recognition accuracy. Numerous artificial languages exist, Esperanto for example, but to the best of our knowledge none of them was designed to optimize human-machine or human-robot interaction; rather, they were designed to improve human-human communication. The design of ROILA was an iterative process, with further iterations within each step.
It began with a linguistic overview of a pre-selection of existing artificial languages along the dimensions of morphology (grammar) and phonology (the sounds of the language). The artificial languages were also compared with natural languages. The overview revealed a number of linguistic trends that we carefully incorporated into the design of ROILA, on the assumption that linguistic features common to these existing languages would make ROILA easier to learn. The actual construction of ROILA began with the composition of its vocabulary. A genetic algorithm was implemented to generate the best-fitting vocabulary. In principle, the words of this vocabulary are the least likely to be confused with one another and should therefore be easy for the speech recognizer to distinguish. Experimental evaluations were conducted to determine the recognition accuracy of the vocabulary, and their results were used to refine it. The third phase was the design of the grammar. Using the Questions, Options, and Criteria (QOC) technique, we made rational decisions about which grammatical markings to adopt, with recognition accuracy and ease of learning as two important criteria. The result was a simple grammar with no irregularities or exceptions in its rules, in which grammatical markings are expressed by adding separate words rather than by inflecting existing words in a sentence. To conclude the design phase, and as a proof of concept, we built an initial ROILA prototype on the LEGO Mindstorms NXT platform, in which ROILA was used to instruct a LEGO robot to navigate its environment, much like the classic turtle robot. As a final evaluation of ROILA, we conducted a large-scale experiment with the language.
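The abstract describes the vocabulary generator only at a high level. The sketch below illustrates the general idea of a genetic algorithm that evolves a fixed-size vocabulary toward low mutual confusability; it is not the actual ROILA generator. Several parts are assumptions not taken from the source: the hypothetical `CONSONANTS` and `VOWELS` inventory, words built from consonant-vowel syllables, and letter-level edit distance as a crude stand-in for acoustic confusability (a real system would rather use phoneme confusion data from a recognizer). The fitness of a vocabulary is the pair (smallest pairwise distance, mean pairwise distance), so the closest pair of words is pushed apart first.

```python
import random
from itertools import combinations

# Hypothetical phoneme inventory (an assumption for illustration, not ROILA's alphabet).
CONSONANTS = list("bfjklmnpstw")
VOWELS = list("aeiou")


def random_word(rng, syllables):
    """Build a word from random consonant-vowel (CV) syllables."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables))


def edit_distance(a, b):
    """Levenshtein distance, used here as a crude proxy for confusability."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fitness(vocab):
    """Higher is better: maximize the closest pair first, then the mean distance."""
    dists = [edit_distance(a, b) for a, b in combinations(vocab, 2)]
    return (min(dists), sum(dists) / len(dists))


def crossover(a, b, rng):
    """One-point crossover between two vocabularies of equal size."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(vocab, rng, rate, syllables):
    """Replace each word with a fresh random word with probability `rate`."""
    return [random_word(rng, syllables) if rng.random() < rate else w for w in vocab]


def evolve(size=12, syllables=2, pop_size=40, generations=100, mutation_rate=0.1, seed=0):
    """Evolve a population of candidate vocabularies and return the fittest one."""
    rng = random.Random(seed)
    population = [[random_word(rng, syllables) for _ in range(size)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fittest quarter unchanged (elitism) and breed the rest from it.
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), rng, mutation_rate, syllables))
        population = elite + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, fitness(best))
```

Because the elite is carried over unchanged, the best fitness never decreases from one generation to the next; the parameter values shown are arbitrary illustrations, not settings reported for ROILA.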
In this experiment, Dutch high school students spent three weeks learning and practicing ROILA. A ROILA curriculum was carefully designed to support their learning both at school and at home. In-school learning was more interactive and hands-on: the students tested their ROILA skills by speaking to and playing with LEGO robots. At the end of the curriculum the students took a ROILA proficiency test, and those who passed were invited to play a complete game with a LEGO robot. Throughout the learning process, the students' subjective and objective experiences were measured to determine whether ROILA was indeed easy for the students to learn and easy for the machine to recognize. Our results indicate that ROILA achieved better recognition accuracy than English and that the students preferred it to English as the language for interacting with LEGO Mindstorms robots.
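The abstract compares the recognition accuracy of ROILA and English without defining the measure. One standard way to quantify it is word accuracy, defined as 1 minus the word error rate, where the errors are the substitutions, deletions, and insertions in a word-level alignment between what was said and what the recognizer returned. The sketch below assumes this measure, which may differ from the exact one used in the study; the function names and the example utterances are hypothetical.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn the reference utterance into the recognized one."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def word_accuracy(pairs):
    """Word accuracy = 1 - WER, pooled over (reference, recognized) pairs."""
    errors = sum(word_edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref.split()) for ref, _ in pairs)
    return 1 - errors / total


# Invented English placeholder commands, not data from the study.
trials = [("go forward now", "go forward now"), ("turn left", "turn right")]
print(f"{word_accuracy(trials):.2f}")  # prints 0.80
```

Pooling errors over all words, rather than averaging per utterance, gives longer utterances more weight; note also that many insertions can push word accuracy below zero.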