Incompleteness and Fragmentation: Possible Formal Cues to Cognitive Processes Behind Spoken Utterances

What may eventually connect engineers and linguists most is their common interest in language, and more specifically in language technology: engineers build increasingly intelligent robots that are expected to communicate with humans through language, while linguists wish to verify their theoretical understanding of language and speech through practical implementations. Robotics is thus a place for the two to meet. However, speech, especially in spontaneous communication, often resists the usual generalizations: the sounds you hear are not the sounds you describe in a laboratory, the words you read in a written text may be hard to identify through speech segmentation, and the sequences of words that make up a sentence are often too fragmented to count as a “real” sentence from a grammar book. Yet humans communicate, and most often successfully. Typically this is achieved through cognition: people do not simply use words, they use them in a semantic context, combining voice and gesture in a dynamically changing, multimodal situational context. Each participant does not merely pick out words from the flow of a verbal interaction, but also observes and reacts to the others, using multimodal cues as points of reference and inference for navigating the communication. It is reasonable to believe that participants in a multimodal communication event follow a set of general, partly innate rules based on a general model of communication. The model presented below interprets numerous forms of dialogue by uncovering their syntax, prosody and overall multimodality within the HuComTech corpus of Hungarian. The research aims at improving the robustness of spoken natural language technology.
