Learning Semantics of Movement

The basic scientific question behind this presentation is how to computationally model the interrelated processes of understanding natural language and of perceiving and producing movement in real-world contexts. Rather than relying on manually defined representations, we approach this problem by analyzing the statistical regularities in contexts of language use. A central aspect is that the learning of internal representations is based on large amounts of fully or mostly unlabeled training data [1, 2]. One promising methodological approach is to bring together data from different sources through multi-view, transfer, or multi-task learning (see e.g. [2, 3, 4, 5]). Another promising approach is learning relations. Relations are powerful abstractions because they allow reasoning not just about objects, but also about their combinations. Recent work suggests that at least limited classes of relations can be learned from data [6]. We have earlier conducted research in which text contexts were used to learn semantic similarities (see e.g. [7, 8]). However, to approach human-level understanding, we have to take into account that language is fully understood only through its use in multimodal and embodied contexts, which include linguistic, visual, auditory, tactile, and kinesthetic dimensions. In general, patterns and signals are natural representations for multimodal contexts. These differ considerably from the discrete symbols and expressions of symbolic languages. For instance, images are typically represented as numerical matrices. We have earlier conducted research on combining image and language data (see e.g. [9, 10]).

[1] Ben Taskar, et al. Movie/Script: Alignment and Parsing of Video and Text Transcription, 2008, ECCV.

[2] F. Pollick, et al. A motion capture library for the study of identity, gender, and emotion perception from biological motion, 2006, Behavior research methods.

[3] Hassan Ghasemzadeh, et al. Sport training using body sensor networks: a statistical approach to measure wrist rotation for golf swing, 2009, BODYNETS.

[4] Samuel Kaski, et al. Focused Multi-task Learning Using Gaussian Processes, 2011, ECML/PKDD.

[5] David A. Forsyth, et al. Motion synthesis from annotations, 2003, ACM Trans. Graph.

[6] R. Wallace. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason, 1988.

[7] Mathias Creutz, et al. Data analysis of conceptual similarities of Finnish verbs, 2019, Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society.

[8] Jorma Laaksonen, et al. Inferring semantics from textual information in multimedia retrieval, 2008, Neurocomputing.

[9] Jorma Laaksonen, et al. Evaluating the performance in automatic image annotation: Example case by adaptive fusion of global image features, 2007, Signal Process. Image Commun.

[10] Timo Honkela, et al. Contextual Relations of Words in Grimm Tales, Analyzed by Self-Organizing Map, 1995.

[11] Harri Valpola, et al. Denoising Source Separation, 2005, J. Mach. Learn. Res.

[12] S. Vereza. Philosophy in the flesh: the embodied mind and its challenge to Western thought, 2001.

[13] Michael I. Jordan, et al. Sharing Features among Dynamical Systems with Beta Processes, 2009, NIPS.

[14] Catherine L. Harris, et al. The human semantic potential: Spatial language and constrained connectionism, 1997.

[15] B. Schölkopf, et al. Modeling Human Motion Using Binary Latent Variables, 2007.

[16] Jorma Laaksonen, et al. Analysis of Semantic Information Available in an Image Collection Augmented with Auxiliary Data, 2006, AIAI.

[17] Tapio Takala, et al. Detecting Emotional Content from the Motion of an Orchestra Conductor, 2005, Gesture Workshop.

[18] Jaakko J. Väyrynen, et al. WordICA: emergence of linguistic representations for words by independent component analysis, 2010, Natural Language Engineering.

[19] Hui Gao, et al. Gender Recognition from Walking Movements using Adaptive Three-Mode PCA, 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[20] Tapio Takala, et al. Evaluating Emotional Content of Acted and Algorithmically Modified Motions, 2011, Trans. Edutainment.

[21] Jerome A. Feldman, et al. When push comes to shove: a computational model of the role of motor control in the acquisition of action verbs, 1997.

[22] Mubarak Shah, et al. Actions sketch: a novel action representation, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23] Vladimir Pavlovic, et al. Learning Switching Linear Models of Human Motion, 2000, NIPS.

[24] Jason Weston, et al. Label Ranking under Ambiguous Supervision for Learning Semantic Correspondences, 2010, ICML.

[25] Antonio Camurri, et al. Technique for automatic emotion recognition by body gesture analysis, 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.