More-Natural Mimetic Words Generation for Fine-Grained Gait Description

A mimetic word verbally expresses the manner of a phenomenon in an intuitive way. Japanese is known to have more mimetic words in its vocabulary than most other languages. Because human gaits are among the behaviors most commonly expressed by mimetic words in Japanese, we consider such words suitable as labels for fine-grained gait recognition. In addition, Japanese mimetic words have a more decomposable structure than those in other languages such as English. They are said to exhibit sound symbolism: their phonemes are strongly related to the impressions of various phenomena. Thanks to this, native Japanese speakers can express their impressions of phenomena briefly and intuitively using various mimetic words. Our previous work proposed a framework that converts body-part movements into an arbitrary mimetic word with a regression model. The framework introduced a “phonetic space” based on sound symbolism, which enabled fine-grained gait description using generated mimetic words composed of arbitrary combinations of phonemes. However, this method did not consider the “naturalness” of the description. In this paper, we therefore propose an improved mimetic word generation module that takes naturalness into account, and we update the description framework accordingly. Here, we define naturalness as the co-occurrence frequency of the phonemes composing a mimetic word. To investigate this co-occurrence frequency, we collected many mimetic words through a subjective experiment. Evaluation experiments confirmed that the proposed module could describe gaits with more natural mimetic words while maintaining description accuracy.
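The abstract defines naturalness as the co-occurrence frequency of the phonemes composing a mimetic word but does not give the exact computation. The following is a minimal sketch of one way such a score could be computed, under several assumptions: words are represented as tuples of romanized phoneme symbols, co-occurrence is counted over unordered phoneme pairs within a word, and the score is the mean relative pair frequency. The function names `cooccurrence_counts` and `naturalness` and the toy corpus are hypothetical and are not taken from the paper or its collected data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(words):
    """Count, for each unordered phoneme pair, how many words contain both phonemes."""
    counts = Counter()
    for phonemes in words:
        # Use the set of phonemes so a repeated phoneme is counted once per word.
        for pair in combinations(sorted(set(phonemes)), 2):
            counts[pair] += 1
    return counts


def naturalness(phonemes, counts, n_words):
    """Mean relative co-occurrence frequency over all phoneme pairs of a candidate.

    This scoring rule is an assumption made for illustration only.
    """
    pairs = list(combinations(sorted(set(phonemes)), 2))
    if not pairs or n_words == 0:
        return 0.0
    return sum(counts[p] for p in pairs) / (len(pairs) * n_words)


# Toy corpus (illustrative only): base forms of a few Japanese gait mimetic words.
corpus = [
    ("s", "u", "t", "a"),  # suta(-suta)
    ("t", "e", "k", "u"),  # teku(-teku)
    ("n", "o", "s", "o"),  # noso(-noso)
    ("y", "o", "t", "a"),  # yota(-yota)
]

counts = cooccurrence_counts(corpus)

# Two hypothetical candidates produced by some generator.
seen_like = ("t", "a", "s", "u")    # built from pairs that occur in the corpus
unseen_like = ("g", "i", "p", "e")  # built from pairs that never occur in the corpus

score_seen = naturalness(seen_like, counts, len(corpus))
score_unseen = naturalness(unseen_like, counts, len(corpus))
print(score_seen, score_unseen)
```

Under these assumptions, a candidate whose phoneme pairs appear often in the collected words scores higher than one built from pairs that never co-occur, so a generator could use the score to prefer more natural candidates among those that describe a gait equally well.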
