Human-Robot Interaction with Smart Shopping Trolley Using Sign Language: Data Collection

The paper presents a concept of a smart robotic trolley for supermarkets with a multimodal user interface that combines sign language recognition, acoustic speech recognition, and a touchscreen. Considerable progress in hand gesture recognition and automatic speech recognition in recent years has given rise to many human-computer interaction systems. Recognition of voiced speech and of isolated or static hand gestures is now quite accurate; continuous and dynamic sign language recognition, however, remains an unresolved challenge. No automatic recognition system currently exists for Russian sign language, and there are no relevant data for model training. In the present research we aim to fill this gap for Russian sign language. We present a Kinect 2.0 based software-hardware complex for collecting multimodal sign language databases using an optical video camera, an infrared camera, and a depth sensor. We describe the architecture of the developed software as well as details of the collected database. The collected corpus is intended for the further development of a Russian sign language recognition system, which will be embedded into a robotic supermarket trolley with gestural and speech interfaces. The architecture of the developed system is also presented in the paper.
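Since the complex records RGB, infrared, and depth streams from a single Kinect 2.0 sensor, a multi-source frame reader is the natural capture primitive for keeping the three modalities time-aligned. Below is a minimal sketch of such a capture loop, assuming the Kinect for Windows SDK 2.0 (Kinect.h); the frame count, buffer handling, and save stub are illustrative assumptions, not the authors' actual recording software.

```cpp
// Minimal sketch: synchronized RGB + infrared + depth capture with Kinect 2.0.
// Assumes the Kinect for Windows SDK 2.0 (Kinect.h); error handling trimmed.
#include <Kinect.h>
#include <vector>
#include <cstdio>

int main() {
    IKinectSensor* sensor = nullptr;
    if (FAILED(GetDefaultKinectSensor(&sensor)) || !sensor) return 1;
    sensor->Open();

    // One reader delivering color, depth, and IR frames captured together,
    // which keeps the three modalities aligned for later corpus annotation.
    IMultiSourceFrameReader* reader = nullptr;
    sensor->OpenMultiSourceFrameReader(
        FrameSourceTypes_Color | FrameSourceTypes_Depth | FrameSourceTypes_Infrared,
        &reader);

    // Native Kinect 2.0 resolutions: color 1920x1080 (BGRA), depth/IR 512x424.
    std::vector<BYTE>   color(1920 * 1080 * 4);
    std::vector<UINT16> depth(512 * 424);
    std::vector<UINT16> infrared(512 * 424);

    for (int i = 0; i < 100; ++i) {  // 100 frames, purely for illustration
        IMultiSourceFrame* frame = nullptr;
        // AcquireLatestFrame fails until a complete frame set is ready.
        if (FAILED(reader->AcquireLatestFrame(&frame)) || !frame) continue;

        IColorFrameReference* cref = nullptr;
        IColorFrame* cframe = nullptr;
        frame->get_ColorFrameReference(&cref);
        if (cref && SUCCEEDED(cref->AcquireFrame(&cframe)) && cframe) {
            cframe->CopyConvertedFrameDataToArray(
                (UINT)color.size(), color.data(), ColorImageFormat_Bgra);
            cframe->Release();
        }
        if (cref) cref->Release();

        IDepthFrameReference* dref = nullptr;
        IDepthFrame* dframe = nullptr;
        frame->get_DepthFrameReference(&dref);
        if (dref && SUCCEEDED(dref->AcquireFrame(&dframe)) && dframe) {
            dframe->CopyFrameDataToArray((UINT)depth.size(), depth.data());
            dframe->Release();
        }
        if (dref) dref->Release();

        IInfraredFrameReference* iref = nullptr;
        IInfraredFrame* iframe = nullptr;
        frame->get_InfraredFrameReference(&iref);
        if (iref && SUCCEEDED(iref->AcquireFrame(&iframe)) && iframe) {
            iframe->CopyFrameDataToArray((UINT)infrared.size(), infrared.data());
            iframe->Release();
        }
        if (iref) iref->Release();

        frame->Release();
        // Here the three buffers would be written to the corpus storage.
        std::printf("captured frame set %d\n", i);
    }

    reader->Release();
    sensor->Close();
    sensor->Release();
    return 0;
}
```

The multi-source reader is preferable to three independent readers here because it only delivers a frame set when all requested streams have data from the same capture instant, which avoids post-hoc timestamp matching when building a multimodal corpus.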
