Machine learning via multimodal signal processing

This paper proposes a methodology for the recognition of vocal music (Byzantine music) via multimodal signal processing. Sequences of multimodal signals are captured from the expert's (teacher's) and the student's hymn performances, respectively. The machine learning system is trained on the values of particular features extracted from the captured multimodal signals. Once trained, the system can recognize any hymn performance from the corpus. Training and recognition take place in real time using machine learning techniques. The system was evaluated with the jackknife cross-validation method, giving promising results.
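The jackknife evaluation mentioned above can be sketched as leave-one-out cross-validation: each performance is held out once, the model is trained on the rest, and the hits are averaged. This is a minimal illustrative sketch; the 1-nearest-neighbour classifier and the toy two-dimensional feature vectors are assumptions, not the paper's actual features or recognizer.

```python
import math

def distance(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_1nn(sample, train_set):
    # Return the label of the closest training sample
    # (stand-in for the paper's recognizer).
    return min(train_set, key=lambda item: distance(sample, item[0]))[1]

def jackknife_accuracy(dataset):
    # Leave-one-out (jackknife) cross-validation:
    # hold out each sample once, train on the rest, average the hits.
    hits = 0
    for i, (features, label) in enumerate(dataset):
        train_set = dataset[:i] + dataset[i + 1:]
        if classify_1nn(features, train_set) == label:
            hits += 1
    return hits / len(dataset)

# Toy corpus: two hypothetical hymn classes with clustered feature vectors.
corpus = [([0.1, 0.2], "hymn_A"), ([0.2, 0.1], "hymn_A"),
          ([0.9, 0.8], "hymn_B"), ([0.8, 0.9], "hymn_B")]
print(jackknife_accuracy(corpus))  # → 1.0
```

Because every held-out sample still has a same-class neighbour in the remaining training set, the toy corpus scores a perfect accuracy of 1.0.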
