Automatic Evaluation of Karaoke Singing Based on Pitch, Volume, and Rhythm Features

This study aims to develop an automatic singing evaluation system for Karaoke performances. Many Karaoke systems on the market today come with a scoring function, which adds to their entertainment appeal because people enjoy competing. Automatic Karaoke scoring is still rudimentary, however, and often gives results that are inconsistent with the scores of human raters. One source of error is that singing volume is often the only evaluation criterion. To improve the singing evaluation capabilities of Karaoke machines, this study uses several acoustic features, namely pitch, volume, and rhythm, to assess a singing performance. We invited a number of singers with different levels of singing ability to record solo Karaoke vocal samples. The performances were rated independently by four musicians and then used, together with additional Karaoke Video Compact Disc music, to train the proposed system. Our experiment shows that the automatic singing evaluation results are close to the human ratings: the Pearson product-moment correlation coefficient between them is 0.82.
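As an illustration of the agreement measure reported above, the following minimal sketch computes a Pearson product-moment correlation coefficient between two lists of scores. The function name `pearson_r` and the score values are hypothetical and serve only as an example; they are not data or code from the study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five performances (illustrative values only).
human_scores = [62.0, 75.0, 81.0, 90.0, 55.0]
system_scores = [60.0, 78.0, 79.0, 93.0, 58.0]

r = pearson_r(human_scores, system_scores)
print(f"Pearson r = {r:.2f}")
```

A value near 1 indicates that the automatic scores rise and fall with the human ratings; a value near 0 indicates no linear relationship.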
