Automatic Solfège Assessment

This paper presents a note-by-note approach to automatic solfège assessment. The proposed system uses melodic transcription techniques to extract the sung notes from the audio signal, and the resulting sequence of melodic segments is then processed by a two-stage algorithm. In the first stage, an aggregation process performs the temporal alignment between the transcribed melody and the music score (ground truth); this stage implicitly aggregates and links the best combination of extracted melodic segments to each expected note in the ground truth. In the second stage, a statistical method evaluates the accuracy of each detected sung note. The technique is implemented with a Bayesian classifier trained on an audio dataset containing individual scores provided by a committee of expert listeners; these scores were assigned per musical note with respect to pitch, onset, and offset accuracy. Experimental results indicate that the classification scheme is suitable for use as an assessment tool, providing useful feedback to the student.
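As a rough illustration of the two-stage pipeline described above, the sketch below pairs a DTW-style alignment (standing in for the aggregation stage) with a Gaussian naive Bayes classifier over per-note pitch/onset/offset errors (standing in for the statistical stage). The abstract does not specify the cost function, the aggregation rules, or the exact form of the Bayesian classifier, so this is a minimal sketch under those assumptions, not the paper's implementation; the names `dtw_align` and `NoteBayesClassifier` are hypothetical.

```python
import numpy as np

def dtw_align(sung, score):
    """Stage 1 (sketch): align transcribed notes to score notes with DTW.

    `sung` and `score` are sequences of MIDI pitches; returns the optimal
    warping path as (sung_index, score_index) pairs. The paper's
    aggregation step additionally merges segments; that is omitted here.
    """
    n, m = len(sung), len(score)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(sung[i - 1] - score[j - 1])  # pitch distance
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    path, i, j = [], n, m  # backtrack along the minimal-cost path
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return path[::-1]

class NoteBayesClassifier:
    """Stage 2 (sketch): Gaussian naive Bayes over per-note features
    (pitch error in semitones, onset error in s, offset error in s),
    trained on expert per-note ratings (e.g. 0 = wrong, 1 = correct)."""

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.priors_ = np.array([(y == c).mean() for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        X = np.asarray(X, float)[:, None, :]  # shape (notes, 1, features)
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars_)
                          + (X - self.means_) ** 2 / self.vars_).sum(axis=2)
        return self.classes_[np.argmax(np.log(self.priors_) + log_lik, axis=1)]
```

A typical use would be `path = dtw_align(sung_pitches, score_pitches)`, followed by computing one (pitch, onset, offset) error triple per aligned pair and passing those to `NoteBayesClassifier.fit`/`.predict`. How the expert committee's per-note scores map to class labels is likewise an assumption here.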
