Paper information - Finding the optimum training solution for Byzantine music recognition

Finding the optimum training solution for Byzantine music recognition — A Max/Msp approach

This paper presents the methodology and implementation of a turn-key training system for music (singing voice). The system was implemented in the Max/Msp development environment. We first create a small corpus of anthems for testing purposes by recording four (4) short hymns. Each hymn is performed three (3) times by the same chanter. The repetition serves to identify the best performance(s) of each hymn, referred to here as the optimum solution(s). The process starts by extracting time-series vectors from the recorded WAV files. After extraction, a statistical cross-validation method called Jackknife is applied to find the optimum solution(s), which are then used to train the system. Training and recognition take place in real time using a combination of Hidden Markov Models (HMM) and Dynamic Time Warping (DTW) algorithms. The system is evaluated while the Jackknife procedure runs, and the optimum training solution is highlighted at the same time. Precision and recall are estimated in order to validate that the correct singing performance is used.
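The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Max/Msp implementation: it assumes each performance is a one-dimensional time series, uses plain DTW with nearest-template classification in place of the combined HMM/DTW recognizer, and treats the Jackknife as "train on take k of every hymn, test on the remaining takes". The names `dtw_distance`, `classify`, `evaluate_take`, and `best_take` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Corpus = Dict[str, List[Sequence[float]]]  # hymn label -> list of takes


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def classify(query: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the template closest to the query under DTW."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


def evaluate_take(corpus: Corpus, k: int) -> Tuple[float, float]:
    """Train on take k of every hymn, test on all other takes.

    Returns macro-averaged (precision, recall) over the hymn labels.
    """
    templates = {label: takes[k] for label, takes in corpus.items()}
    tp = {label: 0 for label in corpus}
    fp = {label: 0 for label in corpus}
    fn = {label: 0 for label in corpus}
    for label, takes in corpus.items():
        for i, take in enumerate(takes):
            if i == k:
                continue  # held out as the training take
            predicted = classify(take, templates)
            if predicted == label:
                tp[label] += 1
            else:
                fp[predicted] += 1
                fn[label] += 1
    precisions = [tp[l] / (tp[l] + fp[l]) if tp[l] + fp[l] else 0.0 for l in corpus]
    recalls = [tp[l] / (tp[l] + fn[l]) if tp[l] + fn[l] else 0.0 for l in corpus]
    return sum(precisions) / len(corpus), sum(recalls) / len(corpus)


def best_take(corpus: Corpus) -> int:
    """Pick the take index whose templates give the highest precision + recall."""
    n_takes = min(len(takes) for takes in corpus.values())
    return max(range(n_takes), key=lambda k: sum(evaluate_take(corpus, k)))


# Toy corpus (made-up numbers): two hymns, three takes each.
toy_corpus: Corpus = {
    "hymn_a": [[1, 2, 3], [1, 2, 2, 3], [1, 2, 3, 3]],
    "hymn_b": [[5, 4, 3], [5, 4, 4, 3], [5, 5, 4, 3]],
}
chosen_take = best_take(toy_corpus)
```

In this sketch a tie between takes is resolved in favour of the lowest index; a real system would more likely report all tied takes as optimum solutions, as the abstract's "solution(s)" suggests.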
