Musical pitch estimation using a supervised single hidden layer feed-forward neural network

Musical pitch estimation is used to find the pitch of a musical note, or the fundamental frequency (F0) of an audio signal, and can serve as a pre-processing stage in many applications such as sound separation and musical note transcription. In this work, a pitch estimation method based on a classification framework is designed using a supervised single hidden layer feed-forward neural network. To give this method good generalization, fast training, and a small network size, two main investigations were carried out. First, we identify a suitable feature vector by comparing the performance of several feature generation methods, using the extreme learning machine (ELM) framework to train the network. Second, we compare several input-weight fine-tuning methods for reducing the network size. We evaluated the method using multiple-pitch, multi-instrument signals generated from datasets of real musical instrument recordings. Among the feature generation methods, the feature vector obtained by combining the pitch histogram and the pitch-frequency scaled spectrum showed the best performance in the experiments. For fine-tuning, we compare the ELM framework with Cuckoo search and sign-based propagation tuning. When the network size is further reduced to 40%, the network trained with sign-based propagation tuning performs better on the unseen dataset than the network trained with the ELM framework.
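
The ELM framework trains a single hidden layer network in one step: the input weights and biases are drawn at random and kept fixed, and only the output weights are obtained by least squares. A minimal pure-Python sketch of that idea follows; it is not the authors' implementation. The helper names (`train_elm`, `predict`, `solve`), the sigmoid activation, the uniform [-1, 1] weight range, and the small ridge term (used instead of an exact Moore-Penrose pseudo-inverse so the linear system is always solvable) are all assumptions. Labels are one-hot encoded so that each output node scores one class, for example one candidate pitch.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, clamped so math.exp cannot overflow."""
    z = max(-500.0, min(500.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer(X, W, b):
    """H[i][j] = sigmoid(w_j . x_i + b_j) for sample i and hidden node j."""
    return [[sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + bj)
             for w, bj in zip(W, b)] for x in X]


def solve(A, B):
    """Solve A Z = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def output_weights(H, T, ridge=1e-3):
    """Ridge-regularised least squares: beta = (H^T H + ridge*I)^-1 H^T T."""
    L, C = len(H[0]), len(T[0])
    HtH = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
            for j in range(L)] for i in range(L)]
    HtT = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(C)]
           for i in range(L)]
    return solve(HtH, HtT)


def train_elm(X, labels, n_hidden, n_classes, seed=0, ridge=1e-3):
    """Random, fixed input weights; only the output weights are fitted."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    T = [[1.0 if k == y else 0.0 for k in range(n_classes)] for y in labels]
    beta = output_weights(hidden_layer(X, W, b), T, ridge)
    return W, b, beta


def predict(X, W, b, beta):
    """Return the index of the highest-scoring output node for each sample."""
    preds = []
    for h in hidden_layer(X, W, b):
        scores = [sum(h[i] * beta[i][k] for i in range(len(h)))
                  for k in range(len(beta[0]))]
        preds.append(max(range(len(scores)), key=scores.__getitem__))
    return preds
```

With hypothetical feature vectors `X` (lists of floats) and integer labels `y`, training is `W, b, beta = train_elm(X, y, n_hidden=20, n_classes=3)` and inference is `predict(X_new, W, b, beta)`. Because only `beta` is solved for, training cost is dominated by one linear solve of size `n_hidden`, which is why the hidden layer size matters for speed.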
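
Sign-based propagation is understood here as the RPROP family of update rules, in which every weight keeps its own step size and only the sign of its gradient component decides the direction of the update. The sketch below implements the iRprop- variant (no weight backtracking); choosing that variant, the function name `irprop_minus`, and the default constants are assumptions. How the training error gradient with respect to the input weights is computed in this work is not described in the abstract, so `grad` is left as a caller-supplied function.

```python
def irprop_minus(grad, w0, steps=100, delta0=0.1, eta_plus=1.2, eta_minus=0.5,
                 delta_min=1e-6, delta_max=50.0):
    """Minimise a differentiable function with the iRprop- sign-based update.

    grad(w) must return one gradient component per weight. Only the sign of
    each component is used; its magnitude is ignored.
    """
    w = list(w0)
    delta = [delta0] * len(w)   # per-weight step sizes
    prev_g = [0.0] * len(w)     # gradient seen at the previous iteration
    for _ in range(steps):
        g = list(grad(w))
        for i in range(len(w)):
            if prev_g[i] * g[i] > 0:      # same sign twice: take larger steps
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif prev_g[i] * g[i] < 0:    # sign flip: we overshot, so shrink
                delta[i] = max(delta[i] * eta_minus, delta_min)
                g[i] = 0.0                # iRprop-: skip this weight once
            if g[i] > 0:
                w[i] -= delta[i]
            elif g[i] < 0:
                w[i] += delta[i]
            prev_g[i] = g[i]
    return w
```

For example, minimising (w0 - 3)^2 + (w1 + 1)^2 from the origin uses `irprop_minus(lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)], [0.0, 0.0], steps=200)`. Ignoring gradient magnitudes makes the step size insensitive to badly scaled error surfaces, which is one plausible reason such a rule suits fine-tuning the input weights of a small network.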
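
Cuckoo search, the other fine-tuning method compared above, is a population-based optimiser: each "nest" holds a candidate solution, new candidates are generated by Lévy flights, and a fraction of the worse solutions is discovered and replaced. The following is a minimal sketch of Yang and Deb's scheme, assuming Mantegna's algorithm for the Lévy steps (exponent `beta = 1.5`) and greedy replacement; the function names, defaults, and box bounds are assumptions. When tuning a network, `f` would presumably be the training error as a function of the flattened input weights, but that coupling is not specified in the abstract.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of the numerator Gaussian in Mantegna's Lévy step."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def cuckoo_search(f, dim, lower, upper, n_nests=15, pa=0.25, alpha=0.01,
                  beta=1.5, iters=300, seed=0):
    """Minimise f over the box [lower, upper]^dim; returns (best_x, best_f)."""
    rng = random.Random(seed)
    sigma = mantegna_sigma(beta)
    clip = lambda x: [min(max(v, lower), upper) for v in x]
    nests = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_nests)]
    fit = [f(x) for x in nests]

    def keep_if_better(i, cand):
        """Greedy selection: a nest's fitness never gets worse."""
        fc = f(cand)
        if fc < fit[i]:
            nests[i], fit[i] = cand, fc

    for _ in range(iters):
        best = nests[min(range(n_nests), key=fit.__getitem__)]
        # 1) Lévy flights, scaled by each nest's distance from the current best.
        for i in range(n_nests):
            cand = []
            for x, bx in zip(nests[i], best):
                step = rng.gauss(0, sigma) / max(abs(rng.gauss(0, 1)), 1e-12) ** (1 / beta)
                cand.append(x + alpha * step * (x - bx) * rng.gauss(0, 1))
            keep_if_better(i, clip(cand))
        # 2) Discovery: with probability pa per component, move a nest by a random
        #    fraction of the difference between two other nests (biased random walk).
        snap = [x[:] for x in nests]
        p1 = rng.sample(range(n_nests), n_nests)
        p2 = rng.sample(range(n_nests), n_nests)
        for i in range(n_nests):
            r = rng.random()
            cand = [x + r * (a - c) if rng.random() < pa else x
                    for x, a, c in zip(snap[i], snap[p1[i]], snap[p2[i]])]
            keep_if_better(i, clip(cand))
    i_best = min(range(n_nests), key=fit.__getitem__)
    return nests[i_best], fit[i_best]
```

On a toy objective such as the sphere function, `cuckoo_search(lambda x: sum(v * v for v in x), dim=2, lower=-5.0, upper=5.0)` returns a point near the origin. Unlike ELM, every candidate costs one full evaluation of `f`, so the method trades training speed for a search that does not need gradients.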