Predominant Instrument Recognition in Polyphonic Music Using GMM-DNN Framework

In this paper, predominant instrument recognition in polyphonic music is addressed using timbral descriptors in three frameworks: the Gaussian mixture model (GMM), the deep neural network (DNN), and a hybrid GMM-DNN. Three feature sets, namely mel-frequency cepstral coefficients (MFCCs), modified group delay features (MODGDF), and low-level timbral features, are computed, and experiments are conducted with each individual set and with their early integration. Performance is systematically evaluated on the IRMAS dataset. The accuracies obtained for the GMM, DNN, and GMM-DNN systems on the timbral feature fusion are 65.60%, 85.60%, and 93.20%, respectively. The architectural choice of a DNN operating on GMM-derived features in the feature-fusion paradigm improved system performance. The proposed experiments thus demonstrate the potential of timbral descriptors and DNN-based systems for recognizing the predominant instrument in polyphonic music.
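The early integration mentioned above can be sketched as simple feature-level concatenation: per-frame feature matrices from each descriptor set are joined along the feature axis before being passed to a classifier. The sketch below is a minimal illustration with NumPy; the frame count and per-set dimensionalities are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

# Illustrative per-frame feature matrices (dimensions are hypothetical):
# rows index analysis frames, columns index feature coefficients.
rng = np.random.default_rng(0)
n_frames = 100
mfcc = rng.standard_normal((n_frames, 13))     # MFCC features
modgdf = rng.standard_normal((n_frames, 13))   # modified group delay features
timbral = rng.standard_normal((n_frames, 10))  # low-level timbral descriptors

def early_fusion(*feature_sets):
    """Concatenate per-frame feature matrices along the feature dimension."""
    return np.concatenate(feature_sets, axis=1)

fused = early_fusion(mfcc, modgdf, timbral)
print(fused.shape)  # (100, 36): one 36-dimensional fused vector per frame
```

Each frame then carries a single fused descriptor vector, which is what the GMM, DNN, and hybrid GMM-DNN classifiers would consume in the fusion experiments.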
