Detecting fingering of overblown flute sound using sparse feature learning

In woodwind instruments such as the flute, producing a tone higher in pitch than the standard tone by increasing the blowing pressure is called overblowing, and it allows several distinct fingerings for the same note. This article presents a method that uses unsupervised feature learning to learn acoustic features better suited than conventional features, such as mel-frequency cepstral coefficients (MFCCs), to detecting the fingering from a flute sound. To do so, we first extract a spectrogram from the audio and convert it to the mel scale. Then, we concatenate four consecutive mel-spectrogram frames to capture short-term temporal information and use the result as input to the sparse filtering algorithm. The learned features are then max-pooled, resulting in a final feature vector for the classifier with added robustness. We demonstrate the advantages of the proposed method in two ways: we first visualize and analyze the differences in the learned features between tones generated by standard and overblown fingerings. We then perform a quantitative evaluation through classification tasks on six selected pitches with up to five different fingerings, covering a variety of octave-related and non-octave-related fingerings. The results confirm that the features learned with the proposed method significantly outperform the conventional MFCCs and the residual noise spectrum in every experimental condition of the classification tasks.
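As a rough illustration of the pipeline described above, the following Python sketch computes stacked mel-spectrogram frames, learns a sparse filtering projection, and max-pools the resulting activations. All parameter values (sample rate, FFT size, number of mel bands, context length, number of learned features) are placeholder assumptions, not the settings used in the paper.

```python
import numpy as np
import librosa
from scipy.optimize import minimize


def stacked_mel_frames(path, sr=22050, n_fft=1024, hop=512, n_mels=128, context=4):
    """Log mel spectrogram with `context` consecutive frames stacked per column."""
    y, _ = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                       hop_length=hop, n_mels=n_mels)
    S = np.log1p(S)
    n_frames = S.shape[1]
    # Column i of the output holds frames i .. i+context-1 stacked vertically.
    cols = [S[:, i:n_frames - context + 1 + i] for i in range(context)]
    return np.vstack(cols)  # shape: (n_mels*context, n_frames-context+1)


def sparse_filtering(X, n_features=256, maxiter=200, eps=1e-8):
    """Sparse filtering (Ngiam et al., 2011): learn W that minimizes the L1 norm
    of the row- then column-normalized soft-absolute activations F = |W X|."""
    n_dims = X.shape[0]
    rng = np.random.default_rng(0)
    w0 = rng.standard_normal(n_features * n_dims)

    def obj_and_grad(w):
        W = w.reshape(n_features, n_dims)
        Z = W @ X
        F = np.sqrt(Z ** 2 + eps)                             # soft absolute value
        L2r = np.sqrt((F ** 2).sum(axis=1, keepdims=True))
        Fr = F / L2r                                          # normalize each feature
        L2c = np.sqrt((Fr ** 2).sum(axis=0, keepdims=True))
        Fc = Fr / L2c                                         # normalize each example
        obj = Fc.sum()
        # Backpropagate through both normalizations and the soft absolute value.
        dFc = np.ones_like(Fc)
        dFr = (dFc - Fc * (dFc * Fc).sum(axis=0, keepdims=True)) / L2c
        dF = (dFr - Fr * (dFr * Fr).sum(axis=1, keepdims=True)) / L2r
        dZ = dF * (Z / F)
        return obj, (dZ @ X.T).ravel()

    res = minimize(obj_and_grad, w0, jac=True, method="L-BFGS-B",
                   options={"maxiter": maxiter})
    return res.x.reshape(n_features, n_dims)


def pooled_feature(W, X, eps=1e-8):
    """Encode stacked frames with W and max-pool over time into one vector."""
    F = np.sqrt((W @ X) ** 2 + eps)
    return F.max(axis=1)


# Hypothetical usage: learn W from training recordings, then encode a test tone.
# X_train = np.hstack([stacked_mel_frames(f) for f in training_wav_paths])
# W = sparse_filtering(X_train)
# feature_vector = pooled_feature(W, stacked_mel_frames("flute_note.wav"))
```

The pooled vector would then be passed to a standard classifier; the pooling window and the classifier choice are left unspecified in this sketch.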
