Structural segmentation of Hindustani concert audio with posterior features

Structural segmentation of music involves identifying boundaries between homogeneous regions, where homogeneity is defined over one or more musical dimensions and therefore depends on the musical genre. In this work, we address the segmentation of Hindustani instrumental concert recordings at the largest time scale, that is, into concert sections marked by prominent changes in rhythmic structure. Motivated by musicological knowledge and acoustic observations, we combine tempo features with energy and chroma features. Posterior probability features, obtained by unsupervised model fitting of the frame-level acoustic features, are shown to significantly improve robustness to local acoustic variations. Finally, two diverse change detection criteria are combined to obtain a superior segmentation system.
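The idea of replacing raw frame-level features with posterior features before change detection can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the mixture is a fixed-variance, equal-weight Gaussian mixture fitted with a few EM iterations, the change detector is a checkerboard-kernel novelty curve on a cosine self-similarity matrix (one common criterion, not necessarily either of the two the authors combine), and the function names `posterior_features`, `fit_means`, and `novelty_curve` are hypothetical.

```python
import numpy as np

def posterior_features(frames, means, var=1.0):
    """Per-frame posteriors under an equal-weight, isotropic Gaussian mixture (assumed model)."""
    # Squared distance from every frame to every component mean: shape (n_frames, k).
    d2 = ((frames[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    log_lik = -0.5 * d2 / var
    log_lik -= log_lik.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    post = np.exp(log_lik)
    return post / post.sum(axis=1, keepdims=True)

def fit_means(frames, k, var=1.0, iters=10):
    """Unsupervised fit of the component means with a few EM iterations (variance kept fixed)."""
    # Deterministic initialisation: k frames spread evenly over the recording.
    idx = np.linspace(0, len(frames) - 1, k).astype(int)
    means = frames[idx].copy()
    for _ in range(iters):
        post = posterior_features(frames, means, var)                       # E-step
        means = (post.T @ frames) / (post.sum(axis=0)[:, None] + 1e-12)     # M-step
    return means

def novelty_curve(features, half_width):
    """Checkerboard-kernel novelty along the diagonal of a cosine self-similarity matrix."""
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    ssm = unit @ unit.T
    sign = np.r_[-np.ones(half_width), np.ones(half_width)]
    kernel = np.outer(sign, sign)  # +1 within past/future blocks, -1 across them
    n = len(features)
    nov = np.zeros(n)
    for t in range(half_width, n - half_width):
        block = ssm[t - half_width:t + half_width, t - half_width:t + half_width]
        nov[t] = (block * kernel).sum()
    return nov
```

In use, the frame-level feature matrix (for example tempo, energy, and chroma values stacked per frame) would be passed to `fit_means`, converted with `posterior_features`, and then scanned with `novelty_curve`; peaks in the resulting curve are candidate section boundaries. Because the posteriors saturate near 0 or 1 inside a homogeneous section, small local fluctuations in the raw features have little effect on the self-similarity matrix, which is the intuition behind the robustness claim above.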
