Modulation Spectral Features: In Pursuit of Invariant Representations of Music with Application to Unsupervised Source Identification

Abstract: Modulation frequency analysis has been studied predominantly in communications, the filtering and coding of digital signals, and neural representations in the biomedical field. Modulation frequency features have also surfaced in music data mining, also known as music information retrieval (MIR). The term ‘modulation spectral features’ has been used rather loosely; here we use it to mean temporal patterns in a signal that may be revealed via the modulation spectrum. This paper presents a literature survey of modulation features, reviewing their historical use in general and the evolving interest in modulation spectral features for data mining in music signals. We also discuss the challenges encountered in applying modulation spectral features to music signals and suggest new directions in this area. Lastly, we apply the concepts, properties, and parameters of modulation spectra and their features to our preliminary unsupervised method for identifying sound sources with periodic temporal patterns.
