Statistical Modeling and Retrieval of Polyphonic Music

In this article, we propose a solution to the problem of query by example for polyphonic music audio. We first present a generic mid-level representation for audio queries. Unlike previous efforts in the literature, the proposed representation does not depend on the differing spectral characteristics of musical instruments or on the accurate location of note onsets and offsets. This is achieved by first mapping the short-term frequency spectrum of consecutive audio frames to a musical space (the spiral array) and defining a tonal identity with respect to the center of effect generated by the spectral weights of the musical notes. We then use the resulting one-dimensional text representations of the audio to create n-gram statistical sequence models that track the tonal characteristics and behavior of the pieces. After performing appropriate smoothing, we build a collection of melodic n-gram models for testing. Using perplexity-based scoring, we test the likelihood of a sequence of lexical chords (an audio query) given each model in the database collection. Initial results show that some variation of the input piece appears in the top 5 results 81% of the time for whole-melody inputs within a database of 500 polyphonic melodies. We also tested the retrieval engine on short audio clips. Using 25 s segments, variations of the input piece are among the top 5 results 75% of the time.
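
The retrieval step described above (one smoothed n-gram model per database piece, with the query scored by perplexity under each model) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the article: bigrams are used as the n-gram order, add-one (Laplace) smoothing stands in for the unspecified smoothing method, and the chord labels, piece names, and function names are hypothetical toy values.

```python
import math
from collections import Counter

def train_bigram(tokens, vocab):
    """Return a function giving the add-one-smoothed P(cur | prev).

    Add-one smoothing is an assumption made for this sketch; the
    article does not say which smoothing method it uses.
    """
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    v = len(vocab)

    def prob(prev, cur):
        return (pair_counts[(prev, cur)] + 1) / (prev_counts[prev] + v)

    return prob

def perplexity(prob, query):
    """Perplexity of a query (at least two tokens) under a bigram model."""
    pairs = list(zip(query, query[1:]))
    log_sum = sum(math.log(prob(p, c)) for p, c in pairs)
    return math.exp(-log_sum / len(pairs))

def rank(database, query, vocab):
    """Order piece names from lowest (best) to highest perplexity."""
    scores = {
        name: perplexity(train_bigram(seq, vocab), query)
        for name, seq in database.items()
    }
    return sorted(scores, key=scores.get)

# Hypothetical toy data: each piece is a sequence of lexical chord labels.
vocab = ["C", "F", "G", "Am"]
database = {
    "piece_a": ["C", "F", "G", "C", "F", "G", "C"],
    "piece_b": ["Am", "G", "Am", "G", "Am", "G", "Am"],
}
query = ["C", "F", "G", "C"]

ranking = rank(database, query, vocab)
print(ranking)  # piece_a ranks first: its model assigns the query lower perplexity
```

Lower perplexity means the model finds the query less surprising, so a top-5 result list corresponds to the five pieces with the lowest scores.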
