Melodic Grouping in Music Information Retrieval: New Methods and Applications

We introduce the music information retrieval (MIR) task of segmenting melodies into phrases, summarise the musicological and psychological background to the task, and review existing computational methods. We then present a new model, IDyOM, for melodic segmentation based on statistical learning and information-dynamic analysis. The model is compared with several existing algorithms on how well each predicts the annotated phrase boundaries in a large corpus of folk music. Four algorithms perform acceptably. One of these is IDyOM, which performs much better than naive statistical models and approaches the performance of the best rule-based models. Combining the output of the four algorithms in a hybrid model yields a further slight improvement, but the performance of even this model is moderate at best, leaving a great deal of room for improvement on this task.
