Toward Automatic Music Audio Summary Generation from Signal Analysis

This paper deals with the automatic generation of music audio summaries from signal analysis, without the use of any other information. The strategy employed here is to consider the audio signal as a succession of “states” (at various scales) corresponding to the structure (at various scales) of a piece of music. This is, of course, only applicable to musical genres based on some form of repetition. From the audio signal, we first derive dynamic features representing the time evolution of the energy content in various frequency bands. These features constitute our observations, from which we derive a representation of the music in terms of “states”. Since human segmentation and grouping perform better upon subsequent hearings, this “natural” approach is followed here. The first pass of the proposed algorithm uses segmentation to create “templates”. The second pass uses these templates to propose a structure of the music using unsupervised learning methods (k-means and hidden Markov models). The audio summary is finally constructed by choosing a representative example of each state. Further refinement of the summary audio signal construction uses overlap-add and tempo detection/beat alignment to improve the audio quality of the created summary.
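
As an illustration of the approach outlined above, the following Python sketch strings together the main stages: short-term band-energy features as observations, k-means clustering of frames into “states”, selection of a representative excerpt per state, and overlap-add concatenation of the excerpts. It is a minimal sketch only: the HMM pass and the tempo detection/beat alignment described in the abstract are omitted, and every function name, window size, band count, excerpt length, and the number of states are illustrative assumptions rather than the authors' settings.

```python
# Minimal, illustrative sketch (not the authors' implementation) of the
# summary pipeline: band-energy features, k-means "states", one
# representative excerpt per state, and overlap-add assembly.
# The HMM pass and tempo/beat alignment from the paper are omitted;
# all names, window sizes and the number of states are assumptions.
import numpy as np
from sklearn.cluster import KMeans


def band_energy_features(signal, sr, n_bands=8, frame=2048, hop=1024):
    """Per-frame log energy in linearly spaced frequency bands (the 'dynamic features')."""
    window = np.hanning(frame)
    feats = []
    for start in range(0, len(signal) - frame, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame] * window)) ** 2
        bands = np.array_split(spectrum, n_bands)
        feats.append([np.log(b.sum() + 1e-12) for b in bands])
    return np.array(feats)


def state_labels(features, n_states=4):
    """Cluster frames into 'states' with k-means (HMM decoding/smoothing omitted)."""
    return KMeans(n_clusters=n_states, n_init=10, random_state=0).fit_predict(features)


def longest_run_center(labels, state):
    """Frame index at the middle of the longest contiguous run of `state`."""
    best_len, best_start, run_start = 0, 0, None
    for i, lab in enumerate(np.append(labels, -1)):   # -1 sentinel closes the last run
        if lab == state and run_start is None:
            run_start = i
        elif lab != state and run_start is not None:
            if i - run_start > best_len:
                best_len, best_start = i - run_start, run_start
            run_start = None
    return best_start + best_len // 2


def overlap_add(excerpts, sr, fade_s=0.5):
    """Cross-fade consecutive excerpts so the summary has no audible seams."""
    fade = int(fade_s * sr)
    ramp = np.linspace(0.0, 1.0, fade)
    out = excerpts[0].copy()
    for nxt in excerpts[1:]:
        out[-fade:] = out[-fade:] * ramp[::-1] + nxt[:fade] * ramp
        out = np.concatenate([out, nxt[fade:]])
    return out


if __name__ == "__main__":
    sr, hop = 22050, 1024
    # Toy 10 s signal made of three "sections" with distinct spectral content.
    third = sr * 10 // 3
    t = np.arange(third) / sr
    signal = np.concatenate([np.sin(2 * np.pi * f * t) for f in (500.0, 2000.0, 5000.0)])

    feats = band_energy_features(signal, sr, hop=hop)
    labels = state_labels(feats, n_states=3)

    excerpts = []
    for state in np.unique(labels):
        centre = longest_run_center(labels, state) * hop   # frame index -> sample index
        half = sr                                          # roughly 2 s excerpt per state
        excerpts.append(signal[max(0, centre - half): centre + half])

    summary = overlap_add(excerpts, sr)
    print(f"{len(np.unique(labels))} states, summary length: {len(summary) / sr:.1f} s")
```

On the toy three-section signal the script prints the number of discovered states and the summary duration; on real audio, the frame labels would be smoothed with the HMM before excerpt selection, and the joins would be beat-aligned, as described in the abstract.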
