Automatic Segmentation of Acoustic Musical Signals Using Hidden Markov Models

In this paper, we address an important step toward our goal of automatic musical accompaniment: the segmentation problem. Given a score to a piece of monophonic music and a sampled recording of a performance of that score, we attempt to segment the data into a sequence of contiguous regions corresponding to the notes and rests in the score. Within the framework of a hidden Markov model, we model our prior knowledge, perform unsupervised learning of the data model parameters, and compute the segmentation that globally minimizes the posterior expected number of segmentation errors. We also show how to produce "online" estimates of score position. We present examples of our experimental results, and readers are encouraged to access the sound data we have made available from these experiments.
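As a rough illustration of the decoding idea, and not the implementation used in the paper, the sketch below runs the forward-backward recursions on a toy left-to-right hidden Markov model and labels each frame with the state of highest posterior probability, which is the choice that minimizes the expected number of mislabeled frames. Everything specific here is an assumption made for the example: one state per note, a single `stay_prob` shared by all notes, a precomputed likelihood table `lik`, and the helper names `forward_backward` and `segment`. The filtered (forward-only) probabilities are returned as well, as one simple way to form an online estimate of score position.

```python
def forward_backward(lik, stay_prob=0.9):
    """Return (gamma, alpha) for a toy left-to-right HMM.

    lik[t][s] is the (positive) likelihood of frame t under state s.
    gamma[t][s] is the smoothed posterior P(state s at t | all frames).
    alpha[t][s] is the filtered posterior P(state s at t | frames 0..t).
    Assumption: each state either repeats or advances to the next one,
    and the last state is absorbing.
    """
    n_frames, n_states = len(lik), len(lik[0])

    def stay(s):
        # The final state can only repeat.
        return 1.0 if s == n_states - 1 else stay_prob

    move = 1.0 - stay_prob

    def normalise(row):
        total = sum(row)
        return [x / total for x in row]

    # Forward pass, normalised per frame to avoid numerical underflow.
    alpha = [[0.0] * n_states for _ in range(n_frames)]
    alpha[0][0] = lik[0][0]  # the performance starts in the first state
    alpha[0] = normalise(alpha[0])
    for t in range(1, n_frames):
        for s in range(n_states):
            mass = alpha[t - 1][s] * stay(s)
            if s > 0:
                mass += alpha[t - 1][s - 1] * move
            alpha[t][s] = mass * lik[t][s]
        alpha[t] = normalise(alpha[t])

    # Backward pass, also normalised per frame.
    beta = [[1.0] * n_states for _ in range(n_frames)]
    for t in range(n_frames - 2, -1, -1):
        for s in range(n_states):
            mass = stay(s) * lik[t + 1][s] * beta[t + 1][s]
            if s + 1 < n_states:
                mass += move * lik[t + 1][s + 1] * beta[t + 1][s + 1]
            beta[t][s] = mass
        beta[t] = normalise(beta[t])

    # Smoothed posteriors: product of forward and backward terms.
    gamma = [
        normalise([a * b for a, b in zip(alpha[t], beta[t])])
        for t in range(n_frames)
    ]
    return gamma, alpha


def segment(posteriors):
    """Label each frame with its most probable state."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]


# Toy data: three frames that favour note 0, then three that favour note 1.
toy_lik = [[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3
toy_gamma, toy_alpha = forward_backward(toy_lik, stay_prob=0.9)
print("offline segmentation:", segment(toy_gamma))
print("online estimate:     ", segment(toy_alpha))
```

Choosing the best state frame by frame can, in general, yield a label sequence that the transition structure would not allow as a single path; a Viterbi decoder avoids that at the cost of optimizing a different criterion (the single most probable path).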
