Toward an Intelligent Editor of Digital Audio: Signal Processing Methods

In this article, we will describe signal processing methods that have been developed for use in an automatic music analysis system. A companion article by Chafe, Mont-Reynaud, and Rush, also in this issue of the Journal, deals with higher-level issues, namely the recognition of musical constructs. Unless one is willing to settle for a direct interface between musician and computer, such as the hardwired keyboard in the Xerox PARC system (Ornstein and Maxwell 1981), techniques must be developed to extract musical features from the sound itself. The approach of combining signal processing with knowledge engineering seems quite promising for music analysis.

In contrast with many of the signals to which signal processing methods are applied, musical signals usually contain a great deal of order, chiefly in the form of quasi-periodicity (pitch and rhythm), and are not usually severely corrupted by random noise. By taking advantage of these features, one can construct mechanisms that provide musically significant descriptions of real data, such as tempo tracking ("foot-tapping"), meter analysis, attack characterization, pitch characterization (including vibrato), and timbre analysis. In this article and its companion, sample results of some promising strategies for accomplishing these goals are presented.

In particular, we will concentrate on the problem of primary segmentation, that is, the first few passes through the data using little or no a priori knowledge. If we can mark the begin time of each new event in the music, the task of classifying and parameterizing each event becomes easier. We have tried three approaches to this segmentation problem: (1) an amplitude thresholding method, (2) a linear predictive coding (LPC) method, and (3) a pitch detection method. While we will also discuss more advanced strategies, these are generally awaiting implementation and thus are not included in the examples.
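To make the first of these approaches concrete, the following is a minimal sketch of amplitude-threshold segmentation: begin times are hypothesized wherever a short-time energy envelope rises through a fixed silence threshold. This is an illustration of the general idea only, not the system described in the text; the function name and all parameter values (frame length, hop size, threshold) are assumptions chosen for the example.

```python
import numpy as np

def amplitude_segment(signal, sr, frame_len=512, hop=256, threshold_db=-35.0):
    """Return candidate event begin times (in seconds) where the
    short-time RMS envelope crosses a silence threshold from below.

    All parameter values here are illustrative, not those of the
    system described in the article.
    """
    # Short-time RMS envelope in dB, one value per analysis frame.
    n_frames = 1 + (len(signal) - frame_len) // hop
    rms_db = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        rms_db[i] = 20.0 * np.log10(rms + 1e-12)  # guard against log(0)

    # A begin time is hypothesized at each rising edge: a frame above
    # threshold immediately preceded by a frame below it.
    above = rms_db > threshold_db
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    return onsets * hop / sr
```

In practice a single fixed threshold is fragile: tremolo or reverberant decay can recross it and produce spurious events, which is one reason the more refined LPC and pitch-based methods discussed later are of interest.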