Audio Coding for Representation in MIDI via Pitch Detection Using Harmonic Dictionaries

The search for a flexible and concise alternative representation of digital musical sound motivates the use of the MIDI (Musical Instrument Digital Interface) protocol. The problem then becomes one of automating the conversion from sound to MIDI, which requires processing the musical sound and extracting the information needed to represent it as MIDI data. Our studies have led to algorithms for segmenting the sound and detecting the pitch of the individual notes. We describe a novel method for pitch detection using subset selection with dictionaries containing harmonic spectra from samples of musical sounds. Examples are given that demonstrate applicability to monophonic sounds as well as to signals with multiple sound sources, including detection of objects in a complex background scene.
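The idea of selecting a small subset of harmonic dictionary entries to explain a spectrum can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: the dictionary atoms are synthetic templates with 1/h harmonic amplitudes rather than spectra taken from recorded instrument samples, the subset is chosen by greedy matching pursuit (one of several possible subset-selection strategies), and the number of sounding notes is known in advance. The names `harmonic_atom`, `build_dictionary`, and `detect_notes` are hypothetical.

```python
import math

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def harmonic_atom(f0, n_bins=2048, bin_hz=2.0, n_harmonics=6):
    """Unit-norm magnitude-spectrum template with peaks at multiples of f0.

    Assumption: harmonic h has amplitude 1/h. A real dictionary would hold
    spectra measured from samples of musical sounds instead.
    """
    atom = [0.0] * n_bins
    for h in range(1, n_harmonics + 1):
        k = int(round(h * f0 / bin_hz))
        if k < n_bins:  # harmonics beyond the analysed band are dropped
            atom[k] += 1.0 / h
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]

def build_dictionary(notes):
    """Map each candidate MIDI note number to its harmonic template."""
    return {n: harmonic_atom(midi_to_hz(n)) for n in notes}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def detect_notes(spectrum, dictionary, n_notes):
    """Greedy matching pursuit: repeatedly pick the atom that best matches
    the residual spectrum, then subtract its contribution."""
    residual = list(spectrum)
    chosen = []
    for _ in range(n_notes):
        best = max(dictionary, key=lambda n: dot(residual, dictionary[n]))
        weight = dot(residual, dictionary[best])
        residual = [r - weight * a for r, a in zip(residual, dictionary[best])]
        chosen.append(best)
    return sorted(chosen)

# Candidate notes C3..C6, and a synthetic two-note mixture (C4 plus a quieter G4).
dictionary = build_dictionary(range(48, 85))
mix = [a + 0.8 * b for a, b in zip(dictionary[60], dictionary[67])]
print(detect_notes(mix, dictionary, 2))
```

The selected keys are MIDI note numbers, so they map directly onto note-on events once the segmentation step has supplied onset and offset times. Because notes an octave or a fifth apart share harmonics, a greedy selection like this one can be misled on real recordings, which is one reason to treat it only as an illustration.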
