A sound source identification system for ensemble music based on template adaptation and music stream extraction

Sound source identification is an important problem in auditory scene analysis when multiple sound objects are simultaneously present in a scene. This paper proposes an adaptive method for sound source identification that is applicable to real performances of ensemble music. For musical sound source identification, feature-based and template-matching-based methods have already been proposed. However, it is difficult to extract the features of a single note from a sound mixture, and sound variability has been a problem when dealing with real music performances. This paper therefore proposes an adaptive template-matching method that can cope with variability in musical sounds. The method is based on matched filtering and does not require a feature extraction process. Moreover, the paper discusses musical context integration based on Bayesian probabilistic networks. Evaluations using recordings of real ensemble performances show that the proposed method improves source identification accuracy from 60.8% to 88.5% on average.
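To make the core idea concrete, the following is a minimal sketch of template matching by matched filtering with a simple adaptation step. It is illustrative only: the function names, the normalized cross-correlation score, and the exponential-moving-average adaptation rule are assumptions for this sketch, not the paper's actual formulation (the paper's adaptation and Bayesian context integration are more elaborate).

```python
import numpy as np

def matched_filter_score(spectrum, template):
    """Normalized cross-correlation between an observed magnitude
    spectrum and an instrument template (both 1-D vectors)."""
    s = spectrum / (np.linalg.norm(spectrum) + 1e-12)
    t = template / (np.linalg.norm(template) + 1e-12)
    return float(np.dot(s, t))

def identify_source(spectrum, templates):
    """Return the instrument label whose template best matches
    the observed spectrum."""
    return max(templates,
               key=lambda name: matched_filter_score(spectrum, templates[name]))

def adapt_template(template, spectrum, rate=0.1):
    """Drift a template toward an observed spectrum (a stand-in for
    the paper's template adaptation scheme)."""
    return (1.0 - rate) * template + rate * spectrum

# Toy example: two hypothetical instrument templates and one observation.
templates = {
    "flute":    np.array([1.0, 0.10, 0.05, 0.0]),
    "clarinet": np.array([1.0, 0.00, 0.80, 0.0]),
}
observed = np.array([0.9, 0.12, 0.10, 0.0])
label = identify_source(observed, templates)          # -> "flute"
templates[label] = adapt_template(templates[label], observed)
```

In this sketch, identification and adaptation alternate: each identified note nudges the winning template toward the observed spectrum, which is one simple way a template-matching system can track sound variability in real performances.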
