Sound Source Separation in Monaural Music Signals

Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing common properties of real-world sound sources: their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning, and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is modeled as a weighted sum of basis functions. We review the existing algorithms that use independent component analysis, sparse coding, and non-negative matrix factorization to estimate the basis functions from an input mixture signal. Our proposed unsupervised separation algorithm based on the instantaneous model combines non-negative matrix factorization with sparseness and temporal continuity objectives. The algorithm minimizes the reconstruction error between the magnitude spectrogram of the observed signal and the model, while restricting the basis functions and their gains to non-negative values and favoring gains that are sparse and continuous in time. For the minimization, we consider iterative algorithms that are initialized with random values and updated so that the value of the total cost function decreases at each iteration. Both multiplicative update rules and a steepest descent algorithm are proposed for this task.
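As a rough illustration of the instantaneous model and of multiplicative updates, the sketch below factorizes a non-negative magnitude spectrogram X into basis functions B and gains G using the standard divergence-based multiplicative rules for plain non-negative matrix factorization. It is not the algorithm proposed in this thesis: the sparseness and temporal continuity terms are omitted, and the function names, the synthetic test data, and all parameter values are assumptions made only for this example.

```python
import numpy as np


def kl_divergence(X, M, eps=1e-12):
    """Generalized Kullback-Leibler divergence between X and the model M."""
    return float(np.sum(X * np.log((X + eps) / (M + eps)) - X + M))


def nmf_kl(X, n_components, n_iter=200, eps=1e-12, seed=0):
    """Plain NMF with multiplicative updates (no sparseness or continuity terms).

    X is a non-negative array of shape (n_freq, n_frames). Each column
    (frame) is modeled as a weighted sum of the columns of B, with the
    weights (gains) for that frame held in the matching column of G.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    # Random non-negative initialization, as described in the text.
    B = rng.random((n_freq, n_components)) + eps
    G = rng.random((n_components, n_frames)) + eps
    ones = np.ones_like(X)
    costs = []
    for _ in range(n_iter):
        # Update the gains while keeping the basis functions fixed.
        R = X / (B @ G + eps)
        G *= (B.T @ R) / (B.T @ ones + eps)
        # Update the basis functions while keeping the gains fixed.
        R = X / (B @ G + eps)
        B *= (R @ G.T) / (ones @ G.T + eps)
        costs.append(kl_divergence(X, B @ G, eps))
    return B, G, costs


# Synthetic "spectrogram" built from two hypothetical non-negative components.
demo_rng = np.random.default_rng(1)
B_true = demo_rng.random((64, 2))
G_true = demo_rng.random((2, 100))
X = B_true @ G_true

B, G, costs = nmf_kl(X, n_components=2)
```

Because each update multiplies the current non-negative value by a non-negative ratio, B and G stay non-negative without any explicit projection step. In a separation setting, each product of one basis function and its gain row would correspond to one component of the mixture spectrogram.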
To improve the convergence of the projected steepest descent algorithm, we propose an augmented divergence to measure the reconstruction error. Simulation experiments on generated mixtures of pitched instruments and drums were run to monitor the behavior of the proposed method. The proposed method achieves an average signal-to-distortion ratio (SDR) of 7.3 dB, which is higher than the SDRs obtained with the other tested methods based on the instantaneous signal model. To enable separating entities which correspond better to real-world sound objects, we propose two convolutive signal models which can be used to represent
