A Psychoacoustic Engineering Approach to Machine Sound Source Separation in Reverberant Environments
[1] Laurent Couvreur,et al. Blind Model Selection for Automatic Speech Recognition in Reverberant Environments , 2004, J. VLSI Signal Process..
[2] N. Suga,et al. Neural basis of amplitude-spectrum representation in auditory cortex of the mustached bat. , 1982, Journal of neurophysiology.
[3] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..
[4] Chen Yang,et al. Static and Dynamic Spectral Features: Their Noise Robustness and Optimal Weights for ASR , 2005, IEEE Transactions on Audio, Speech, and Language Processing.
[5] E. B. Newman,et al. The precedence effect in sound localization. , 1949, The American journal of psychology.
[6] Hynek Hermansky,et al. Study on the dereverberation of speech based on temporal envelope filtering , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[7] William A. Yost,et al. Spatial hearing: The psychophysics of human sound localization, revised edition , 1998 .
[8] David Marr,et al. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information , 2009 .
[9] S. A. Shamma,et al. Spectral Gradient Columns in Primary Auditory Cortex: Physiological and Psychoacoustical Correlates , 1991 .
[10] J. Moncur,et al. Binaural and monaural speech intelligibility in reverberation. , 1967, Journal of speech and hearing research.
[11] Douglas L. Jones,et al. Performance of time- and frequency-domain binaural beamformers based on recorded signals from real rooms. , 2004, The Journal of the Acoustical Society of America.
[12] Ning Ma,et al. A speech fragment approach to localising multiple speakers in reverberant environments , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[13] Guy J. Brown,et al. Separation of speech from interfering sounds based on oscillatory correlation , 1999, IEEE Trans. Neural Networks.
[14] A. Oppenheim,et al. Nonlinear filtering of multiplied and convolved signals , 1968 .
[15] B C Wheeler,et al. Localization of multiple sound sources with two microphones. , 2000, The Journal of the Acoustical Society of America.
[16] J. Makhoul,et al. Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.
[17] S. Gelfand,et al. Effects of small room reverberation upon the recognition of some consonant features , 1979 .
[18] Ruth Y Litovsky,et al. Localization dominance in the median-sagittal plane: effect of stimulus duration. , 2004, The Journal of the Acoustical Society of America.
[19] E. de Boer,et al. On ringing limits of the auditory periphery , 2004, Biological Cybernetics.
[20] Les E. Atlas,et al. Acoustic diversity for improved speech recognition in reverberant environments , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[21] B. Grothe,et al. Precise inhibition is essential for microsecond interaural time difference coding , 2002, Nature.
[22] L A Jeffress,et al. A place theory of sound localization. , 1948, Journal of comparative and physiological psychology.
[23] M Haggard,et al. Selectivity for distortions and words in speech perception. , 1974, British journal of psychology.
[24] T Sone,et al. On the perception of direction of echo. , 1968, The Journal of the Acoustical Society of America.
[25] Douglas L. Jones,et al. Localization-based grouping , 2006 .
[26] D. Deutsch. Two-channel listening to musical scales. , 1975, The Journal of the Acoustical Society of America.
[27] Terrence J. Sejnowski,et al. An Information-Maximization Approach to Blind Separation and Blind Deconvolution , 1995, Neural Computation.
[28] Guy J. Brown,et al. A blackboard architecture for computational auditory scene analysis , 1999, Speech Commun..
[29] B. P. Bogert,et al. The quefrency analysis of time series for echoes : cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking , 1963 .
[30] C. Schreiner,et al. Periodicity coding in the inferior colliculus of the cat. II. Topographical organization. , 1988, Journal of neurophysiology.
[31] H. Gaskell. The precedence effect , 1983, Hearing Research.
[32] R K Clifton. Breakdown of echo suppression in the precedence effect. , 1987, The Journal of the Acoustical Society of America.
[33] Willard R. Thurlow,et al. Precedence-Suppression Effects for Two Click Sources , 1961 .
[34] D.P. Skinner,et al. The cepstrum: A guide to processing , 1977, Proceedings of the IEEE.
[35] DeLiang Wang,et al. Binaural sound segregation for multisource reverberant environments , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[36] M S Brandstein. Time-delay estimation of reverberated speech exploiting harmonic structure. , 1999, The Journal of the Acoustical Society of America.
[37] Ning Ma,et al. Speech fragment decoding techniques for simultaneous speaker identification and speech recognition , 2010, Comput. Speech Lang..
[38] D. Banks. Localisation and separation of simultaneous voices with two microphones , 1993 .
[39] Bayya Yegnanarayana,et al. Enhancement of reverberant speech using LP residual signal , 2000, IEEE Trans. Speech Audio Process..
[40] C. Faller,et al. Source localization in complex listening situations: selection of binaural cues based on interaural coherence. , 2004, The Journal of the Acoustical Society of America.
[41] DeLiang Wang,et al. A one-microphone algorithm for reverberant speech enhancement , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[42] Mikio Tohyama,et al. Source waveform recovery in a reverberant space by cepstrum dereverberation , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[43] L. V. Noorden. Temporal coherence in the perception of tone sequences , 1975 .
[44] Ning Ma,et al. Exploiting correlogram structure for robust speech recognition with multiple speech sources , 2007, Speech Commun..
[45] H Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.
[46] Tim Brookes,et al. Dynamic Precedence Effect Modeling for Source Separation in Reverberant Environments , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[47] Richard F. Lyon,et al. Auditory model inversion for sound separation , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[48] Steven M. Kay,et al. Cochannel speaker separation by harmonic enhancement and suppression , 1997, IEEE Trans. Speech Audio Process..
[49] John F. Culling,et al. Effects of simulated reverberation on the use of binaural cues and fundamental-frequency differences for separating concurrent vowels , 1994, Speech Commun..
[50] Marc Moonen,et al. Assessment of dereverberation algorithms for large vocabulary speech recognition systems , 2003, INTERSPEECH.
[51] Maurizio Omologo,et al. Experiments of speech recognition in a noisy and reverberant environment using a microphone array and HMM adaptation , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[52] R. K. Clifton,et al. Dynamic processes in the precedence effect. , 1991, The Journal of the Acoustical Society of America.
[53] Jonas Braasch,et al. Modelling of Binaural Hearing , 2005 .
[54] Alain de Cheveigné,et al. Speech f0 extraction based on Licklider's pitch perception model , 1991 .
[55] B H Repp,et al. On the possible role of auditory short-term adaptation in perception of the prevocalic [m]-[n] contrast. , 1987, The Journal of the Acoustical Society of America.
[56] Nelson Morgan,et al. Perceptually inspired signal processing strategies for robust speech recognition in reverberant environments , 1998 .
[57] Richard M. Stern,et al. Missing Feature Speech Recognition using Dereverberation and Echo Suppression in Reverberant Environments , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[58] Richard F. Lyon,et al. A perceptual pitch detector , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[59] B. Juang,et al. Harmonicity based dereverberation with maximum a posteriori estimation , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..
[60] Steven van de Par,et al. The normalized correlation: Accounting for NoSπ thresholds with Gaussian and "low-noise" masking noise , 1999 .
[61] R. H. Bolt,et al. Theory of Speech masking by reverberation , 1949 .
[62] B C Wheeler,et al. A two-microphone dual delay-line approach for extraction of a speech sound in the presence of multiple interferers. , 2001, The Journal of the Acoustical Society of America.
[63] R. Meddis. Simulation of mechanical to neural transduction in the auditory receptor. , 1986, The Journal of the Acoustical Society of America.
[64] Ewan A. Macpherson,et al. A Computer Model of Binaural Localization for Stereo Imaging Measurement , 1989 .
[65] Tomohiro Nakatani,et al. Harmonic sound stream segregation using localization and its application to speech stream segregation , 1999, Speech Commun..
[66] Richard M. Stern,et al. Efficient Cepstral Normalization for Robust Speech Recognition , 1993, HLT.
[67] B. Moore. An Introduction to the Psychology of Hearing: Sixth Edition , 2012 .
[68] R L Freyman,et al. Effect of click rate and delay on breakdown of the precedence effect. , 1987, Perception & psychophysics.
[69] Jean Rouat,et al. A pitch determination and voiced/unvoiced decision algorithm for noisy speech , 1995, Speech Commun..
[70] P M Zurek,et al. The precedence effect and its possible role in the avoidance of interaural ambiguities. , 1980, The Journal of the Acoustical Society of America.
[71] Martin Cooke,et al. Modelling auditory processing and organisation , 1993, Distinguished dissertations in computer science.
[72] Steven Greenberg,et al. Robust speech recognition using the modulation spectrogram , 1998, Speech Commun..
[73] S M Abel,et al. Sound localization: effects of reverberation time, speaker array, stimulus frequency, and stimulus rise/decay. , 1993, The Journal of the Acoustical Society of America.
[74] John Mourjopoulos,et al. Real-Time Room Equalization Based on Complex Smoothing: Robustness Results , 2004 .
[75] Jon Barker,et al. An automatic speech recognition system based on the scene analysis account of auditory perception , 2007, Speech Commun..
[76] DeLiang Wang,et al. A Supervised Learning Approach to Monaural Segregation of Reverberant Speech , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[77] T. Houtgast,et al. A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria , 1985 .
[78] DeLiang Wang,et al. Sequential Organization of Speech in Reverberant Environments by Integrating Monaural Grouping and Binaural Localization , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[79] M. Schroeder. Period histogram and product spectrum: new methods for fundamental-frequency measurement. , 1968, The Journal of the Acoustical Society of America.
[80] T. W. Parsons. Separation of speech from interfering speech by means of harmonic selection , 1976 .
[81] B. Atal. Automatic Speaker Recognition Based on Pitch Contours , 1969 .
[82] Hiroshi Sawada,et al. Overcomplete BSS for Convolutive Mixtures Based on Hierarchical Clustering , 2004, ICA.
[83] Richard F. Lyon. A computational model of binaural localization and separation , 1983, ICASSP.
[84] Terrence J. Sejnowski,et al. Blind source separation of more sources than mixtures using overcomplete representations , 1999, IEEE Signal Processing Letters.
[85] Brian R Glasberg,et al. Derivation of auditory filter shapes from notched-noise data , 1990, Hearing Research.
[86] DeLiang Wang,et al. Speech segregation based on sound localization , 2001, IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222).
[87] Ken'ichi Furuya,et al. Real-time source separation based on sound localization in a reverberant environment , 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.
[88] S. van de Par,et al. The normalized interaural correlation: accounting for NoS pi thresholds obtained with Gaussian and "low-noise" masking noise. , 1999, The Journal of the Acoustical Society of America.
[89] Guy J. Brown,et al. Techniques for handling convolutional distortion with 'missing data' automatic speech recognition , 2004, Speech Commun..
[90] Ruth Y. Litovsky,et al. Positional dependence on localization dominance in the median‐sagittal plane , 1997 .
[91] Laurie R. Fincham. Refinements in the Impulse Testing of Loudspeakers , 1985 .
[92] Martin F. Schlang,et al. An auditory based approach for echo compensation with modulation filtering , 1989, EUROSPEECH.
[93] B. Kollmeier,et al. Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction. , 1994, The Journal of the Acoustical Society of America.
[94] L. Auger. The Journal of the Acoustical Society of America , 1949 .
[95] G. F. Kuhn. Model for the interaural time differences in the azimuthal plane , 1977 .
[96] R. Plomp,et al. Effect of reducing slow temporal modulations on speech reception. , 1994, The Journal of the Acoustical Society of America.
[97] Andrew W. Fitzgibbon,et al. An Experimental Comparison of Range Image Segmentation Algorithms , 1996, IEEE Trans. Pattern Anal. Mach. Intell..
[98] D W Grantham,et al. Left-right asymmetry in the buildup of echo suppression in normal-hearing adults. , 1996, The Journal of the Acoustical Society of America.
[99] Barbara G. Shinn-Cunningham,et al. Perceptual Consequences of Including Reverberation in Spatial Auditory Displays , 2003 .
[100] Guy J. Brown,et al. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2006 .
[101] Tim Brookes,et al. Ideal Binary Mask Ratio: A Novel Metric for Assessing Binary-Mask-Based Sound Source Separation Algorithms , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[102] W. Hartmann. Localization of sound in rooms. , 1983, The Journal of the Acoustical Society of America.
[103] N. Durlach. Equalization and Cancellation Theory of Binaural Masking‐Level Differences , 1963 .
[104] Guy J. Brown,et al. Missing data speech recognition in reverberant conditions , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[105] Daniel Patrick Whittlesey Ellis,et al. Prediction-driven computational auditory scene analysis , 1996 .
[106] Stuart Gatehouse,et al. Perceptual segregation of competing speech sounds: the role of spatial location. , 1999, The Journal of the Acoustical Society of America.
[107] E D Schubert,et al. Envelope versus microstructure in the fusion of dichotic signals. , 1969, The Journal of the Acoustical Society of America.
[108] Guy J. Brown,et al. A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation , 2004, Speech Commun..
[109] Masashi Unoki,et al. Robust and accurate F0 estimation for reverberant speech by utilizing complex cepstrum analysis , 2007 .
[110] Bill Gardner,et al. HRTF Measurements of a KEMAR Dummy-Head Microphone , 1994 .
[111] W. Koenig,et al. Subjective Effects in Binaural Hearing , 1950 .
[112] J. Culling,et al. Perceptual separation of concurrent speech sounds: absence of across-frequency grouping by common interaural delay. , 1995, The Journal of the Acoustical Society of America.
[113] Peter H. Rogers,et al. Human capabilities of dereverberation , 2000 .
[114] Keith D. Martin. Echo suppression in a computational model of the precedence effect , 1997, Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics.
[115] A J King,et al. Spatial response properties of acoustically responsive neurons in the superior colliculus of the ferret: a map of auditory space. , 1987, Journal of neurophysiology.
[116] Sandra J. Guzman,et al. Auditory Processing of Sound Sources , 1996 .
[117] A. J. Watkins. Central, auditory mechanisms of perceptual compensation for spectral-envelope distortion. , 1991, The Journal of the Acoustical Society of America.
[118] Michael Kleinschmidt. Importance of Early and Late Reflections for Automatic Speech Recognition in Reverberant Environments , 2003 .
[119] DeLiang Wang,et al. Auditory Segmentation Based on Onset and Offset Analysis , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[120] Brad Libbey,et al. The effect of overlap-masking on binaural reverberant word intelligibility. , 2004, The Journal of the Acoustical Society of America.
[121] P. N. Denbigh,et al. A sound segregation algorithm for reverberant conditions , 2001, Speech Commun..
[122] R. Meddis,et al. Implementation details of a computation model of the inner hair‐cell auditory‐nerve synapse , 1990 .
[123] DeLiang Wang,et al. Model-based sequential organization in cochannel speech , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[124] C. Cherry,et al. On human communication , 1966 .
[125] J. Pickles. An Introduction to the Physiology of Hearing , 1982 .
[126] Daniel P. W. Ellis,et al. Evaluating Source Separation Algorithms With Reverberant Speech , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[127] R. Kumaresan,et al. Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications , 1999 .
[128] DeLiang Wang,et al. On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis , 2005, Speech Separation by Humans and Machines.
[129] W. Lindemann. Extension of a binaural cross-correlation model by contralateral inhibition. I. Simulation of lateralization for stationary signals. , 1986, The Journal of the Acoustical Society of America.
[130] John F Culling,et al. Trading of intensity and interaural coherence in dichotic pitch stimuli. , 2010, The Journal of the Acoustical Society of America.
[131] Yehuda Albeck. Sound localization and binaural processing , 1998 .
[132] D. D. Greenwood. Critical Bandwidth and the Frequency Coordinates of the Basilar Membrane , 1961 .
[133] P. Loizou,et al. Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction. , 2008, The Journal of the Acoustical Society of America.
[134] K Aikawa,et al. Cepstral representation of speech motivated by time-frequency masking: an application to speech recognition. , 1996, The Journal of the Acoustical Society of America.
[135] James H. Martin,et al. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition , 2008 .
[136] Kuansan Wang,et al. Spectral shape analysis in the central auditory system , 1995, IEEE Trans. Speech Audio Process..
[137] R. Patterson,et al. B of the SVOS Final Report (Part A: The Auditory Filterbank): An Efficient Auditory Filterbank Based on the Gammatone Function , 2010 .
[138] Ning Ma,et al. Integrating pitch and localisation cues at a speech fragment level , 2007, INTERSPEECH.
[139] Guy J. Brown,et al. A multi-pitch tracking algorithm for noisy speech , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[140] Chaz Yee Toh,et al. Effects of reverberation on perceptual segregation of competing voices. , 2003, The Journal of the Acoustical Society of America.
[141] D. Grantham,et al. Cross-spectral and temporal factors in the precedence effect: discrimination suppression of the lag sound in free-field. , 1997, The Journal of the Acoustical Society of America.
[142] R W Hukin,et al. Effects of reverberation on spatial, prosodic, and vocal-tract size cues to selective attention. , 2000, The Journal of the Acoustical Society of America.
[143] J. Licklider,et al. A duplex theory of pitch perception , 1951, Experientia.
[144] Douglas L. Jones,et al. Beamforming with collocated microphone arrays , 2003 .
[145] E. B. Newman,et al. A Scale for the Measurement of the Psychological Magnitude Pitch , 1937 .
[146] T. Langhans,et al. Speech enhancement by nonlinear multiband envelope filtering , 1982, ICASSP.
[147] Stephanie Seneff. Pitch and spectral estimation of speech based on auditory synchrony model , 1984, ICASSP.
[148] Stephanie Seneff,et al. Pitch and spectral estimation of speech based on auditory synchrony model , 1983, ICASSP.
[149] M. Tohyama,et al. Blind dereverberation using short‐time cepstrum frame subtraction , 1999 .
[150] Tomohiro Nakatani,et al. One Microphone Blind Dereverberation Based on Quasi-periodicity of Speech Signals , 2003, NIPS.
[151] John F Culling,et al. The spatial unmasking of speech: evidence for within-channel processing of interaural time delay. , 2005, The Journal of the Acoustical Society of America.
[152] Patrick A. Naylor,et al. Speech Dereverberation , 2010 .
[153] T Houtgast,et al. A physical method for measuring speech-transmission quality. , 1980, The Journal of the Acoustical Society of America.
[154] N. Sutherland,et al. Grouping Frequency Components of Vowels: When is a Harmonic not a Harmonic? , 1984 .
[155] T. Yin,et al. Psychophysical and physiological evidence for a precedence effect in the median sagittal plane. , 1997, Journal of neurophysiology.
[156] M. Bodden. Modeling human sound-source localization and the cocktail-party-effect , 1993 .
[157] Mitchel Weintraub,et al. A theory and computational model of auditory monaural sound separation , 1985 .
[158] J. Pickles. An Introduction to the Physiology of Hearing, Second Edition , 1988 .
[159] P M Zurek,et al. Adjustment and discrimination measurements of the precedence effect. , 1993, The Journal of the Acoustical Society of America.
[160] Guy J. Brown,et al. Computational auditory scene analysis , 1994, Comput. Speech Lang..
[161] Hideki Kawahara,et al. YIN, a fundamental frequency estimator for speech and music. , 2002, The Journal of the Acoustical Society of America.
[162] Richard L Freyman,et al. Auditory target detection in reverberation. , 2004, The Journal of the Acoustical Society of America.
[163] Q. Summerfield,et al. Auditory enhancement of changes in spectral amplitude. , 1987, The Journal of the Acoustical Society of America.
[164] Mark A. Clements,et al. A Computationally Compact Divergence Measure for Speech Processing , 1991, IEEE Trans. Pattern Anal. Mach. Intell..
[165] DeLiang Wang,et al. On the optimality of ideal binary time-frequency masks , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.