Usable speech processing: a filterless approach in the presence of interference

Speech signals are commonly corrupted by non-stationary interference in many environments. The interference is distributed nonuniformly over time, so some segments are more speech-dominant than others and hence “usable” for speech processing applications such as speaker identification and speech recognition. The main part of this paper examines the detection of usable segments in a co-channel environment with two interfering speakers. It discusses a method that determines the Target-to-Interferer Ratio (TIR) in order to detect the usable portions, in which one speaker strongly dominates the other. A short discussion of additive noise distortion follows, and experimental results are presented.
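To make the notion of a usable segment concrete, the following is a minimal sketch of frame-wise Target-to-Interferer Ratio (TIR) labeling. It is not the paper's filterless detector: it assumes the two speakers' signals are available separately (an oracle setting useful only for producing reference labels), and the function names, the frame length, the hop size, and the 20 dB threshold are illustrative assumptions rather than values taken from the paper.

```python
import numpy as np

def frame_tir_db(target, interferer, frame_len=320, hop=160, eps=1e-12):
    """Frame-wise TIR in dB from separately available signals (oracle assumption).

    frame_len and hop are illustrative (20 ms / 10 ms at an assumed 16 kHz rate).
    """
    target = np.asarray(target, dtype=float)
    interferer = np.asarray(interferer, dtype=float)
    n = min(len(target), len(interferer))
    tir = []
    for start in range(0, n - frame_len + 1, hop):
        stop = start + frame_len
        e_target = np.sum(target[start:stop] ** 2)       # target energy in frame
        e_interf = np.sum(interferer[start:stop] ** 2)   # interferer energy in frame
        # eps guards against division by zero and log of zero in silent frames
        tir.append(10.0 * np.log10((e_target + eps) / (e_interf + eps)))
    return np.array(tir)

def usable_mask(tir_db, threshold_db=20.0):
    """Mark frames where either speaker dominates by at least threshold_db.

    The threshold is an assumed value for illustration, not one from the paper.
    """
    return np.abs(np.asarray(tir_db, dtype=float)) >= threshold_db
```

Taking the absolute value of the TIR treats a frame as usable when either speaker dominates, which matches the co-channel setting described above; a practical detector would have to estimate this quantity from the mixed signal alone.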
