Instantaneous frequency and detection of dynamics in speech

We show that instantaneous frequency (IF) can be used for the automatic detection of speech dynamics. We develop a very simple procedure for IF estimation that requires far fewer computational resources than standard IF calculation methods. The input file is divided into frames, and IFs are calculated at points related to the local maxima within each frame. All inferences about the presence of speech dynamics in the file are made on the basis of the distributions of IFs within the frames. The effectiveness of the technique is demonstrated on a source separation problem.
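The abstract does not say how the IF is obtained from the local maxima, so the following is only a minimal sketch of the frame-based idea under one stated assumption: the IF near a local maximum is approximated by the reciprocal of the time between consecutive maxima. The function name `frame_if_estimates` and the parameters `fs` and `frame_len` are hypothetical and are not taken from the paper.

```python
import numpy as np

def frame_if_estimates(signal, fs, frame_len):
    """Return a list with one array of rough IF estimates (Hz) per frame.

    ASSUMPTION (not from the paper): the IF is approximated as
    1 / (time between consecutive local maxima in the frame).
    """
    signal = np.asarray(signal, dtype=float)
    estimates = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        interior = frame[1:-1]
        # Local maxima: larger than the left neighbour, not smaller than the right.
        peaks = np.where((interior > frame[:-2]) & (interior >= frame[2:]))[0] + 1
        if len(peaks) < 2:
            # Too few maxima to measure a spacing in this frame.
            estimates.append(np.array([]))
            continue
        periods = np.diff(peaks) / fs      # seconds between neighbouring maxima
        estimates.append(1.0 / periods)    # one estimate per pair of maxima
    return estimates

# Usage: a stationary 200 Hz tone sampled at 8 kHz, split into 50 ms frames.
fs = 8000
t = np.arange(800) / fs
tone = np.sin(2 * np.pi * 200 * t)
per_frame = frame_if_estimates(tone, fs, frame_len=400)
spread = [float(np.std(f)) for f in per_frame]  # per-frame spread of the estimates
```

For a stationary tone the per-frame estimates cluster tightly around a single value, so a measure of their spread stays near zero. A wider distribution within a frame would be one possible indicator of dynamics, although the decision rule actually used in the paper is not given in the abstract.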
