A novel pitch cycle detection algorithm for tele monitoring applications

Recent advances in artificial intelligence have led to the development of natural language processing systems that use speech signals recorded over communication channels. Typically, these speech recordings are segmented into frames of fixed length, usually 25 ms, which are then used to train and test mathematical models. For applications such as speaker recognition and verification, this traditional methodology includes many unvoiced and silent segments. To eliminate these redundancies, in this paper we propose a novel pitch-synchronous segmentation algorithm that is robust to noise and to variations in recording parameters such as sampling frequency. The algorithm was tested on two datasets, one recorded with a handheld device and one consisting of studio-quality recordings made in a noise-proof environment. It achieves a maximum accuracy of 96.5%, an encouraging result.
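
For illustration only, the conventional fixed-length framing that the abstract contrasts against can be sketched as follows. This is not the proposed pitch-synchronous algorithm. The function names are hypothetical, and the choice of consecutive, non-overlapping frames is an assumption, since the abstract does not state a hop size.

```python
def frame_length_samples(sample_rate_hz, frame_ms=25.0):
    """Number of samples in one fixed-length frame (25 ms by default)."""
    return int(round(sample_rate_hz * frame_ms / 1000.0))


def split_fixed_frames(samples, sample_rate_hz, frame_ms=25.0):
    """Split a sequence into consecutive fixed-length frames.

    Assumption: frames do not overlap, and trailing samples that do not
    fill a whole frame are dropped.
    """
    n = frame_length_samples(sample_rate_hz, frame_ms)
    if n <= 0:
        raise ValueError("frame length must be at least one sample")
    # Step through the signal one full frame at a time.
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]
```

Because the frame boundaries depend only on the sampling frequency and the chosen duration, every frame is kept regardless of whether it contains voiced speech, unvoiced speech, or silence, which is the redundancy the abstract describes.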
