Automatic speech segmentation to improve speech synthesis performance

The rapid growth of information and communication technologies has shifted research attention toward speech technologies. Pre-processing of the speech signal serves various purposes in any speech processing application. It includes noise removal, endpoint detection, pre-emphasis, framing, windowing, and echo cancellation, among others. Of these, automatic word and sentence boundary detection is a fundamental step for applications such as speech recognition and speech synthesis. This paper addresses the problem of automatic word and sentence boundary detection in both silent and noisy conditions. It proposes an algorithm for the automatic segmentation of voiced speech in Indian languages. A modified, data-based scheme for computing the entropy of the speech data is used to improve performance. The entropy-based method performs better than energy-based methods. An adaptive threshold is used to determine the candidate speech segments, which correspond to words and sentences. Simulation results show that this algorithm outperforms energy-based algorithms.
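The abstract does not give the exact entropy definition or the threshold rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the entropy of a frame is the Shannon entropy of its amplitude histogram, and that the adaptive threshold sits between the lowest and highest frame entropy observed in the utterance. The names `frame_entropy` and `segment_by_entropy`, and the parameters `frame_len`, `num_bins`, and `weight`, are hypothetical.

```python
import math
import random


def frame_entropy(frame, num_bins, lo, hi):
    """Shannon entropy (in bits) of the amplitude histogram of one frame.

    Assumption: amplitudes are binned over the global range [lo, hi], so a
    quiet frame fills few bins (low entropy) and speech fills many.
    """
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for x in frame:
        idx = min(int((x - lo) / width), num_bins - 1)  # clamp x == hi
        counts[idx] += 1
    n = len(frame)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def segment_by_entropy(signal, frame_len=200, num_bins=32, weight=0.5):
    """Return (start, end) sample ranges whose frame entropy exceeds an
    adaptive threshold. The end index is exclusive."""
    lo, hi = min(signal), max(signal)
    n_frames = len(signal) // frame_len
    if hi == lo or n_frames == 0:
        return []  # constant or too-short signal: nothing to segment

    entropies = [
        frame_entropy(signal[i * frame_len:(i + 1) * frame_len], num_bins, lo, hi)
        for i in range(n_frames)
    ]

    # Assumed adaptive rule: the threshold tracks this utterance's own
    # entropy range instead of being a fixed constant.
    e_min, e_max = min(entropies), max(entropies)
    threshold = e_min + weight * (e_max - e_min)

    segments, start = [], None
    for i, e in enumerate(entropies):
        if e > threshold and start is None:
            start = i  # segment begins
        elif e <= threshold and start is not None:
            segments.append((start * frame_len, i * frame_len))
            start = None  # segment ends
    if start is not None:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```

Calling `segment_by_entropy(samples)` on a list of float samples returns candidate segments as sample-index pairs. Because the threshold is derived from each utterance's own entropy range, it shifts with the background noise level, which is the property the abstract attributes to the adaptive threshold. A practical system would likely add overlapping frames and a minimum segment duration.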
