Unit selection for Malay text-to-speech system using segmental context and simulated annealing

Unit selection has become the main approach in speech synthesis. Larger recorded speech corpora have improved the quality of synthesized speech, but they have also made the search more computationally expensive. This paper therefore proposes combining a segmental context matching procedure with Simulated Annealing (SA) in unit selection to improve the quality of synthetic speech and reduce computation time. Unit selection is based on minimizing two costs: the target cost and the join cost. Segmental context matching (the target cost) is the first stage of the unit selection procedure and narrows down the search space; it is followed by an optimization stage in which SA searches for the unit sequence with the minimum join cost. Results show that the synthesized words produced by the proposed system are 15.48% better than those of the previous version of the corpus-based Malay text-to-speech system. Future work may focus on combining SA with other heuristic methods to further enhance the performance of unit selection.

Key words: Speech concatenation, unit selection, corpus-based, heuristic method, simulated annealing.
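The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the unit representation (dictionaries with hypothetical `left`, `right`, `start`, and `end` fields), the join cost (absolute difference of a single scalar boundary feature), and the geometric cooling schedule with its parameter values are all assumptions made for illustration.

```python
import math
import random


def filter_by_context(candidates, left, right):
    """Stage 1 (target cost, assumed form): keep candidates whose recorded
    left/right neighbours match the target segmental context."""
    matched = [c for c in candidates if c["left"] == left and c["right"] == right]
    # Fall back to the full candidate list if no unit matches the context.
    return matched or candidates


def total_join_cost(sequence):
    """Sum of join costs between adjacent units. The join cost here is an
    assumed stand-in: the absolute difference of one boundary feature."""
    return sum(abs(a["end"] - b["start"]) for a, b in zip(sequence, sequence[1:]))


def anneal(candidate_lists, t_start=1.0, t_end=1e-3, alpha=0.95,
           steps_per_temp=50, seed=0):
    """Stage 2: simulated annealing over one candidate choice per position."""
    rng = random.Random(seed)
    current = [rng.choice(c) for c in candidate_lists]
    current_cost = total_join_cost(current)
    best, best_cost = list(current), current_cost
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            # Propose swapping the unit at one random position.
            i = rng.randrange(len(candidate_lists))
            proposal = list(current)
            proposal[i] = rng.choice(candidate_lists[i])
            cost = total_join_cost(proposal)
            delta = cost - current_cost
            # Metropolis rule: always accept improvements, sometimes accept
            # worse sequences, less often as the temperature drops.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = proposal, cost
                if cost < best_cost:
                    best, best_cost = list(proposal), cost
        t *= alpha  # geometric cooling (an assumed schedule)
    return best, best_cost
```

In use, `filter_by_context` would be applied to each target position to build `candidate_lists`, and `anneal` would then return the lowest-cost sequence it visited together with its total join cost. SA is a heuristic, so the returned sequence is not guaranteed to be the global minimum.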
