Normalization of Non Standard Words for Kannada Speech Synthesis

 Abstract: The purpose of an article summary is to facilitate quick and accurate identification of the topic of a published document. The objective is to save a prospective reader's time and effort in finding useful information in a given article. This paper considers the task of text normalization in concatenative Text-To-Speech (TTS) synthesis for the Kannada language. The main focus is to have a single-document summarization tool based on a statistical approach. The paper describes how non-standard Kannada words (acronyms, abbreviations, proper names derived from other languages or clutters, phone numbers, decimal numbers, fractions, ordinary numbers, sequences of numbers, money, dates, measures, titles, times, and symbols) are preprocessed before being passed to the TTS system as input. It also discusses the methodology used to normalize non-Kannada text in the input into equivalent Kannada output. The method uses JFlex, a fast lexical analyzer generator, to scan the input document for non-standard words.
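
The detection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's JFlex specification: it swaps in Python's `re` module as a stand-in scanner, and the category names, the patterns, and the `find_nsw` function are all assumptions made for illustration. It only labels non-standard tokens by category and does not attempt the expansion into Kannada words.

```python
import re

# Hypothetical category patterns (assumptions, not the paper's JFlex rules).
# Order matters: Python tries alternatives left to right, so the more
# specific patterns (DATE, TIME, MONEY) come before the generic NUMBER.
TOKEN_SPEC = [
    ("DATE",     r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),
    ("TIME",     r"\b\d{1,2}:\d{2}\b"),
    ("MONEY",    r"\bRs\.?\s?\d+(?:\.\d+)?\b"),
    ("FRACTION", r"\b\d+/\d+\b"),
    ("DECIMAL",  r"\b\d+\.\d+\b"),
    ("PHONE",    r"\b\d{10}\b"),
    ("NUMBER",   r"\b\d+\b"),
    ("ACRONYM",  r"\b[A-Z]{2,}\b"),
]

# One combined pattern with a named group per category.
SCANNER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def find_nsw(text):
    """Return (category, token) pairs for each non-standard word found."""
    return [(m.lastgroup, m.group()) for m in SCANNER.finditer(text)]


if __name__ == "__main__":
    sample = "Call 9876543210 on 12/05/2010 at 10:30, pay Rs. 250.50 for 3/4 kg of TTS"
    for category, token in find_nsw(sample):
        print(f"{category:<9}{token}")
```

In a full normalizer, each category would be routed to its own expansion rule (for example, reading a phone number digit by digit but a sum of money as a quantity), which is why the scanner reports the category alongside the matched text.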