ASCII Phonetic Symbols for the World's Languages: Worldbet

A new ASCII encoding of the International Phonetic Alphabet (IPA), together with additional symbols for speech database labeling, has been designed to cover all languages. Many earlier ASCII versions of the IPA were targeted at European languages and therefore either omitted many sounds of other languages or reused the IPA symbols for non-European sounds, such as clicks, to mark plosive bursts. When an attempt was made to label a large number of languages with phonemic and phonetic symbols, these schemes were found to be inadequate. The present scheme draws on earlier work by George Allen, Ian Maddieson, John Wells, Laver et al. and Hieronymus et al. Wherever possible, the symbols were kept similar to the base IPA symbols, so that many of them will have obvious meanings, and many are the same as in other schemes. The underlying principle is that any spectrally and temporally distinct speech sound (not including pitch) which is phonemic in some language should have a separate base symbol. In most cases the base symbol consists of a concatenation of an IPA symbol and diacritics. Thus it is easy to recognize the phonemic base symbols and to compare the same broad phonetic sound across languages. In tone languages, diacritics are applied to the vowel phoneme symbols to properly identify the phonemes. Allophonic variations due to contextual coarticulation and stress may be labelled by a diacritic attached to the base symbol. It is possible that some speech sounds which are phonemic in at least one of the world's languages are missing from the present version. It is hoped that any oversights will be corrected in subsequent versions of Worldbet, and a standard method for constructing new symbols is presented.

Introduction

Many systems have been developed for writing the sounds of the world's languages. Many of the early workers devised their own systems because there was no agreed standard, nor indeed knowledge of the complete speech sound inventory.
The International Phonetic Alphabet was developed in 1888 and revised several times into its present form. It represents 105 years of experience with putting a symbol to each sound in all of the known languages in the world. The issues of economy of representation and the distinction between allophonic variation and true base-form sounds have been worked out for many more languages since the IPA was originally formulated. Therefore it is a good place to begin for any multilanguage speech database labelling effort. There are some sounds not normally included in the IPA which have been found to be useful in labelling large speech corpora like TIMIT, SCRIBE, BDSONS, and PHONDAT. These modern attempts at a standard ASCII form of the IPA resulted in TIMITBET, MRPA, SAMPA, and SAMPA Extended, to name a few. These phonetic alphabets were restricted to English or to European languages, and were thus too narrow in scope to be used for other major language families. The issue is whether or not the ASCII representation is consistent, complete and logical for all of the IPA symbols. Worldbet is an attempt to provide a phonetic alphabet which covers all of the world's languages in a systematic fashion. It is an ASCII version of the IPA plus a number of symbols which were found useful in database labelling and which are not currently in the official IPA set. This list of extra symbols may grow with time until all of the important phenomena have a coherent symbol representation. This paper first covers the general principles of Worldbet, then discusses earlier labeling sets, gives specific symbol assignments, and discusses labeling methods. Appendix A gives an exhaustive list of the Worldbet symbols and their corresponding labels in a few other systems, namely TIMITBET, SAMPA and JBET, a phonetic alphabet used in speech synthesis. Appendix B is a table of place and manner of articulation versus Worldbet symbols.
Appendix C gives examples of Worldbet symbol inventories for several languages.