On the use of data-driven clustering technique for identification of poly- and mono-phonemes for four European languages

The research reported in this paper presents a method to identify poly- and mono-phonemes for four European languages. The functionality of the poly-phonemes is tested in two experiments, and a limited set of mono-phonemes is identified for a language-identification experiment. Ten acoustically similar speech sounds were identified across the four languages: British English, Danish, German, and Italian. These sounds, which constitute a substantial proportion of the phonemes of each language, are designated as (language-independent) poly-phonemes, and may serve as a multilingual training base for labelling and recognition systems. The remaining sounds of each language, which do not fulfil the similarity conditions, are dubbed mono-phonemes. Two application experiments were conducted. In the first, the poly-phonemes are applied in a label alignment task. In the second, a small selected set of mono-phonemes for each of the four languages is used in a preliminary test of the ability of these sets to serve as language discriminators.
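The abstract does not spell out the clustering procedure. The following is a minimal sketch of how a poly/mono split could work under stated assumptions, not the authors' method. It assumes that each phoneme of each language is summarised by one mean acoustic feature vector. It also assumes that a poly-phoneme is a cluster holding exactly one phoneme from every language. With those assumptions, complete-linkage agglomerative clustering with a distance threshold separates the two groups. The function names, the threshold, the two-dimensional toy feature values, and the one-phoneme-per-language constraint are all hypothetical and chosen for illustration only.

```python
import math
from itertools import combinations

# Hypothetical toy data: (language, phoneme label) -> mean acoustic feature vector.
# Real systems would use e.g. cepstral means; these 2-D values are illustrative only.
LANGS = ["en", "da", "de", "it"]
TOY_FEATS = {
    ("en", "a"): (1.0, 1.0), ("da", "a"): (1.1, 0.9),
    ("de", "a"): (0.9, 1.0), ("it", "a"): (1.0, 1.1),
    ("en", "s"): (5.0, 5.0), ("da", "s"): (5.1, 5.0),
    ("de", "s"): (5.0, 4.9), ("it", "s"): (4.9, 5.1),
    ("en", "th"): (9.0, 0.0),  # hypothetical language-specific sound
    ("da", "y"): (0.0, 9.0),   # hypothetical language-specific sound
}


def complete_link(a, b, feats):
    """Complete-linkage distance: largest pairwise distance between two clusters."""
    return max(math.dist(feats[p], feats[q]) for p in a for q in b)


def cluster_phonemes(feats, threshold):
    """Greedily merge the closest pair of clusters while their complete-linkage
    distance is within `threshold`; never put two phonemes of one language
    into the same cluster (an assumed constraint)."""
    clusters = [[p] for p in feats]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            langs_i = {lang for lang, _ in clusters[i]}
            langs_j = {lang for lang, _ in clusters[j]}
            if langs_i & langs_j:
                continue  # would place two phonemes of one language together
            d = complete_link(clusters[i], clusters[j], feats)
            if d <= threshold and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


def split_poly_mono(clusters, languages):
    """Clusters covering every language become poly-phonemes; all other
    phonemes are treated as mono-phonemes."""
    poly, mono = [], []
    for c in clusters:
        if {lang for lang, _ in c} == set(languages):
            poly.append(c)
        else:
            mono.extend(c)
    return poly, mono


if __name__ == "__main__":
    clusters = cluster_phonemes(TOY_FEATS, threshold=0.5)
    poly, mono = split_poly_mono(clusters, LANGS)
    print("poly-phonemes:", poly)
    print("mono-phonemes:", mono)
```

On this toy input the four /a/ entries and the four /s/ entries each form a cross-language cluster, so they count as poly-phonemes. The two language-specific entries stay singletons and count as mono-phonemes. The complete-linkage criterion is one plausible reading of "similarity conditions". It keeps every member of a cluster within the threshold of every other member. A real system might instead use a statistical distance between phoneme models.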