Identification of regional variants of High German from digit sequences in German telephone speech

From the German SpeechDat(M) database of telephone speech, the digit-sequence items that were spoken as chains of individual digits were extracted. From these digit strings, a subset of 39 strings was selected by dialect experts, taking into account the region information provided by each speaker. The German federal states were used as region classes because speakers can easily provide this information. Seven test subjects were asked to listen to the subset of digit strings and to classify each string by region. The overall success rate for the classification was 40%; if the regions neighboring the correct region are also counted as correct, the success rate rises to 68%.
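The two success rates differ only in what counts as a hit: an exact match of the federal state, or a match with either the correct state or one of its neighbors. The following is a minimal sketch of that scoring, not the authors' evaluation procedure. The function name `success_rates`, the partial adjacency map `NEIGHBORS`, and the four toy judgements are illustrative assumptions and are not data from the study.

```python
# Partial adjacency map of German federal states (illustrative subset only;
# a full evaluation would need all 16 states).
NEIGHBORS = {
    "Bayern": {"Baden-Württemberg", "Hessen", "Thüringen", "Sachsen"},
    "Hessen": {"Bayern", "Baden-Württemberg", "Rheinland-Pfalz",
               "Nordrhein-Westfalen", "Niedersachsen", "Thüringen"},
    "Sachsen": {"Bayern", "Thüringen", "Sachsen-Anhalt", "Brandenburg"},
    "Hamburg": {"Schleswig-Holstein", "Niedersachsen"},
}


def success_rates(judgements, neighbors):
    """Return (strict, tolerant) success rates.

    judgements: list of (true_region, guessed_region) pairs.
    strict:   fraction of guesses equal to the true region.
    tolerant: fraction of guesses equal to the true region or to a
              region that borders it.
    """
    if not judgements:
        raise ValueError("no judgements to score")
    exact = sum(true == guess for true, guess in judgements)
    near = sum(
        true == guess or guess in neighbors.get(true, set())
        for true, guess in judgements
    )
    n = len(judgements)
    return exact / n, near / n


# Hypothetical toy judgements, NOT taken from the listening test.
toy = [
    ("Bayern", "Bayern"),    # exact hit
    ("Bayern", "Hessen"),    # neighboring region
    ("Sachsen", "Hamburg"),  # miss
    ("Hessen", "Hessen"),    # exact hit
]

strict, tolerant = success_rates(toy, NEIGHBORS)
print(f"strict: {strict:.0%}, neighbor-tolerant: {tolerant:.0%}")
```

Because every exact hit also counts under the tolerant criterion, the tolerant rate can never be lower than the strict rate.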