Automatic discrimination among languages based on prosody alone

The development of methods for the automatic identification of languages is motivated both by speech-based applications intended for use in a multilingual environment and by theoretical questions of cross-linguistic variation and similarity. We evaluate the potential utility of two prosodic variables, fundamental frequency (F$_{0}$) and amplitude envelope modulation, in a pairwise language discrimination task. Discrimination is performed using a novel neural network that can successfully attend to temporal information at a range of timescales. Both variables are found to be useful in discriminating among languages, and confusion patterns generally reflect traditional intonational and rhythmic language classes. The methods employed allow empirical determination of prosodic similarity across languages.
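The abstract names amplitude envelope modulation as one of the two prosodic variables but does not say how the envelope is obtained. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. It full-wave rectifies the waveform, smooths it with a moving-average window, and measures how much envelope power falls at a single modulation frequency using one DFT bin. The function names `amplitude_envelope` and `modulation_power` and the 20 ms window and 10 ms hop are hypothetical choices made for illustration.

```python
import math
from typing import List


def amplitude_envelope(signal: List[float], sample_rate: int,
                       window_ms: float = 20.0, hop_ms: float = 10.0) -> List[float]:
    """Hypothetical envelope extractor: rectify, then moving-average smooth.

    Returns one envelope value per hop, so the envelope frame rate is
    1000 / hop_ms Hz. Window and hop sizes are illustrative assumptions.
    """
    win = max(1, int(sample_rate * window_ms / 1000))
    hop = max(1, int(sample_rate * hop_ms / 1000))
    rectified = [abs(x) for x in signal]          # full-wave rectification
    env = []
    for start in range(0, len(rectified) - win + 1, hop):
        frame = rectified[start:start + win]
        env.append(sum(frame) / win)              # moving-average low-pass
    return env


def modulation_power(env: List[float], frame_rate: float, freq_hz: float) -> float:
    """Power of the mean-removed envelope at one modulation frequency (single DFT bin)."""
    mean = sum(env) / len(env)
    re = im = 0.0
    for n, v in enumerate(env):
        angle = 2 * math.pi * freq_hz * n / frame_rate
        re += (v - mean) * math.cos(angle)
        im -= (v - mean) * math.sin(angle)
    return (re * re + im * im) / len(env)


if __name__ == "__main__":
    # Synthetic test signal (not speech data): a 200 Hz carrier whose
    # amplitude rises and falls 4 times per second, a syllable-like rate.
    sr = 8000
    sig = [math.sin(2 * math.pi * 200 * n / sr) * (1 + 0.8 * math.sin(2 * math.pi * 4 * n / sr))
           for n in range(2 * sr)]
    env = amplitude_envelope(sig, sr)             # 100 Hz envelope frame rate
    for f in (2.0, 4.0, 9.0):
        print(f"{f:4.1f} Hz modulation power: {modulation_power(env, 100.0, f):.3f}")
```

On this synthetic input the envelope power concentrates near 4 Hz. In a real system, a vector of such modulation measurements, or the envelope itself, would be one possible input to a discriminating classifier. The rectify-and-smooth approach is chosen here for simplicity. A Hilbert-transform envelope or a proper low-pass filter would be common alternatives.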