A case for formant analysis in forensic speaker identification

Views differ on the relative importance of different aspects of the speech signal for forensic speaker identification. It is argued here that formants, whose frequencies and dynamics are the product of the interaction of an individual vocal tract with the idiosyncratic articulatory gestures needed to achieve linguistically agreed targets, are so central to speaker identity that they must play a pivotal role in speaker identification. As a practical demonstration, a case is described in which F1 and F2 analysis of a vowel and F2 analysis of three diphthongs show a consistent separation between two recordings, thus effectively eliminating the suspect as the maker of the obscene telephone calls. Subsequent analysis, based on the statistical distribution of formant frequency estimates throughout the samples, confirms that the suspect's voice is distinct from that of the obscene caller. The theoretical foundation for several kinds of formant-based analysis is then discussed.