An Algorithm for Identifying Cognates in Bilingual Wordlists and its Applicability to Machine Translation

Abstract. This article describes and discusses the theoretical model and the practical algorithm behind the program COGNATE, which has been available since December 1991 in the pc/linguistics subdirectory of the anonymous ftp site garbo.uwasa.fi of the University of Vaasa, Finland, and at mirror sites. Given a list of word pairs in any two languages, transcribed in a phonemic or loosely phonemic alphabetic system, the program identifies probable letter correspondences and estimates how likely the two members of each pair are to be related or unrelated. The algorithm offers the beginnings of a solution to two problems that are equivalent under the model used: