The Reconstruction Engine: A Computer Implementation of the Comparative Method

We describe the implementation of a computer program, the Reconstruction Engine (RE), which models the comparative method for establishing genetic affiliation among a group of languages. The program is a research tool designed to aid the linguist in evaluating specific hypotheses, by calculating the consequences of a set of postulated sound changes (proposed by the linguist) on complete lexicons of several languages. It divides the lexicons into a phonologically regular part and a part that deviates from the sound laws. RE is bi-directional: given words in modern languages, it can propose cognate sets (with reconstructions); given reconstructions, it can project the modern forms that would result from regular changes. RE operates either interactively, allowing word-by-word evaluation of hypothesized sound changes and semantic shifts, or in a "batch" mode, processing entire multilingual lexicons en masse. We describe the algorithms implemented in RE, specifically the parsing and combinatorial techniques used to make projections upstream or downstream in the sense of time, the procedures for creating and consolidating cognate sets based on these projections, and the ad hoc techniques developed for handling the semantic component of the comparative method. Other programs and computational approaches to historical linguistics are briefly reviewed. Some results from a study of the Tamang languages of Nepal (a subgroup of the Tibeto-Burman family) are presented, and data from these languages are used throughout for exemplification of the operation of the program. Finally, we discuss features of RE that make it possible to handle the complex and sometimes imprecise representations of lexical items, and speculate on possible directions for future research.
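
The bi-directional projection described above can be illustrated with a minimal sketch. It is not RE's actual implementation. It assumes that every segment is a single character, that sound changes are unconditioned (no phonological context), and that semantics is ignored. The correspondence table, the language labels "A" and "B", and all function names are hypothetical and invented for illustration; none of the forms are Tamang data.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy correspondences: proto-segment -> reflex in each language.
# Invented for illustration only; NOT taken from the Tamang study.
CORRESPONDENCES = {
    "p": {"A": "p", "B": "b"},
    "t": {"A": "t", "B": "d"},
    "a": {"A": "a", "B": "a"},
    "u": {"A": "u", "B": "o"},
}

def project_downstream(proto, lang):
    """Apply the postulated regular changes to a proto-form, segment by segment."""
    return "".join(CORRESPONDENCES[seg][lang] for seg in proto)

def project_upstream(word, lang):
    """Return every proto-form that could regularly yield `word` in `lang`."""
    options = []
    for seg in word:
        sources = [p for p, reflexes in CORRESPONDENCES.items()
                   if reflexes[lang] == seg]
        if not sources:
            return set()  # the word deviates from the postulated sound laws
        options.append(sources)
    # Combine the candidate sources for each segment into whole proto-forms.
    return {"".join(combo) for combo in product(*options)}

def propose_cognate_sets(lexicons):
    """Group words from different languages that share a candidate reconstruction."""
    by_proto = defaultdict(set)
    for lang, words in lexicons.items():
        for word in words:
            for proto in project_upstream(word, lang):
                by_proto[proto].add((lang, word))
    # Keep only reconstructions supported by at least two languages.
    return {proto: members for proto, members in by_proto.items()
            if len({lang for lang, _ in members}) >= 2}
```

Under these assumptions, calling `propose_cognate_sets` on a small dictionary of word lists keyed by language splits each lexicon into words that fall into a proposed cognate set and words for which `project_upstream` returns an empty set, a rough analogue of the regular and deviating parts mentioned above.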
