Paper Information - MDL-based Models for Alignment of Etymological Data

MDL-based Models for Alignment of Etymological Data

We introduce several models for aligning etymological data, that is, for finding the best alignment at the sound or symbol level, given a set of etymological data. The aim is to obtain a means of measuring the quality of etymological data sets in terms of their internal consistency. One of our main goals is to devise automatic methods for aligning the data that are as objective as possible; the models make no a priori assumptions, e.g., no preference for vowel-vowel or consonant-consonant alignments. We present a baseline model and several successive improvements, using data from the Uralic language family.
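To make the link between code length and internal consistency concrete, here is a minimal sketch of a generic MDL-style measurement. It is not the paper's model: the function name `prequential_code_length`, the add-one smoothing, the toy symbol pairs, and the alphabet size `K` are all assumptions chosen for illustration. The sketch encodes a sequence of aligned symbol pairs one at a time and sums the bits required, treating every possible pair as equally likely at the start, so no vowel-vowel or consonant-consonant preference is built in.

```python
import math
from collections import Counter

def prequential_code_length(events, alphabet_size):
    """Bits needed to encode `events` sequentially with add-one smoothing.

    Each event is predicted from the counts of the events seen so far,
    starting from a uniform distribution over `alphabet_size` outcomes.
    """
    counts = Counter()
    total_bits = 0.0
    for n, event in enumerate(events):
        prob = (counts[event] + 1) / (n + alphabet_size)
        total_bits -= math.log2(prob)
        counts[event] += 1
    return total_bits

# Hypothetical toy data: two candidate alignments, as (source, target) pairs.
consistent = [("k", "k"), ("a", "a"), ("l", "l"),
              ("a", "a"), ("k", "k"), ("a", "a")]
inconsistent = [("k", "a"), ("a", "k"), ("l", "l"),
                ("a", "l"), ("k", "k"), ("a", "a")]

K = 16  # assumed number of possible symbol pairs (4 x 4 toy alphabet)

bits_consistent = prequential_code_length(consistent, K)
bits_inconsistent = prequential_code_length(inconsistent, K)
print(f"consistent:   {bits_consistent:.3f} bits")
print(f"inconsistent: {bits_inconsistent:.3f} bits")
```

Under these assumptions, the alignment that reuses the same correspondences compresses to fewer bits than the one that pairs symbols irregularly, which is the sense in which a shorter description indicates more regular data.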

Hannes Wettig | Roman Yangarber | Suvi Hiltunen
