Quantitative Classification of Indo-European Languages

In 1928 the Polish anthropologist Jan Czekanowski published in the ethnographical quarterly Lud a study of the Indo-European languages in which he applied the method of differential diagnosis by quantitative correlation determinations that he had long used with success in physical anthropology and ethnography. Whatever its field, this method rests upon the recognition of isolable and definable features or traits, which we shall hereafter call elements, whose presence or absence can be determined for a number of population groups or territorial entities, such as races, tribes, cultures, castes, or, in the present study, languages. The distribution of these elements is tabulated with a plus sign for presence in a particular group, a minus sign for absence, and a question mark where the status is unknown. Each group is then compared with every other group by means of the four-cell segregation familiar to statisticians. That is to say, four values are determined: a is the number of elements present in both groups, b the number present in the first but absent in the second, c the number absent in the first but present in the second, and d the number absent in both. In other words, a and d are agreements, positive and negative respectively; b and c are disagreements. These four values are then substituted in a suitable formula, which yields a coefficient of similarity between the two groups. When the coefficients for every pair of groups under consideration are assembled, we get a classification of the relative degrees of