Problematic use of Greenberg's linguistic classification of the Americas in studies of Native American genetic variation.

To the Editor: In recent years, there has been a burgeoning interest in comparisons of genetic and linguistic variation across human populations. This synthetic approach can be a powerful tool for reconstructing human prehistory, but only when the patterns of genetic and linguistic variation are accurately represented (Szathmary 1993). If one or both patterns are inaccurate, the resulting conclusions about human prehistory or gene-language correlations may be incorrect. Here, we present evidence that comparisons of genetic and linguistic variation in the Americas are problematic when they are based on Greenberg’s (1987) classification of Native American languages, for these very reasons.

Greenberg (1987) argued that all Native American languages, except those of the “Na-Dene” and Eskimo-Aleut groups, are similar and can be classified into a single linguistic unit, which he called “Amerind.” His tripartite classification (Amerind, Na-Dene, and Eskimo-Aleut) was based on the method of multilateral comparison, which examines many languages simultaneously to detect similarities in a small number of basic words and grammatical elements (Greenberg 1987). Greenberg (1987) also suggested that his three language groupings represent three separate migrations to the Americas, and Greenberg et al. (1986) interpreted their synthesis of the linguistic, dental, and genetic evidence as supportive of this three-migration hypothesis. Over the past 18 years, this three-migration model has become entrenched in the genetics literature as the hypothesis against which new genetic data are tested (e.g., Torroni et al. 1993; Merriwether et al. 1995; Zegura et al. 2004), and Greenberg’s linguistic classification has been the primary scheme used in studies comparing genetic and linguistic variation in the Americas. Of 100 studies of Native American genetic variation published between 1987 and 2004, 61 cite Greenberg (1987) or Greenberg et al. (1986), and at least 19 others were influenced by his tripartite classification (15 studies use the Amerind, Na-Dene, and Eskimo-Aleut groupings, and 4 others use the similar language groupings of Greenberg’s student M. Ruhlen).

Whereas Greenberg’s classification has been widely and uncritically used by human geneticists, it has been rejected by virtually all historical linguists who study Native American languages. There are many errors in the data on which his classification is based (Goddard 1987; Adelaar 1989; Berman 1992; Kimball 1992; Poser 1992), and Greenberg’s criteria for determining linguistic relationships are widely regarded as invalid. His method of multilateral comparison assembled only superficial similarities between languages, and, unlike other linguists, Greenberg did not distinguish similarities due to common ancestry (i.e., homology) from those due to other factors. Linguistic similarities can also be due to factors such as chance, borrowing from neighboring languages, and onomatopoeia, so proposals of remote linguistic relationships are only plausible when these other possible explanations have been eliminated (Matisoff 1990; Mithun 1990; Goddard and Campbell 1994; Campbell 1997; Ringe 2000). Greenberg made no attempt to eliminate such explanations, and the putative long-range similarities he amassed appear to be mostly chance resemblances and the result of misanalysis: he compared many languages simultaneously (which increases the probability of finding chance resemblances), examined arbitrary segments of words, equated words with very different meanings (e.g., excrement, night, and grass), failed to analyze the structure of some words and falsely analyzed that of others, neglected regular sound correspondences between languages, and misinterpreted well-established findings (Chafe 1987; Bright 1988; Campbell 1988, 1997; Golla 1988; Goddard 1990; Rankin 1992; McMahon and McMahon 1995; Nichols and Peterson 1996).
Consequently, empirical studies have shown that “the method of multilateral comparison fails every test; its results are utterly unreliable. Multilateral comparison is worse than useless: it is positively misleading, since the patterns of ‘evidence’ that it adduces in support of proposed linguistic relationships are in many cases mathematically indistinguishable from random patterns of chance resemblances” (Ringe 1994, p. 28; cf. Ringe 2002). Because of these problems, Greenberg’s methodology has proven incapable of distinguishing plausible proposals of linguistic relationships from implausible ones, such as Finnish-Amerind (Campbell 1988). Thus, specialists in Native American linguistics insist that Greenberg’s methodology was so flawed that it completely invalidates his conclusions about the unity of Amerind, and Greenberg himself estimated that 80%–90% of linguists agreed with this assessment (Lewin 1988).

Given this, the use of Greenberg’s (1987) classification can confound attempts to understand the relationship between genetic and linguistic variation in the Americas. Many studies of Native American genetic variation continue to use this classification (e.g., Bortolini et al. 2002, 2003; Fernandez-Cobo et al. 2002; Lell et al. 2002; Gomez-Casado et al. 2003; Zegura et al. 2004). However, Hunley and Long (2004) recently showed that there is a poor fit between Greenberg’s classification and the patterns of Native American mtDNA variation. On the basis of their findings, we believe that Greenberg’s groupings should no longer be used in analyses of mtDNA variation.

To further evaluate how the use of this classification influences our understanding of the relationship between genetic and linguistic variation in the Americas, we examined how well different linguistic classifications “explain” the patterns of Native American Y-chromosome variation. Data were compiled on the Y-chromosome haplogroups of 523 Native Americans, representing 36 populations (table 1).
We compared hierarchical analyses of molecular variance (AMOVAs), using Greenberg’s (1987) classification and a more conservative one (Campbell 1997) that is widely accepted by specialists in historical linguistics of Native American languages (Golla 2000; Hill and Hill 2000). The AMOVAs were based on population frequencies of the haplogroups known to be pre–European contact Native American lineages (Q-M19, Q-M3*, Q-M242*, and C-M130). All calculations were performed with Arlequin 2.000 (Schneider et al. 2000).

Table 1. Populations and Language Classifications Used in AMOVAs

The AMOVAs show that differences among Greenberg’s three groups could account for some genetic variance (ΦCT=0.319; P=.027), but the more generally accepted linguistic classification (as given in Campbell [1997]) of the same populations (17 groups) explains a greater proportion of the total genetic variance (ΦCT=0.448; P<.001). The magnitude of ΦCT increases 40.4% when the accepted language classification is used, which indicates that it is important to consider language classifications other than that of Greenberg (1987) when evaluating the relationship between genes and language in the Americas. Other factors, such as geography, have likely influenced patterns of genetic variation more than language, but accepted language groupings should, nonetheless, be used when exploring these relationships.

Thus, in future studies comparing genetic and linguistic variation in the Americas, we recommend use of the consensus linguistic classification, as given in Campbell (1997), Goddard (1996), and Mithun (1999), rather than Greenberg’s tripartite classification (Greenberg et al. 1986; Greenberg 1987). In addition, since there is no legitimate reason to believe that “Amerind” is a unified group (linguistic or otherwise), it has been essentially abandoned in linguistics and should not be used in genetic analyses.
Finally, because synthetic studies provide such important insights into human prehistory, we advocate continued collaboration between geneticists and linguists (and other anthropologists) to ensure accurate comparisons of genetic, linguistic, and cultural variation.
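
As an arithmetic check on the AMOVA comparison reported above, the relative increase in ΦCT can be recomputed from the two published values. The following is a minimal sketch and not part of the original analysis, which was run in Arlequin 2.000. The function and variable names are hypothetical; only the two ΦCT values and the group counts are taken from the text.

```python
def relative_increase(baseline: float, alternative: float) -> float:
    """Return the percentage increase of `alternative` over `baseline`."""
    return (alternative - baseline) / baseline * 100.0


# Phi_CT values as reported in the letter; the text describes Phi_CT as the
# proportion of total genetic variance explained by differences among groups.
PHI_CT_GREENBERG = 0.319  # Greenberg (1987) classification, 3 groups
PHI_CT_CONSENSUS = 0.448  # Campbell (1997) classification, 17 groups

increase_pct = relative_increase(PHI_CT_GREENBERG, PHI_CT_CONSENSUS)
print(f"Relative increase in Phi_CT: {increase_pct:.1f}%")
# prints: Relative increase in Phi_CT: 40.4%
```

Keeping the calculation in a small function makes it easy to repeat for other pairs of classifications.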

[1]  Ellen Woolford,et al.  The Settlement of the Americas: A Comparison of the Linguistic, Dental, and Genetic Evidence [and Comments and Reply] , 1986, Current Anthropology.

[2]  Joseph H. Greenberg,et al.  Language in the Americas , 1987 .

[3]  R. Lewin American Indian Language Dispute: Using a methodology not generally favored among linguists, a Stanford researcher has provoked outrage by proposing a revolutionary classification of American Indian languages. , 1988, Science.

[4]  Victor K. Golla Linguistic Anthropology: Language in the Americas. Joseph H. Greenberg , 1988 .

[5]  M. Mithun Studies of North American Indian Languages , 1990 .

[6]  R. Rankin Language in the Americas. Joseph H. Greenberg , 1992 .

[7]  Geoffrey Kimball A Critique of Muskogean, "Gulf," and Yukian Material in "Language in the Americas" , 1992, International Journal of American Linguistics.

[8]  H. Berman A Comment on the Yurok and Kalapuya Data in Greenberg's Language in the Americas , 1992, International Journal of American Linguistics.

[9]  W. Poser The Salinan and Yurumanguí Data in Language in the Americas , 1992, International Journal of American Linguistics.

[10]  J V Neel,et al.  Asian affinities and continental radiation of the four founding Native American mtDNAs. , 1993, American journal of human genetics.

[11]  E. Szathmary,et al.  mtDNA and the peopling of the Americas. , 1993, American journal of human genetics.

[12]  F. Rothhammer,et al.  Distribution of the four founding lineage haplotypes in Native Americans suggests a single wave of migration for the New World. , 1995, American journal of physical anthropology.

[13]  R. McMahon,et al.  LINGUISTICS, GENETICS AND ARCHAEOLOGY: INTERNAL AND EXTERNAL EVIDENCE IN THE AMERIND CONTROVERSY* , 1995 .

[14]  Johanna Nichols,et al.  THE AMERIND PERSONAL PRONOUNS , 1996 .

[15]  Lyle Campbell,et al.  American Indian languages : the historical linguistics of Native America , 1999 .

[16]  Kenneth C. Hill,et al.  American Indian Languages , 2000 .

[17]  Victor K. Golla Lyle Campbell, American Indian languages: The historical linguistics of Native America. (Oxford studies in anthropological linguistics, 4.) Oxford & New York: Oxford University Press, 1997. Pp. xiv, 512. Hb $75.00. , 2000, Language in Society.

[18]  M. Kinkade,et al.  The Languages of Native North America , 2000 .

[19]  D. Ringe Indo-European and Its Closest Relatives (Book) , 2002 .

[20]  G. Bedoya,et al.  Y-chromosome biallelic polymorphisms and Native American population structure. , 2001, Annals of human genetics.

[21]  C. Ryschkewitsch,et al.  Strains of JC virus in Amerind-speakers of North America (Salish) and South America (Guaraní), Na-Dene-speakers of New Mexico (Navajo), and modern Japanese suggest links through an ancestral Asian population. , 2002, American journal of physical anthropology.

[22]  P. Underhill,et al.  The dual origin and Siberian affinities of Native American Y chromosomes. , 2002, American journal of human genetics.

[23]  D. Ringe Joseph H. Greenberg, Indo-European and its closest relatives: the Eurasiatic language family, vol. 1: Grammar. Stanford, CA: Stanford University Press, 2000. Pp. xiv+326. , 2002, Journal of Linguistics.

[24]  E. Lowy,et al.  Origin of Mayans according to HLA genes and the uniqueness of Amerindians. , 2003, Tissue antigens.

[25]  G. Bedoya,et al.  Y-chromosome evidence for differing ancient demographic histories in the Americas. , 2003, American journal of human genetics.

[26]  M. Hurles,et al.  High level of male-biased Scandinavian admixture in Greenlandic Inuit shown by Y-chromosomal analysis , 2003, Human Genetics.

[27]  M. Hammer,et al.  High-resolution SNPs and microsatellite haplotypes point to a single, recent entry of Native American Y chromosomes into the Americas. , 2003, Molecular biology and evolution.