Computational methods are invaluable for typology, but the models must match the questions

Linguistic Typology 15 (2011), 393–399 1430–0532/2011/015-0393 DOI 10.1515/LITY.2011.026 ©Walter de Gruyter

The primary goal of Dunn et al. (2011) is to understand the role that lineage plays in linguistic typology in general, and in word order feature correlations in particular. The authors approach this goal by combining a large amount of linguistic data – lexical and typological-feature – with statistical analysis. Approaches that combine such data with statistical analysis are likely to be central to the future of linguistic typology, especially as the complexity of available data grows. They allow facts about diverse languages to speak directly to major theoretical questions about feature correlation and historical change, while accounting for how evidence is aggregated across languages known to be related by common ancestry. We applaud Dunn et al. for applying such methods to study word order universals.

However, in order to get meaningful answers, one needs to ask the right statistical questions. For each of many feature pairs, the authors pose two central theoretical questions: (Q1) “Is there evidence for correlation in the evolution of the features, taking into account common ancestry?” and (Q2) “Is there evidence that feature correlation is lineage specific?”. The evidence they adduce to answer these questions is based on statistical tests for correlation within each lineage, and thus bears fairly directly on Q1. Additionally, Dunn et al. propose an affirmative answer to Q2, noting that for no feature pair was evidence for correlated evolution found in every lineage. Unfortunately, this result does not necessarily support their affirmative answer to Q2.

In this commentary we explain in somewhat more detail the precise nature of the statistical tests conducted by Dunn et al. and why their results do not bear directly on Q2. To foreshadow the conclusion, Dunn et al.
have presented the strength of evidence for an effect but interpreted it as the strength of the effect itself. The difference between the two often boils down to sample size and sample diversity, as we hope to convey in the following discussion.

The statistical method used by Dunn et al., the Bayesian hypothesis test, is very simple at heart: it involves constructing two hypotheses – formally speaking, generative processes that might underlie the observed data – and