The universal "shape" of human languages: spectral analysis beyond speech

A common recommendation for linguists is that, when analyzing language, they should take a "man-from-Mars", know-nothing perspective in describing the structures they observe, if possible limiting themselves to the mechanical application of a set of well-defined techniques and criteria. Unfortunately, the complex nature of linguistic structures makes it difficult to adopt such a detached perspective. A language, as represented by a corpus of text, can be described macroscopically by the symbolic periodogram of the corpus, analogous to the spectrograms commonly used for describing speech. Here, I show that the periodogram exhibits a universal "shape" across human languages, and that this shape originates in known properties of the human mind. Despite the universality of the overall pattern, subtle differences also reveal particularities of individual languages. These differences demonstrate the long-held, but so far unproven, hypothesis that human languages balance the amount of structure contained in different levels of description, so that the total amount of linguistic structure remains fairly constant across languages. The universal pattern found in the periodograms illustrates how the biological properties of the mind constrain the structure of human languages.