Hubiness, length, crossings and their relationships in dependency trees

Here, dependency tree structures are studied from three perspectives: their degree variance (hubiness), the mean dependency length, and the number of dependency crossings. Bounds that reveal pairwise dependencies among these three metrics are derived. Hubiness (the variance of degrees) plays a central role: the mean dependency length is bounded below by hubiness, while the number of crossings is bounded above by hubiness. Our findings suggest that the online memory cost of a sentence might be determined not only by the ordering of words but also by the hubiness of the underlying structure. The second moment of degree plays a crucial role that is reminiscent of its role in large complex networks.
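The abstract does not spell out the bounds. The sketch below is our own illustration, not necessarily the paper's derivation. It shows how the three metrics can be computed, and it checks two elementary counting bounds of the kind described. It assumes a hypothetical head-vector input format: `heads[p - 1]` is the position of the head of word `p`, and 0 marks the root.

(i) Two edges that share a word cannot cross. At most (n(n - 1) - Σk²)/2 = (n/2)(n - 1 - <k²>) edge pairs can therefore cross.

(ii) A word of degree k has at most two neighbours at each distance, so the lengths of its edges sum to at least k(k + 2)/4. Summing over all words counts each edge twice, which gives <d> ≥ n<k²>/(8(n - 1)) + 1/2.

In a tree of n words, <k> = 2(n - 1)/n is fixed. As a result, <k²> and the degree variance rise and fall together.

```python
from fractions import Fraction
from itertools import combinations


def edges(heads):
    """Undirected edges (i, j), i < j, of a dependency tree.

    Hypothetical input format: heads[p - 1] is the position of the head of
    the word at position p (positions start at 1); 0 marks the root.
    """
    return [(min(p, h), max(p, h)) for p, h in enumerate(heads, start=1) if h != 0]


def degrees(heads):
    """Degree (number of incident edges) of every word, in linear order."""
    k = [0] * len(heads)
    for i, j in edges(heads):
        k[i - 1] += 1
        k[j - 1] += 1
    return k


def hubiness(heads):
    """Degree variance <k^2> - <k>^2, computed exactly with fractions."""
    k = degrees(heads)
    n = len(k)
    mean = Fraction(sum(k), n)
    return Fraction(sum(x * x for x in k), n) - mean ** 2


def mean_length(heads):
    """Mean dependency length: average |i - j| over the n - 1 edges."""
    e = edges(heads)
    return Fraction(sum(j - i for i, j in e), len(e))


def crossings(heads):
    """Number of edge pairs whose arcs cross when drawn above the sentence."""
    return sum(
        1
        for (a, b), (c, d) in combinations(edges(heads), 2)
        if a < c < b < d or c < a < d < b  # strict: shared words never cross
    )


def check_bounds(heads):
    """Check the two counting bounds (i) and (ii) sketched above (n >= 2)."""
    k = degrees(heads)
    n = len(k)
    sum_k2 = sum(x * x for x in k)
    total_length = sum(j - i for i, j in edges(heads))
    # (ii) 8 * total length >= sum k^2 + 4 (n - 1), an integer form of the <d> bound.
    length_ok = 8 * total_length >= sum_k2 + 4 * (n - 1)
    # (i) 2C <= n (n - 1) - sum k^2, i.e. C <= (n / 2)(n - 1 - <k^2>).
    crossing_ok = 2 * crossings(heads) <= n * (n - 1) - sum_k2
    return length_ok and crossing_ok


if __name__ == "__main__":
    path = [3, 4, 4, 0]  # arcs 1-3, 2-4, 3-4: one crossing
    star = [2, 0, 2, 2]  # word 2 heads all others: no crossing possible
    for tree in (path, star):
        print(hubiness(tree), mean_length(tree), crossings(tree), check_bounds(tree))
```

Fractions keep the hubiness and mean lengths exact, so small examples can be compared without rounding error. In the toy path tree, the crossing bound is reached exactly: (4 · 3 - 10)/2 = 1. In the star, the high degree variance leaves no room for crossings: (12 - 12)/2 = 0.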
