Are crossing dependencies really scarce?

The syntactic structure of a sentence can be modelled as a tree in which vertices correspond to words and edges indicate syntactic dependencies. It has been claimed repeatedly that the number of edge crossings in real sentences is small. However, a baseline or null hypothesis has been lacking. Here we quantify the number of crossings in real sentences and compare it with the predictions of a series of baselines. We conclude that crossings are indeed scarce in real sentences. Their scarcity is not what the hubiness of the trees would lead one to expect: real sentences are close to linear trees, where the potential number of crossings is maximized.
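To make the two quantities concrete, the following minimal sketch counts crossings in a single sentence. It assumes that each dependency is given as a pair of word positions and uses the usual definition: two edges cross when exactly one endpoint of one lies strictly between the endpoints of the other. The function names `count_crossings` and `potential_crossings` are illustrative and not taken from the paper. The potential number of crossings is computed here as the number of edge pairs that share no vertex, since edges with a common word cannot cross.

```python
from itertools import combinations


def count_crossings(edges):
    """Count pairs of edges that cross in the linear order of the words.

    `edges` is an iterable of (i, j) pairs of distinct word positions.
    """
    # Orient every edge so that its left endpoint comes first.
    norm = [tuple(sorted(e)) for e in edges]
    crossings = 0
    for (a, b), (c, d) in combinations(norm, 2):
        # The edges cross when their spans interleave.
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings


def potential_crossings(edges):
    """Count pairs of edges that share no vertex (the only pairs that can cross)."""
    norm = [tuple(sorted(e)) for e in edges]
    return sum(1 for e, f in combinations(norm, 2) if not set(e) & set(f))


# Hypothetical 4-word examples (positions 1..4).
linear_tree = [(1, 2), (2, 3), (3, 4)]   # path laid out in order
star_tree = [(1, 2), (1, 3), (1, 4)]     # one hub linked to every other word
crossed_tree = [(1, 3), (3, 2), (2, 4)]  # path laid out with one crossing

for name, tree in [("linear", linear_tree), ("star", star_tree), ("crossed", crossed_tree)]:
    print(name, count_crossings(tree), potential_crossings(tree))
```

In these toy trees the star admits no crossings at all, whereas the linear tree leaves room for one, which illustrates why hub-heavy trees would be a natural explanation for scarce crossings and why near-linear trees are not.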
