Complex networks analysis of language complexity

Methods from statistical physics, such as those involving complex networks, have been increasingly used in the quantitative analysis of linguistic phenomena. In this paper, we represented pieces of text with different levels of simplification as co-occurrence networks and found that topological regularity correlated negatively with textual complexity. Furthermore, in less complex texts the distance between concepts, represented as nodes, tended to decrease. For each original text, two simplified versions were generated manually with an increasing number of simplification operations. The complex network metrics were treated with multivariate pattern recognition techniques, which allowed us to distinguish between original texts and their simplified versions. As expected, the distinction was easier for the strongly simplified versions, where the most relevant metrics were node strength, shortest paths and diversity. The discrimination of complex texts also improved when higher-level hierarchical network metrics were included, which points to the usefulness of considering wider contexts around the concepts. Although the accuracy of the distinction was not as high as that of methods using deep linguistic knowledge, the complex network approach is still useful for rapid screening of texts whenever assessing complexity is essential to guarantee accessibility to readers with limited reading ability.
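As a rough illustration of the representation described above, the following sketch builds a word co-occurrence network from a text and computes two of the metrics named in the abstract, node strength and average shortest path length. It is not the authors' pipeline. The function names, the linking of immediately adjacent words only, and the absence of stopword removal or lemmatization are all assumptions made for brevity.

```python
from collections import defaultdict, deque


def build_cooccurrence_network(text):
    """Return {(source, target): weight} for immediately adjacent words.

    Assumption: whitespace tokenization, lowercase, no stopword removal.
    """
    words = text.lower().split()
    weights = defaultdict(int)
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops from repeated words
            weights[(a, b)] += 1
    return dict(weights)


def node_strength(weights):
    """Sum of the weights of all edges touching each node (in plus out)."""
    strength = defaultdict(int)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)


def average_shortest_path(weights):
    """Mean hop count over connected ordered node pairs.

    Assumption: edge direction and weight are ignored, so plain
    breadth-first search gives the shortest path lengths.
    """
    neighbours = defaultdict(set)
    for a, b in weights:
        neighbours[a].add(b)
        neighbours[b].add(a)
    total, pairs = 0, 0
    for source in neighbours:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    network = build_cooccurrence_network("the cat saw the dog")
    print(node_strength(network))
    print(average_shortest_path(network))
```

Computing these values for an original text and for its simplified versions would give the kind of feature vector that a pattern recognition method could then use to separate the two classes.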
