Comparison of directed and weighted co-occurrence networks of six languages

To study commonalities and differences among languages, we select 100 reports from the documents of the United Nations, each of which is available in Arabic, Chinese, English, French, Russian and Spanish. Based on these corpora, we construct six weighted and directed word co-occurrence networks, one per language. Besides confirming that all six networks exhibit scale-free and small-world features, we find several new non-trivial results: connections among English words are denser, which suggests that expression in English is more flexible and powerful; connections among Spanish words are more constrained, which indicates that Spanish grammar is more rigorous; the values of many statistical parameters of the French and Spanish networks are very close, which shows that these two languages share many commonalities; Arabic and Russian words take many forms, which results in a large number of word types and sparse connections among words; connections among Chinese words follow a more uniform distribution, and Chinese tends to use the fewest words to express the same complex information as the other five languages, which shows that expression in Chinese is quite concise. In addition, this study identifies several topics that merit further investigation with the complex network approach.
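
As an illustration of what a weighted and directed word co-occurrence network looks like, the following minimal Python sketch builds one from a few sentences. It rests on assumptions that the abstract does not state: an edge runs from each word to the word that immediately follows it in the same sentence, the edge weight counts how often that ordered pair occurs, and tokens are obtained by lowercasing and splitting on whitespace. The function names and the sample sentences are hypothetical and serve only as an example.

```python
from collections import defaultdict


def build_cooccurrence_network(sentences):
    """Return {source: {target: weight}} for adjacent word pairs.

    Assumption (not taken from the paper): two words co-occur when the
    target directly follows the source within one sentence; the weight
    is the number of times that ordered pair is observed.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenizer
        for source, target in zip(tokens, tokens[1:]):
            edges[source][target] += 1
    return {source: dict(targets) for source, targets in edges.items()}


def out_strength(network, word):
    """Sum of the weights on edges leaving `word`."""
    return sum(network.get(word, {}).values())


def in_strength(network, word):
    """Sum of the weights on edges entering `word`."""
    return sum(targets.get(word, 0) for targets in network.values())


# Hypothetical sample text, used only to exercise the functions.
sentences = ["the council adopted the resolution", "the council met"]
network = build_cooccurrence_network(sentences)
print(network["the"])
print(out_strength(network, "the"), in_strength(network, "the"))
```

Whitespace splitting would not segment Chinese text, so a real six-language comparison would need a language-appropriate tokenizer in place of `split()`.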
