Initial Comparison of Linguistic Networks Measures for Parallel Texts

This paper presents preliminary results of an analysis of Croatian syllable networks. A syllable network is a network in which the nodes are syllables and the links between them are constructed according to how the syllables are connected within words. We analyze syllable networks generated from texts collected from the Croatian Wikipedia and from blogs. Our main tool is complex network analysis, whose methods can reveal new patterns in language structure. We aim to show that syllable networks have a much higher clustering coefficient than Erdős-Rényi random networks. The results indicate that Croatian syllable networks exhibit certain properties of small-world networks. Furthermore, we compare Croatian syllable networks with Portuguese and Chinese syllable networks and show that they have similar properties.
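To make the comparison with Erdős-Rényi random networks concrete, the following is a minimal sketch, not the authors' tooling. It assumes that words have already been split into syllables (the sample words are hand-syllabified and purely illustrative) and that two syllables are linked when they are adjacent inside a word. The abstract only says "connections within words", so this link rule is an assumption, as are all function names, which are hypothetical. It computes the average clustering coefficient C and the average shortest path length L, and compares them with the standard random-graph estimates for the same number of nodes n and links m: C_rand ≈ p = 2m / (n(n-1)) and L_rand ≈ ln(n) / ln(<k>).

```python
import math
from collections import deque
from itertools import combinations


def build_syllable_network(syllabified_words):
    """Undirected, unweighted syllable graph as {syllable: set(neighbours)}.

    Assumption (illustrative only): two syllables are linked when they are
    adjacent inside the same word.
    """
    adj = {}
    for syllables in syllabified_words:
        for s in syllables:
            adj.setdefault(s, set())  # single-syllable words still add a node
        for a, b in zip(syllables, syllables[1:]):
            if a != b:  # skip self-loops such as "ma-ma"
                adj[a].add(b)
                adj[b].add(a)
    return adj


def edge_count(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among the node's neighbours (each unordered pair once).
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def erdos_renyi_baseline(adj):
    """Estimates for an Erdős-Rényi graph with the same n and m:
    C_rand ~ p = 2m / (n(n-1)) and L_rand ~ ln(n) / ln(<k>)."""
    n, m = len(adj), edge_count(adj)
    if n < 2:
        raise ValueError("need at least two nodes")
    p = 2.0 * m / (n * (n - 1))
    mean_degree = 2.0 * m / n
    l_rand = math.log(n) / math.log(mean_degree) if mean_degree > 1 else float("inf")
    return p, l_rand


if __name__ == "__main__":
    # Hypothetical hand-syllabified Croatian words: kuća, kupati, kapa, kava, ruka, kuka.
    words = [["ku", "ća"], ["ku", "pa", "ti"], ["ka", "pa"],
             ["ka", "va"], ["ru", "ka"], ["ku", "ka"]]
    g = build_syllable_network(words)
    c_rand, l_rand = erdos_renyi_baseline(g)
    print(f"n={len(g)} m={edge_count(g)}")
    print(f"C={average_clustering(g):.3f}  C_rand={c_rand:.3f}")
    print(f"L={average_shortest_path(g):.3f}  L_rand={l_rand:.3f}")
```

The toy input is far too small for the comparison to mean anything (here C is actually below C_rand); the small-world signature, C much larger than C_rand while L stays close to L_rand, is only expected for networks built from a full corpus. For graphs of that size, NetworkX offers `average_clustering`, `average_shortest_path_length` and `gnm_random_graph` for the same quantities and for sampling an explicit random baseline.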
