Approaching textual coherence of machine translation with complex network

This study analyzes the topological properties of complex networks of textual coherence and investigates the textual coherence of machine translation by contrasting these properties in machine-translated texts with those in a human-translated text. The coherence networks are built by drawing on Systemic Functional Linguistics, with Themes and Rhemes as vertices and the semantic connections between them as edges. The coherence networks are found to be small-world, assortatively mixed, scale-free with an exponential cut-off, and hub-dependent. Their basic building blocks are fully connected triads and fully connected squares, with the latter playing the more significant role in network construction. Compared with the network of the human translation, the networks of the machine translations have fewer vertices and edges, a lower average degree, a smaller network diameter, a shorter average path length, a larger clustering coefficient, a larger assortativity coefficient and more types of motifs. We therefore suggest that the machine-translated texts are sparsely, locally, unevenly and monotonously connected, which may account for why and how machine translation is weak in coherence. This study is the first to employ complex networks to explore the textual coherence of machine translations. It may promote cross-disciplinary interaction between linguistics, computer science and network science.
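As a rough illustration of the measurements named in the abstract, the sketch below builds a small undirected network and computes its average degree, average clustering coefficient, average path length and diameter. It is not the study's implementation: the vertex labels (`T1`, `R1`, ...), the edge list and all function names are hypothetical, chosen only to show how Theme and Rheme vertices joined by semantic connections can be measured.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy data: T* stand for Themes, R* for Rhemes; each pair is an
# assumed semantic connection, not one taken from the study's texts.
EDGES = [("T1", "R1"), ("R1", "T2"), ("T1", "T2"),
         ("T2", "R2"), ("R2", "T3"), ("T3", "R3")]

def build_graph(edges):
    """Return an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_degree(adj):
    """Mean number of neighbours per vertex."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def bfs_distances(adj, source):
    """Shortest-path lengths from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def path_stats(adj):
    """Return (average path length, diameter) over reachable ordered pairs."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter

adj = build_graph(EDGES)
avg_path_length, diameter = path_stats(adj)
print("average degree:", average_degree(adj))
print("average clustering:", average_clustering(adj))
print("average path length:", avg_path_length)
print("diameter:", diameter)
```

Running the same functions on a network for each translation would allow the kind of side-by-side comparison the abstract describes. The assortativity coefficient, degree distribution and motif counts are left out here to keep the sketch short.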