COMPLEX NETWORKS ANALYSIS OF MANUAL AND MACHINE TRANSLATIONS

Complex networks are increasingly used in text analysis, including alongside natural language processing tools, because important text features appear to be captured by the topology and dynamics of the networks. Following previous work that applied complex network concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess whether representing texts as complex networks can help evaluate cross-linguistic issues inherent in manual and machine translation. We show that MT outputs of different quality can be distinguished from their manual counterparts by means of metrics such as in-degree (ID), out-degree (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are preserved in good automatic translations but not in manual translations. The latter probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better MT tools and automatic evaluation metrics.
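As a rough illustration of the metrics named above, the following sketch computes OD, CC, and SP on a small directed network built from a text. The network construction (one node per distinct word, one edge from each word to the word that immediately follows it) is an assumption made for this example, since the abstract does not specify how texts are mapped to networks. The function names and the CC variant used here (fraction of possible directed links among a node's neighbors) are likewise illustrative choices, not necessarily the authors' definitions.

```python
from collections import deque


def build_network(text):
    """Build a directed word-adjacency network (assumed construction).

    Returns two dicts mapping each word to its set of successors and
    predecessors. Self-loops from repeated words are ignored.
    """
    words = text.lower().split()
    succ = {w: set() for w in words}
    pred = {w: set() for w in words}
    for u, v in zip(words, words[1:]):
        if u != v:
            succ[u].add(v)
            pred[v].add(u)
    return succ, pred


def average_out_degree(succ):
    """Mean number of distinct successors per node (average OD)."""
    return sum(len(targets) for targets in succ.values()) / len(succ)


def clustering_coefficient(succ, pred, node):
    """Fraction of possible directed links present among a node's neighbors.

    Neighbors are all nodes linked to `node` in either direction.
    """
    neighbors = (succ[node] | pred[node]) - {node}
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a in neighbors for b in succ[a] if b in neighbors)
    return links / (k * (k - 1))


def shortest_path_lengths(succ, source):
    """Breadth-first search: hop counts from `source` to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


if __name__ == "__main__":
    # Toy input, not data from the paper.
    succ, pred = build_network("the cat saw the dog and the dog saw the cat")
    print("average OD:", average_out_degree(succ))
    print("CC of 'the':", clustering_coefficient(succ, pred, "the"))
    print("SP from 'cat':", shortest_path_lengths(succ, "cat"))
```

Under these assumptions, comparing a source text with its manual and automatic translations would amount to building one network per text and comparing the resulting metric values.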
