Automatically extracted parallel corpora enriched with highly useful metadata? A Wikipedia case study combining machine learning and social technology

[1] F. Arnaud et al. From core referencing to data re-use: two French national initiatives to reinforce paleodata stewardship (National Cyber Core Repository and LTER France Retro-Observatory), 2017.

[2] Alexander Koplenig et al. The impact of lacking metadata for the measurement of cultural and linguistic change using the Google Ngram data sets - Reconstructing the composition of the German corpus in times of WWII, 2015, Digital Scholarship in the Humanities.

[3] Josef van Genabith et al. An Empirical Analysis of NMT-Derived Interlingual Embeddings and Their Use in Parallel Sentence Identification, 2017, IEEE Journal of Selected Topics in Signal Processing.

[4] Y. Gambier. Translation strategies and tactics, 2010.

[5] Pascale Fung et al. Building and Using Comparable Corpora, 2014, Springer Berlin Heidelberg.

[6] Minako O'Hagan. Massively Open Translation: Unpacking the Relationship Between Technology and Translation in the 21st Century, 2016.

[7] Thierry Etchegoyhen et al. A Portable Method for Parallel and Comparable Document Alignment, 2016, EAMT.

[8] Jane Greenberg et al. Big Metadata, Smart Metadata, and Metadata Capital: Toward Greater Synergy Between Data Science and Metadata, 2017, Journal of Data and Information Science.

[9] Ahmet Aker et al. Cross-Language Comparability and Its Applications for MT, 2019, Using Comparable Corpora for Under-Resourced Areas of Machine Translation.

[10] Using Comparable Corpora for Under-Resourced Areas of Machine Translation, 2019, Theory and Applications of Natural Language Processing.

[11] M. Shuttleworth. Translation and the Production of Knowledge in "Wikipedia": Chronicling the Assassination of Boris Nemtsov, 2018.

[12] Heather Ford et al. ‘Anyone can edit’, not everyone does: Wikipedia’s infrastructure and the gender gap, 2017, Social Studies of Science.