Overviewing Important Aspects of the Last Twenty Years of Research in Comparable Corpora

The beginning of the 1990s marked a radical turn in various NLP applications towards using large collections of texts.
