Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection
Jörg Tiedemann | Nikola Ljubešić | Marcos Zampieri