HindEnCorp - Hindi-English and Hindi-only Corpus for Machine Translation
Ondrej Bojar | Daniel Zeman | Ales Tamchyna | Pavel Rychlý | Vít Suchomel | Pavel Stranák | Vojtech Diatka
[1] Jan Hajič, et al. The Best of Two Worlds: Cooperation of Statistical and Rule-Based Taggers for Czech, 2007, ACL 2007.
[2] Ondrej Dusek, et al. The Joy of Parallelism with CzEng 1.0, 2012, LREC.
[3] Alexandr Rosen, et al. The case of InterCorp, a multilingual parallel corpus, 2012.
[4] Matt Post, et al. Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing, 2012, WMT@NAACL-HLT.
[5] Fabienne Braune, et al. Improved Unsupervised Sentence Alignment for Symmetrical and Asymmetrical Parallel Corpora, 2010, COLING.
[6] Pushpak Bhattacharyya, et al. Hindi POS Tagger Using Naive Stemming: Harnessing Morphological Information Without Extensive Linguistic Knowledge, 2008.
[7] Ondrej Bojar, et al. Data Issues in English-to-Hindi Machine Translation, 2010, LREC.
[8] Daniel Zeman, et al. English–Hindi Translation in 21 Days, 2008.
[9] Tony McEnery, et al. EMILLE, A 67-Million Word Corpus of Indic Languages: Data Collection, Mark-up and Harmonisation, 2002, LREC.
[10] Zdenek Zabokrtský, et al. TectoMT: Modular NLP Framework, 2010, IceTAL.
[11] András Kornai, et al. Parallel corpora for medium density languages, 2007.
[12] Jan Pomikálek. Removing Boilerplate and Duplicate Content from Web Corpora, 2011.
[13] Ondrej Bojar, et al. TrTok: A Fast and Trainable Tokenizer for Natural Languages, 2012, Prague Bull. Math. Linguistics.
[14] Rico Sennrich, et al. Iterative, MT-based Sentence Alignment of Parallel Texts, 2011, NODALIDA.
[15] Rico Sennrich, et al. Extrinsic evaluation of sentence alignment systems, 2012.
[16] Zdenek Zabokrtský, et al. Language Richness of the Web, 2012, LREC.
[17] Vít Suchomel, et al. Efficient Web Crawling for Large Text Corpora, 2012.