Balancing SoNaR: IPR versus Processing Issues in a 500-Million-Word Written Dutch Reference Corpus
[1] Adam Kilgarriff et al. Large Linguistically-Processed Web Corpora for Multiple Languages, 2006, EACL.
[2] Stefan Evert. A Lightweight and Efficient Tool for Cleaning Web Pages, 2008, LREC.
[3] Orphée De Clercq et al. Dutch Parallel Corpus, 2011.
[4] Silvia Bernardini et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora, 2009, Lang. Resour. Evaluation.
[5] Nelleke Oostdijk et al. From D-Coi to SoNaR: a reference corpus for Dutch, 2008, LREC.
[6] Stefan Evert et al. How Random is a Corpus? The Library Metaphor, 2006.
[7] Maarten Marx et al. DutchParl. The Parliamentary Documents in Dutch, 2010, LREC.
[8] Véronique Hoste et al. Interacting Semantic Layers of Annotation in SoNaR, a Reference Corpus of Contemporary Written Dutch, 2010, LREC.
[9] Orphée De Clercq et al. Data Collection and IPR in Multilingual Parallel Corpora. Dutch Parallel Corpus, 2010, LREC.
[10] Klaus U. Schulz et al. Orthographic Errors in Web Pages: Toward Cleaner Web Corpora, 2006, Computational Linguistics.
[11] Martin Reynaert et al. Non-interactive OCR Post-correction for Giga-Scale Digitization Projects, 2008, CICLing.