hrWaC and slWaC: Compiling Web Corpora for Croatian and Slovene
[1] Nikola Ljubešić, et al. Language Identification of Web Data for Building Linguistic Corpora, 2011.
[2] Simon Krek, et al. The JOS Morphosyntactically Tagged Corpus of Slovene, 2008, LREC.
[3] Bruno Pouliquen, et al. Massive multi-lingual corpus compilation: Acquis Communautaire and totale, 2005.
[4] Željko Agić, et al. Evaluating Morphosyntactic Tagging of Croatian Texts, 2006, LREC.
[5] Silvia Bernardini, et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora, 2009, Lang. Resour. Evaluation.
[6] Emiliano Raúl Guevara, et al. NoWaC: a large web-based corpus for Norwegian, 2010, WAC@NAACL-HLT.
[7] Pavel Pecina, et al. Building a Web Corpus of Czech, 2010, LREC.
[8] Tomaž Erjavec, et al. MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora, 2004, LREC.
[9] Peter Fankhauser, et al. Boilerplate detection using shallow text features, 2010, WSDM '10.