FoLiA in Practice. The Infrastructure of a Linguistic Annotation Format
Antal van den Bosch | Martin Reynaert | M. van Gompel | K. van der Sloot
[1] W. Spooren, et al. Diachronic changes in subjectivity and stance: A corpus linguistic study of Dutch news texts, 2012.
[2] Andreas Witt, et al. A pragmatic approach to XML interoperability: the Component Metadata Infrastructure (CMDI), 2011.
[3] Martin Reynaert, et al. FoLiA: A practical XML Format for Linguistic Annotation - a descriptive and comparative study, 2014, CLIN 2014.
[4] Antal van den Bosch. Ucto: Unicode Tokeniser, 2012.
[5] Erhard W. Hinrichs, et al. A Corpus Representation Format for Linguistic Web Services: The D-SPIN Text Corpus Format and its Relationship with ISO Standards, 2010, LREC.
[6] Martin Reynaert. Synergy of Nederlab and @PhilosTEI: diachronic and multilingual Text-Induced Corpus Clean-up, 2014, LREC 2014.
[7] Martin Reynaert. Character confusion versus focus word-based correction of spelling and OCR variants in corpora, 2010, International Journal on Document Analysis and Recognition (IJDAR).
[8] Oliver Christ, et al. A Modular and Flexible Architecture for an Integrated Corpus Query System, 1994, ArXiv.
[9] Hennie Brugman, et al. Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora, 2016, LREC.
[10] Amir Zeldes, et al. PAULA XML Documentation, 2013.
[11] Gertjan van Noord, et al. Alpino: Wide-coverage Computational Analysis of Dutch, 2000, CLIN.
[12] Piek T. J. M. Vossen, et al. Computer Assisted Semantic Annotation in the DutchSemCor Project, 2010, LREC.
[13] Nancy Ide, et al. International Standard for a Linguistic Annotation Framework, 2003, Natural Language Engineering.
[14] Thomas M. Breuel. The hOCR Microformat for OCR Workflow and Results, 2007, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007).
[15] Antal van den Bosch, et al. T-Scan: a new tool for analyzing Dutch text, 2014, CLIN 2014.
[16] Nelleke Oostdijk, et al. The Construction of a 500-Million-Word Reference Corpus of Contemporary Written Dutch, 2013, Essential Speech and Language Technology for Dutch.
[17] A.P.J. van den Bosch, et al. BasiLex: An 11.5 million words corpus of Dutch texts written for children, 2014, CLIN 2014.
[18] A.P.J. van den Bosch, et al. PICCL: Philosophical Integrator of Computational and Corpus Libraries, 2015.
[19] Menno van Zaanen, et al. OpenSoNaR: user-driven development of the SoNaR corpus interfaces, 2014, COLING.