Keeping Properties with the Data: CL-MetaHeaders - An Open Specification
Corpus researchers, like researchers in many other scientific disciplines, are under continual pressure to demonstrate accountability and reproducibility in their work. This is difficult when the researcher faces a wide array of methods and tools: simply tracking the operations performed can be problematic, especially when toolchains are configured by their developers but left largely as a black box to the user. Here we present a scheme for encoding this metadata inside the corpus files themselves in a structured data format, along with a proof-of-concept tool that records the operations performed on a file.
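To make the idea concrete, the following is a minimal sketch of keeping an operation log inside a corpus file. It is not the CL-MetaHeaders format itself: the abstract does not describe the layout, so the single-line JSON header, the `#META ` marker, the `operations` key, and the function names `split_header` and `record_operation` are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone

# Hypothetical marker for the header line; not taken from the specification.
HEADER_PREFIX = "#META "


def split_header(text):
    """Return (metadata dict, body) for the text of a corpus file.

    If the first line is not a metadata header, the metadata is empty
    and the whole text is treated as the body.
    """
    first, _, rest = text.partition("\n")
    if first.startswith(HEADER_PREFIX):
        return json.loads(first[len(HEADER_PREFIX):]), rest
    return {}, text


def record_operation(text, tool, description):
    """Return the corpus text with one more operation logged in its header."""
    meta, body = split_header(text)
    # Field names below are illustrative assumptions.
    meta.setdefault("operations", []).append({
        "tool": tool,
        "description": description,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    # json.dumps without indent emits a single line, so the header
    # always occupies exactly the first line of the file.
    return HEADER_PREFIX + json.dumps(meta, sort_keys=True) + "\n" + body


corpus = "The cat sat on the mat.\n"
corpus = record_operation(corpus, "tokeniser", "split on whitespace")
corpus = record_operation(corpus, "lowercaser", "lowercased all tokens")

meta, body = split_header(corpus)
print([op["tool"] for op in meta["operations"]])  # ['tokeniser', 'lowercaser']
```

Because the log travels in the file rather than in a separate record, any later tool or reader can recover the processing history from the corpus alone, which is the property the scheme aims for.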