Closing a gap in the language resources landscape: Groundwork and best practices from projects on computer-mediated communication in four European countries.
Tomaž Erjavec | Darja Fišer | Ciara R. Wigham | Nikola Ljubešić | Harald Lüngen | Thierry Chanier | Egon W. Stemle | Céline Poudat | Angelika Storrer | Michael Beißwenger | Axel Herold | Isabella Chiari
[1] Wessel Stoop, et al. Collecting Facebook Posts and WhatsApp Chats - Corpus Compilation of Private Social Media Messages, 2016, TSD.
[2] C. M. Sperberg-McQueen, et al. Guidelines for electronic text encoding and interchange, 1994.
[3] Stefan Thater, et al. Improving the Performance of Standard Part-of-Speech Taggers for Computer-Mediated Communication, 2014, KONVENS.
[4] Ciara R. Wigham, et al. Interactions between text chat and audio modalities for L2 communication and feedback in the synthetic world Second Life, 2015.
[5] Eric N. Forsyth. Improving automated lexical and discourse analysis of online chat dialog, 2007.
[6] Harald Lüngen, et al. Integrating corpora of computer-mediated communication in CLARIN-D: Results from the curation project ChatCorpus2CLARIN, 2016, KONVENS.
[7] Tomaž Erjavec, et al. Omogočanje dostopa do korpusov slovenskih spletnih besedil v luči pravnih omejitev, 2016.
[8] Marie-Josée Hamel, et al. Language-Learner Computer Interactions: Theory, methodology and CALL applications, 2016.
[9] Paul Rayson, et al. Children Online: A survey of child language and CMC corpora, 2012.
[10] Tomaž Erjavec, et al. Normalising Slovene data: historical texts vs. user-generated content, 2016, KONVENS.
[11] Iryna Gurevych, et al. WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations, 2013, ACL.
[12] Elisabeth Stark, et al. sms4science: An international corpus-based texting project and the specific challenges for multilingual Switzerland, 2011.
[13] Eliza Margaretha, et al. Building Linguistic Corpora from Wikipedia Articles and Discussions, 2014, J. Lang. Technol. Comput. Linguistics.
[14] Angelika Storrer, et al. A TEI Schema for the Representation of Computer-mediated Communication, 2012.
[15] S. M. García,et al. 2014: , 2020, A Party for Lazarus.
[16] Jennifer-Carmen Frey, et al. The DiDi Corpus of South Tyrolean CMC Data: A multilingual corpus of Facebook texts, 2016, CLiC-it/EVALITA.
[17] Stefan Evert, et al. EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora, 2016, WAC@ACL.
[18] Tomaž Erjavec, et al. MULTEXT-East: morphosyntactic resources for Central and Eastern European languages, 2011, Language Resources and Evaluation.
[19] Craig H. Martell, et al. Lexical and Discourse Analysis of Online Chat Dialog, 2007, International Conference on Semantic Computing (ICSC 2007).
[20] Adam Kilgarriff, et al. The Sketch Engine: ten years on, 2014.
[21] Tomaž Erjavec, et al. JANES v0.4: Korpus slovenskih spletnih uporabniških vsebin, 2016.
[22] Nelleke Oostdijk, et al. The Construction of a 500-Million-Word Reference Corpus of Contemporary Written Dutch, 2013, Essential Speech and Language Technology for Dutch.
[23] Tomaž Erjavec, et al. Corpus-Based Diacritic Restoration for South Slavic Languages, 2016, LREC.
[24] Angelika Storrer, et al. DeRiK: A German reference corpus of computer-mediated communication, 2013, Lit. Linguistic Comput.
[25] Tomaž Erjavec, et al. Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene, 2016, LREC.
[26] Florence March,et al. 2016 , 2016, Affair of the Heart.
[27] Brook Bolander, et al. Doing sociolinguistic research on computer-mediated data: a review of four methodological issues, 2014.
[28] Tomaž Erjavec, et al. The IMP historical Slovene language resources, 2015, Lang. Resour. Evaluation.
[29] Swantje Westpfahl, et al. FOLK-Gold - A Gold Standard for Part-of-Speech-Tagging of Spoken German, 2016, LREC.
[30] Natalia Grabar, et al. Wikiconflits: un corpus de discussions éditoriales conflictuelles du Wikipédia francophone, 2017.
[31] Harald Lüngen, et al. Building and Annotating a Corpus of German-Language Newsgroups, 2015.
[32] Angelika Storrer, et al. Corpora of computer-mediated communication, 2008.
[33] Benoît Sagot, et al. The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres, 2014, J. Lang. Technol. Comput. Linguistics.