Some Issues on the Normalization of a Corpus of Products Reviews in Portuguese
Sandra M. Aluísio | Maria das Graças Volpe Nunes | Magali Sanches Duran | Thiago A. S. Pardo | Lucas Avanço
[1] Tullio De Mauro,et al. Guida all'uso delle parole , 1980.
[2] Hercules Dalianis,et al. Automatic training of lemmatization rules that handle morphological changes in pre-, in- and suffixes alike , 2009, ACL.
[3] Adam Kilgarriff,et al. Introduction to the Special Issue on the Web as Corpus , 2003, CL.
[4] Alexander Mehler,et al. Towards a Reference Corpus of Web Genres for the Evaluation of Genre Identification Systems , 2008, LREC.
[5] Tanja Schultz,et al. Text normalization based on statistical machine translation and internet user support , 2010, INTERSPEECH.
[6] Sabine Buchholz,et al. CoNLL-X Shared Task on Multilingual Dependency Parsing , 2006, CoNLL.
[7] Klaus U. Schulz,et al. Orthographic Errors in Web Pages: Toward Cleaner Web Corpora , 2006, Computational Linguistics.
[8] Silvia Bernardini,et al. BootCaT: Bootstrapping Corpora and Terms from the Web , 2004, LREC.
[9] Felice Dell'Orletta,et al. ULISSE: an Unsupervised Algorithm for Detecting Reliable Dependency Parses , 2011, CoNLL.
[10] Tomaz Erjavec,et al. hrWaC and slWaC: Compiling Web Corpora for Croatian and Slovene , 2011, TSD.
[11] Bernd Bohnet,et al. Very high accuracy and fast dependency parsing is not a contradiction , 2010, COLING.
[12] Neville Ryant,et al. A large-scale classification of English verbs , 2008, Lang. Resour. Evaluation.
[13] L. Venkata Subramaniam,et al. Unsupervised cleansing of noisy text , 2010, COLING.
[14] Lucian Vlad Lita,et al. tRuEcasIng , 2003, ACL.
[15] Sandra M. Aluísio,et al. An Account of the Challenge of Tagging a Reference Corpus for Brazilian Portuguese , 2003, PROPOR.
[16] Timothy Baldwin,et al. langid.py: An Off-the-shelf Language Identification Tool , 2012, ACL.
[17] Nathan Hartmann,et al. A Large Corpus of Product Reviews in Portuguese: Tackling Out-Of-Vocabulary Words , 2014, LREC.
[18] Felice Dell'Orletta,et al. Accurate Dependency Parsing with a Stacked Multilayer Perceptron , 2009.
[19] Verena Lyding,et al. xLDD: Extended Linguistic Dependency Diagrams , 2011, 2011 15th International Conference on Information Visualisation.
[20] Marina Santini,et al. Genres in formation? An exploratory study of web pages using cluster analysis , 2005.
[21] Nuno Cardoso. Rembrandt - a named-entity recognition framework , 2012, LREC.
[22] Christopher D. Manning,et al. Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol.
[23] Adrien Barbaresi,et al. The Good, the Bad, and the Hazy: Design Decisions in Web Corpus Construction , 2013.
[24] Raymond J. Mooney,et al. Active Learning for Natural Language Parsing and Information Extraction , 1999, ICML.
[25] Alexander Mehler,et al. Riding the Rough Waves of Genre on the Web , 2011, Genres on the Web.
[26] Silvia Bernardini,et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora , 2009, Lang. Resour. Evaluation.
[27] Eugene Charniak,et al. Reranking and Self-Training for Parser Adaptation , 2006, ACL.
[28] Felice Dell'Orletta,et al. Ensemble system for Part-of-Speech tagging , 2009.
[29] Jian Su,et al. A Phrase-Based Statistical Model for SMS Text Normalization , 2006, ACL.
[30] Nikola Ljubesic,et al. Lemmatization and Morphosyntactic Tagging of Croatian and Serbian , 2013, BSNLP@ACL.
[31] Roland Schäfer,et al. Building Large Corpora from the Web Using a New Efficient Tool Chain , 2012, LREC.
[32] Eric Laporte,et al. UNITEX-PB, a set of flexible language resources for Brazilian Portuguese , 2005.
[33] Serge Sharoff. Creating General-Purpose Corpora Using Automated Search Engine Queries , 2006.
[34] Michel Généreux,et al. A Large Portuguese Corpus On-Line: Cleaning and Preprocessing , 2012, PROPOR.
[35] Adam Kilgarriff,et al. Cleaneval: a Competition for Cleaning Web Pages , 2008, LREC.
[36] Slav Petrov,et al. Overview of the 2012 Shared Task on Parsing the Web , 2012.
[37] Egon Stemle,et al. Open Corpus Interface for Italian Language Learning , 2013.
[38] Felice Dell'Orletta,et al. Unsupervised Linguistically-Driven Reliable Dependency Parses Detection and Self-Training for Adaptation to the Biomedical Domain , 2013, BioNLP@ACL.
[39] Sara Castagnoli,et al. I testi del web: una proposta di classificazione sulla base del corpus PAISÀ , 2011.
[40] Alessandro Lenci,et al. LexIt: A Computational Resource on Italian Argument Structure , 2012, LREC.
[41] Jean Carletta,et al. Assessing Agreement on Classification Tasks: The Kappa Statistic , 1996, CL.
[42] Peter Fankhauser,et al. Boilerplate detection using shallow text features , 2010, WSDM '10.
[43] Fernando Batista,et al. Recovering capitalization and punctuation marks for automatic speech recognition: Case study for Portuguese broadcast news , 2008, Speech Commun.
[44] Zeljko Agic,et al. Parsing Croatian and Serbian by Using Croatian Dependency Treebanks , 2013, SPMRL@EMNLP.
[45] Françoise Beaufays,et al. Language model capitalization , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[46] Stefan Evert,et al. Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium , 2011.