Devising a Sketch Grammar for Academic Portuguese

This paper presents the development of a new sketch grammar designed specifically for CoPEP, a newly compiled 40-million corpus comprising texts from academic journals, tagged with Freeling v3, the default tagger available in the Sketch Engine for corpora of Portuguese. We first provide an overview and evaluation of existing sketch grammars for Portuguese, followed by a detailed description of the development of a new sketch grammar, and the presentation of some of the problems encountered. We conclude by summarizing the main findings, highlighting important implications, and offering suggestions for further improvement of the sketch grammar. More accurate and varied word sketch results than those offered by the current default sketch grammar indicate that our sketch grammar can be used for advanced lexicographic tasks such as automatic extraction of lexical data from CoPEP, the methodology of knowledge acquisition planned for the compilation of the proposed dictionary of Portuguese for university students. Moreover, this new sketch grammar can be used with any other corpus of Portuguese tagged with Freeling v3, which makes it an important resource for lexicographic and corpus linguistic research of the Portuguese language.