O que é e como se constrói um corpus? Lições aprendidas na compilação de vários corpora para pesquisa lingüística What is a corpus and how to build it? Lessons learned from developing several linguistic corpora

The research based on corpus has had in the last decade an ample development in the Brazilian context. Its relevancy is noticed in the Linguistics, Applied Linguistics and Computational Linguistics research areas. The approach of Corpus Linguistics comes out to systematize procedures and to give account of this new way to make research. The development of Brazilian Portuguese natural language processing tools can help Corpus Linguistics to reach a great development in Brazil. However, the advances in Corpus Linguistics in the international scenery have not happened yet in many of the research carried out in Brazil. The reasons for this is that the procedures and concepts world-wide accepted are not still settled here, in spite of having researchers developing extraordinary projects based on corpus in Brazil. Thus, this article has the intention to discuss several definitions of corpus, the requirements and procedures for its elaboration, the available corpora and tools and, finally, to present four projects involving corpus whose description and detailing can assist other researchers in the corpus building and processing.