Medicor: A corpus of contemporary American medical texts

Medical texts are the medium in which new medical hypotheses are formulated. They are also a means of distributing medical knowledge to the general public, and they can serve as a form of linguistic manipulation that tries to influence the reader’s future action. These aspects make medical discourse an interesting research area for a linguist. Corpus linguistic methods make it possible to study quantitatively the interplay between form and function in medical texts. This paper introduces a new computer corpus that will enable the quantitative study of medical discourse. The new corpus, Medicor, contains 397,311 words of contemporary American medical texts. It is being compiled at the University of Helsinki by Minna Vihla. Medicor represents different types of medical writing, both professional and popular: samples from medical textbooks and professional handbooks, research and editorial articles published in professional medical journals, samples from a popular medical guidebook, and newspaper and magazine articles intended for the general public. In what follows, section 2 briefly presents the background of the corpus and its place in the corpus linguistic setting, whereas later sections describe the corpus itself.