Towards new information resources for public health - From WordNet to MedicalWordNet

Over the last two decades, WordNet has evolved into the most comprehensive computational lexicon of general English. In this article, we discuss its potential for supporting the creation of an entirely new kind of information resource for public health, viz. MedicalWordNet. This resource is not to be conceived merely as a lexical extension of the original WordNet to medical terminology; indeed, there is already a considerable degree of overlap between WordNet and the vocabulary of medicine. Instead, we propose a new type of repository, consisting of three large collections of (1) medically relevant word forms, structured along the lines of the existing Princeton WordNet; (2) medically validated propositions, referred to here as medical facts, which will constitute what we shall call MedicalFactNet; and (3) propositions reflecting laypersons' medical beliefs, which will constitute what we shall call MedicalBeliefNet. We introduce a methodology for setting up MedicalWordNet and then discuss the research challenges that have to be met to build this new type of information resource.
We build a database of sentences relevant to the medical domain. The sentences are generated from WordNet via its relations as well as from medical statements broken down into elementary propositions. Two subcorpora of sentences are distinguished, MedicalBeliefNet and MedicalFactNet: the former is rated for assent by laypersons, the latter for correctness by medical experts. The sentence corpora will be valuable for a variety of applications in information retrieval, as well as for research in linguistics and psychology on expert and non-expert beliefs and their linguistic expressions. Our work has to meet several considerable challenges. These include accounting for the distinction between medical experts and laypersons, the social issues of expert-layperson communication in different media, the linguistic aspects of encoding medical knowledge, and the reliability, volume, and emergence of medical knowledge. The work described here has been tested in a small pilot experiment and awaits large-scale implementation.
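
To make the sentence-generation idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes that WordNet-style relations are available as (term, relation, term) triples and that each relation maps to a fixed sentence template. The templates, the `Rating` record, the `assent_threshold` parameter, and the rule used to assign rated sentences to the two subcorpora are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical templates: one elementary sentence pattern per relation type.
TEMPLATES = {
    "hypernym": "{a} is a kind of {b}.",
    "meronym": "{a} is a part of {b}.",
}


def generate_sentences(triples):
    """Turn (term, relation, term) triples into elementary sentences.

    Triples whose relation has no template are skipped.
    """
    sentences = []
    for a, relation, b in triples:
        template = TEMPLATES.get(relation)
        if template is None:
            continue
        text = template.format(a=a, b=b)
        sentences.append(text[0].upper() + text[1:])
    return sentences


@dataclass
class Rating:
    """Hypothetical rating record for one generated sentence."""
    sentence: str
    expert_correct: bool  # experts judged the sentence correct
    lay_assent: float     # fraction of laypersons who assented (0.0 to 1.0)


def split_corpora(ratings, assent_threshold=0.5):
    """Assign rated sentences to the two subcorpora.

    Assumed rule: a sentence enters the fact corpus if experts judged it
    correct, and the belief corpus if lay assent reaches the threshold.
    A sentence may appear in both, in one, or in neither.
    """
    fact_net = [r.sentence for r in ratings if r.expert_correct]
    belief_net = [r.sentence for r in ratings
                  if r.lay_assent >= assent_threshold]
    return fact_net, belief_net
```

Keeping the two ratings independent, rather than forcing each sentence into exactly one corpus, makes it possible to compare expert and lay judgments on the same sentence, which is the kind of comparison the abstract has in mind for research on expert and non-expert beliefs.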
