NCI Thesaurus: A semantic model integrating cancer-related clinical and molecular information

Over the last 8 years, the National Cancer Institute (NCI) has launched a major effort to integrate molecular and clinical cancer-related information within a unified biomedical informatics framework, with controlled terminology as its foundational layer. The NCI Thesaurus is the reference terminology underpinning these efforts. It is designed to meet the growing need for accurate, comprehensive, and shared terminology, covering topics including cancers, findings, drugs, therapies, anatomy, genes, pathways, cellular and subcellular processes, proteins, and experimental organisms. The NCI Thesaurus provides a partial model of how these concepts relate to one another, responding to actual user needs and implemented in a deductive logic framework that can help maintain the integrity and extend the informational power of what is provided. This paper presents the semantic model for cancer diseases and its uses in integrating clinical and molecular knowledge, more briefly examines the models and uses for drug, biochemical pathway, and mouse terminology, and discusses limits of the current approach and directions for future work.
