The Common Data Elements for Cancer Research: Remarks on Functions and Structure

OBJECTIVES The National Cancer Institute (NCI) has developed the Common Data Elements (CDE) to serve as a controlled vocabulary of data descriptors for cancer research, and to facilitate data interchange and interoperability between cancer research centers. We evaluated the CDE's structure to see whether it could represent the elements necessary to support its intended purpose, and whether it could prevent errors and inconsistencies from being accidentally introduced. We also performed automated checks for certain types of content errors, which provided a rough measure of curation quality.

METHODS Evaluation was performed on CDE content downloaded via the NCI's CDE Browser and transformed into relational database form. The evaluation covered three categories: 1) compatibility with the ISO/IEC 11179 metadata model, on which the CDE structure is based; 2) features necessary for controlled vocabulary support; and 3) support for a stated NCI goal, the setup of data collection forms for cancer research.

RESULTS Various limitations were identified with respect to both content (inconsistency, insufficient definition of elements, redundancy) and structure, particularly the need for term and relationship support and the need for metadata supporting the explicit representation of electronic forms that use sets of common data elements.

CONCLUSIONS While there are numerous positive aspects to the CDE effort, there is considerable opportunity for improvement. Our recommendations include review of existing content by diverse experts in the cancer community; integration with the NCI Thesaurus, to take advantage of the latter's links to nationally used controlled vocabularies; and various schema enhancements required for electronic form support.
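The abstract does not describe how the automated content checks were implemented. As a rough illustration only, the sketch below shows how two of the reported problem types (insufficient definition and redundancy) might be detected once the content is in relational form. The table `data_element`, its columns, and the sample rows are all hypothetical and are not taken from the CDE schema or from the study.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical data_element table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_element ("
        "id INTEGER PRIMARY KEY, name TEXT, definition TEXT)"
    )
    # Invented sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO data_element VALUES (?, ?, ?)",
        [
            (1, "Tumor Stage", "Extent of disease at diagnosis."),
            (2, "tumor stage ", "Stage of the tumor."),
            (3, "Patient Age", ""),
            (4, "Performance Status", None),
        ],
    )
    return conn


def find_content_issues(conn):
    """Return (ids lacking a definition, groups of ids sharing a name)."""
    cur = conn.cursor()
    # Elements whose definition is NULL or blank.
    missing = [
        row[0]
        for row in cur.execute(
            "SELECT id FROM data_element "
            "WHERE definition IS NULL OR TRIM(definition) = '' "
            "ORDER BY id"
        )
    ]
    # Elements whose names match after trimming and lowercasing.
    redundant = {}
    for name, ids in cur.execute(
        "SELECT LOWER(TRIM(name)), GROUP_CONCAT(id) FROM data_element "
        "GROUP BY LOWER(TRIM(name)) HAVING COUNT(*) > 1"
    ):
        redundant[name] = sorted(int(i) for i in ids.split(","))
    return missing, redundant


if __name__ == "__main__":
    missing, redundant = find_content_issues(build_demo_db())
    print("Missing definitions:", missing)
    print("Possibly redundant names:", redundant)
```

Checks of this kind only flag candidates. Deciding whether two similarly named elements are truly redundant would still need review by a domain expert, in line with the recommendation above.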
