Computer Algorithm for Automated Work Group Classification From Free Text: The DREAM Technique

Objective: This study developed and tested a computer method to automatically assign subjects to aggregate work groups based on their free-text work descriptions. Methods: The Double Root Extended Automated Matcher (DREAM) algorithm classifies individuals using pairs of word roots from their free text that also occur in standard classification systems, together with several explicitly defined linkages between term roots and aggregates. Results: DREAM effectively analyzed free text from 5887 participants in a multisite chronic obstructive pulmonary disease prevention study (Lung Health Study). For a test set of 533 cases, DREAM's classifications compared favorably with those of a panel of four human raters. The human raters judged the accuracy of DREAM as good or better in 80% of the test cases. Conclusions: Automated text interpretation is a promising tool for analyzing large data sets for applications in data mining, research, and surveillance.

Work-descriptive information is most useful when it can link an individual to aggregate entities that have occupational health relevance. Determining the appropriate group requires considerable expertise. This article describes a new method for making such assignments using a computer algorithm, reducing dependence on the limited number of occupational health experts. In addition, computer algorithms foster consistency of assignments.
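The root-pair idea summarized in the Methods can be illustrated with a minimal sketch. It is not the published DREAM implementation: the suffix list, the reference titles, the aggregate group names, and the helpers `root`, `roots_of`, `root_pairs`, and `classify` are all hypothetical and chosen only for illustration. The sketch reduces words to crude roots, forms unordered pairs of roots, scores each aggregate by the number of pairs shared with reference titles, and falls back to explicit root-to-aggregate linkages when no pair matches.

```python
from itertools import combinations

# Hypothetical suffix list; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ers", "er", "ion", "es", "s")

def root(word):
    """Crude suffix stripping that keeps at least three characters."""
    w = word.lower()
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w

def roots_of(text):
    """Set of word roots found in a free-text description."""
    return {root(tok) for tok in text.replace(",", " ").split()}

def root_pairs(text):
    """All unordered pairs of distinct roots, stored as sorted tuples."""
    return set(combinations(sorted(roots_of(text)), 2))

# Hypothetical reference titles mapped to aggregate work groups.
REFERENCE = {
    "truck driver": "transportation",
    "construction laborer": "construction",
    "school teacher": "education",
}

# Hypothetical explicit linkages from a single root to an aggregate.
LINKS = {"weld": "manufacturing"}

def classify(text):
    """Return the aggregate sharing the most root pairs, else a linked one."""
    pairs = root_pairs(text)
    scores = {}
    for title, group in REFERENCE.items():
        shared = len(pairs & root_pairs(title))
        if shared:
            scores[group] = scores.get(group, 0) + shared
    if scores:
        return max(scores, key=scores.get)
    for r in roots_of(text):
        if r in LINKS:
            return LINKS[r]
    return None  # no match; such cases would need human review

print(classify("drives a truck"))
print(classify("welder"))
```

Matching on pairs of roots rather than single roots is what lets "drives a truck" align with "truck driver" despite different word forms and word order, while the explicit linkages cover one-word descriptions that cannot form a pair.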
