Assessing the Representation of Occupation Information in Free-Text Clinical Documents Across Multiple Sources

There is increasing recognition of the key role that social determinants such as occupation play in health. Because occupation information in electronic health records (EHRs) remains relatively poorly understood, we sought to characterize occupation information within free-text clinical document sources. From six distinct clinical sources, a total of 868 occupation-related sentences were identified for the study corpus. Building on approaches from previous studies, we created refined annotation guidelines using the National Institute for Occupational Safety and Health Occupational Data for Health data model, with elements added to increase granularity. Our corpus generated 2,005 total annotations representing 39 of the 41 entity types in the enhanced data model. The highest-frequency entities were Occupation Description (17.7%), Employment Status - Not Specified (12.5%), Employer Name (11.0%), Subject (9.8%), and Industry Description (6.2%). Our findings support the value of standardizing entry of EHR occupation information to improve data quality, both for patient care and for secondary uses of this information.