Modeling public health interventions for improved access to the gray literature.

OBJECTIVE: Much of the useful information in public health (PH) is considered gray literature, that is, literature that is not available through traditional commercial pathways. The diversity and nontraditional format of this information make it difficult to locate. The aim of this Robert Wood Johnson Foundation-funded project is to improve access to PH gray literature reports through established natural language processing (NLP) techniques. This paper summarizes the development of a model for representing gray literature documents concerning PH interventions.

METHODS: The authors established a model-based approach for automatically analyzing and representing the PH gray literature by evaluating a corpus of PH gray literature drawn from seven PH websites. Input from fifteen PH professionals assisted in the development of the model and in the prioritization of elements for NLP extraction.

RESULTS: Of 365 documents collected, 320 were analyzed to develop a model of the key text elements of gray literature documents relating to PH interventions. Survey input from a group of potential users directed the selection of key elements to include in the document summaries.

CONCLUSIONS: A model of key elements relating to PH interventions in the gray literature can be developed from the ground up through document analysis and input from members of the PH workforce. The model provides a framework for developing a method to identify key elements in documents and store them as metadata. These metadata serve as document surrogates that can be used for indexing, abstracting, and determining the shape of the PH gray literature.