Automatic Extension of Medical Subject Headings (MeSH) Thesaurus to Emerging Research

The proliferation of information technology infrastructure in recent decades has allowed unprecedented ease of access to centrally aggregated scholarly literature and scientific knowledge. Aggregation at this scale requires a carefully engineered information retrieval infrastructure, including formalized ontologies. A number of domains benefit from hierarchical controlled vocabularies, which provide a rich set of descriptive terms for characterizing entities in a consistent manner. The benefits of creating and maintaining these ontologies are clear: search and retrieval become easier, and analyses of the contained entities become possible that otherwise would not be. However, automated methods that leverage natural language processing and other computational techniques may reduce the manual burden of ontology creation and maintenance. This work presents an automated ontology creation methodology, adapted and expanded from prior work [1], that can produce a topic hierarchy from natural language and may be used to assist in creating a novel ontology or expanding existing ones. The effectiveness of the proposed method is studied using two examples: immunology, an established biomedical domain and a prominent topic in MeSH, and graphene, a material from the 2D materials domain that has wide-ranging biomedical applications but only a sparse presence in MeSH.