Discovery in an age of artificial intelligence

The value of scholarly literature today is assessed based on use, whether it is authors citing an article, readers downloading an article, editors seeking a higher impact factor, funders seeking a broad audience, or librarians calculating a cost per use. Much of that use begins with a search on Google Scholar or Google, and these search engines are often the single largest source of links to articles, accounting for far more links than most publisher platforms. Given that search tools are the primary means of discovery today, how is discovery changing as artificial intelligence evolves? In the print environment, journal publishers created the content, and other organizations developed the indexes that identified articles using terms from structured vocabularies. When Eugene Garfield created the Citation Indexes in the 1950s, he demonstrated that two articles are more likely to be related when they share many of the same references. This approach to discovering related works is especially useful in emerging fields, before a common language has developed. During the 1980s, Don Swanson, a professor of information science at the University of Chicago, recognized a hidden connection between two biomedical articles based on their hypotheses. Subsequent research validated his approach, which pioneered the field of “text-based informatics”, also known as “literature-based discovery”, and served as a precursor to text mining. While there is great interest in data and text mining today, discussions on this topic indicate that progress is incremental and that surprisingly few researchers are taking advantage of the opportunities enabled by publishers.
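
The idea that two articles are more likely to be related when they share many references can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a description of how any citation index is actually built: the article and reference identifiers are invented, and the names `REFERENCES`, `coupling_strength`, and `rank_related_pairs` are hypothetical. Relatedness is approximated here simply as the count of shared references.

```python
from itertools import combinations

# Hypothetical data: article ID -> set of IDs of the works it cites.
REFERENCES = {
    "article_a": {"r1", "r2", "r3", "r4"},
    "article_b": {"r2", "r3", "r4", "r5"},
    "article_c": {"r6", "r7"},
}


def coupling_strength(refs_a, refs_b):
    """Count the references that two articles have in common."""
    return len(set(refs_a) & set(refs_b))


def rank_related_pairs(references):
    """Return (id_a, id_b, shared_count) tuples, most shared references first."""
    pairs = []
    for id_a, id_b in combinations(sorted(references), 2):
        shared = coupling_strength(references[id_a], references[id_b])
        pairs.append((id_a, id_b, shared))
    # sorted() is stable, so tied pairs keep their alphabetical order.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


if __name__ == "__main__":
    for id_a, id_b, shared in rank_related_pairs(REFERENCES):
        print(f"{id_a} / {id_b}: {shared} shared reference(s)")
```

Because the comparison relies only on reference lists and not on the words in the articles, a pair such as the hypothetical article_a and article_b would surface as related even if their authors used entirely different terminology, which is why the approach suits fields that lack a common language.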