Arrowsmith two-node search interface: A tutorial on finding meaningful links between two disparate sets of articles in MEDLINE

The Arrowsmith two-node search is a strategy designed to assist biomedical investigators in formulating and assessing scientific hypotheses. More generally, it allows users to identify biologically meaningful links between any two sets of articles, A and C, in PubMed, even when the two sets share no articles or authors and represent disparate topics or disciplines. The key idea is to relate the two sets of articles via the title words and phrases (B-terms) that they share. We have created a free, public, web-based version of the two-node search tool (http://arrowsmith.psych.uic.edu), have described its development and implementation, and have presented analyses of individual two-node searches. In this paper, we provide an updated tutorial for end-users that covers the use of the tool in a variety of potential scientific use-case scenarios. For example, one can assess a recent experimental, clinical or epidemiologic finding that connects two disparate fields of inquiry, identifying likely mechanisms to explain the finding and choosing promising follow-up lines of investigation. Alternatively, one can assess whether the existing scientific literature lends indirect support to a user-posed hypothesis that has not yet been investigated. One can also employ two-node searches to look for novel hypotheses. Arrowsmith provides a service that cannot feasibly be carried out via standard PubMed searches or other available text mining tools.
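The B-term idea (relating sets A and C through the title words they share) can be illustrated with a minimal Python sketch. It assumes each set is simply a list of article titles, splits titles into single lowercase words, and drops a small hypothetical stoplist. The function names `title_terms` and `shared_b_terms` and the example titles are invented for illustration. The actual tool also uses multi-word phrases, and this sketch does not attempt to reproduce how Arrowsmith itself extracts, filters or ranks B-terms.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical minimal stoplist (an assumption; not Arrowsmith's own list).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to",
             "with", "by", "is", "are", "as", "at", "from"}


def title_terms(title: str) -> Set[str]:
    """Lowercase a title and return its distinct single-word terms, minus stopwords."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())
    return {w for w in words if w not in STOPWORDS}


def count_terms(titles: Iterable[str]) -> Counter:
    """Count, for each term, how many titles contain it (once per title)."""
    counts: Counter = Counter()
    for t in titles:
        counts.update(title_terms(t))
    return counts


def shared_b_terms(titles_a: Iterable[str],
                   titles_c: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return candidate B-terms: terms found in at least one A title and one C title,
    mapped to (number of A titles, number of C titles) containing the term."""
    count_a = count_terms(titles_a)
    count_c = count_terms(titles_c)
    shared = count_a.keys() & count_c.keys()
    return {term: (count_a[term], count_c[term]) for term in sorted(shared)}


if __name__ == "__main__":
    # Invented example titles, for illustration only.
    titles_a = ["Estrogen receptor signaling in neurons",
                "Estrogen and memory in aging women"]
    titles_c = ["Amyloid deposition in Alzheimer disease",
                "Memory loss and neuronal signaling in Alzheimer disease"]
    print(shared_b_terms(titles_a, titles_c))
    # {'memory': (1, 1), 'signaling': (1, 1)}
```

Note that exact word matching misses related forms such as "neurons" and "neuronal"; handling variants like these is one reason a dedicated tool does more than a simple set intersection.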
