Using text mining to link journal articles to neuroanatomical databases

The electronic linking of neuroscience information, including data embedded in the primary literature, would permit powerful queries and analyses driven by structured databases. This task would be facilitated by automated procedures that can identify biological concepts in journals. Here we apply an approach for automatically mapping formal identifiers of neuroanatomical regions to text in a large body of abstracts from the Journal of Comparative Neurology (JCN). The analyses yield over 100,000 brain region mentions, which we map to 8,225 brain region concepts in multiple organisms. Based on the analysis of a manually annotated corpus, we estimate that mentions are mapped with 95% precision and 63% recall. Our results provide insights into the patterns of publication on brain regions and species of study in JCN, but they also point to important challenges in the standardization of neuroanatomical nomenclatures. We find that many terms in the formal terminologies never appear in a JCN abstract and, conversely, that many terms authors use are not reflected in the terminologies. To improve the terminologies, we deposited 136 unrecognized brain regions into the Neuroscience Lexicon (NeuroLex). The training data, terminologies, normalizations, evaluations, and annotated journal abstracts are freely available at http://www.chibi.ubc.ca/WhiteText/. J. Comp. Neurol. 520:1772–1783, 2012. © 2011 Wiley Periodicals, Inc.
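
As an illustration of how precision and recall figures like those above can be computed, the following is a minimal sketch that compares automatically mapped mentions against a manually annotated corpus. It is not the authors' evaluation code: the `Mention` tuple layout, the function name `precision_recall`, the exact-match criterion, and the toy abstract and concept identifiers are all assumptions made for this example.

```python
from typing import Set, Tuple

# Hypothetical mention record: (abstract id, start offset, end offset, concept id).
# Both the layout and the identifiers below are invented for illustration.
Mention = Tuple[str, int, int, str]


def precision_recall(predicted: Set[Mention], gold: Set[Mention]) -> Tuple[float, float]:
    """Score predicted mention-to-concept mappings against manual annotations.

    A prediction counts as correct only if the abstract, the text span, and
    the assigned concept all match a gold annotation exactly (an assumed,
    strict matching criterion).
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy gold standard: four manually annotated mentions.
gold = {
    ("abstract-1", 10, 21, "concept-A"),
    ("abstract-1", 40, 48, "concept-B"),
    ("abstract-2", 5, 17, "concept-A"),
    ("abstract-2", 60, 72, "concept-C"),
}

# Toy system output: three mapped mentions, one assigned the wrong concept.
predicted = {
    ("abstract-1", 10, 21, "concept-A"),
    ("abstract-1", 40, 48, "concept-B"),
    ("abstract-2", 5, 17, "concept-D"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three predictions are correct and two of the four gold mentions are recovered. A high-precision, lower-recall profile of this kind means that the mappings the system produces are mostly right, while a sizable share of annotated mentions is left unmapped.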
