BRret: Retrieval of Brain Research Related Literature.

Research Domain Criteria (RDoC), a recently introduced framework for mental illness, uses units of analysis such as genetics and neural circuits for accurate multi-dimensional classification of mental illnesses. Because of the large volume of relevant biomedical research, automating the extraction of evidence from the literature to assist with curation of the RDoC matrix is essential for processing the full breadth of data accurately and cost-effectively. In this work, we formulate the task of retrieving brain research literature from general PubMed abstracts. We develop BRret (Brain Research retriever), a novel algorithm for retrieving brain research related articles. We demonstrate the effectiveness of BRret on a large dataset of PubMed abstracts annotated with RDoC concepts. To the best of our knowledge, this is the first study aimed at automated retrieval of brain research related literature.
