EVIDENCEMINER: Textual Evidence Discovery for Life Sciences

Traditional search engines for life sciences (e.g., PubMed) are designed for document retrieval and do not allow direct retrieval of specific statements. Some of these statements may serve as textual evidence that is key to tasks such as hypothesis generation and the validation of new findings. We present EVIDENCEMINER, a web-based system that lets users query with a natural language statement and automatically retrieves textual evidence from a background corpus for life sciences. EVIDENCEMINER is constructed in a fully automated way, without any human effort for training data annotation. It is supported by novel data-driven methods for distantly supervised named entity recognition and open information extraction. The entities and patterns are pre-computed and indexed offline to support fast online evidence retrieval. The annotation results are also highlighted in the original document for better visualization. EVIDENCEMINER also includes analytic functionalities such as summarization of the most frequent entities and relations. EVIDENCEMINER can help scientists uncover important research issues, leading to more effective research and more in-depth quantitative analysis. EVIDENCEMINER is available at https://evidenceminer.firebaseapp.com/.
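The offline/online split described above can be illustrated with a small Python sketch. It is not EVIDENCEMINER's implementation and rests on several assumptions: a tiny hypothetical entity vocabulary (`KNOWN_ENTITIES`) stands in for the distantly supervised NER model, open information extraction patterns are omitted entirely, and candidate sentences are ranked with Okapi BM25, a swapped-in ranking choice made only for illustration. `EvidenceIndex`, `tag_entities`, and the example sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical entity vocabulary standing in for distantly supervised NER output.
KNOWN_ENTITIES = {"remdesivir", "sars-cov-2", "ace2"}


def tokenize(text):
    """Lowercase word tokens, keeping hyphenated terms such as 'sars-cov-2' whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def tag_entities(tokens):
    """Dictionary lookup used here in place of a trained NER model (assumption)."""
    return [t for t in tokens if t in KNOWN_ENTITIES]


class EvidenceIndex:
    """Offline phase: tokenize, tag entities, and build postings once per corpus."""

    def __init__(self, sentences, k1=1.2, b=0.75):
        self.sentences = sentences
        self.docs = [tokenize(s) for s in sentences]
        self.entities = [tag_entities(d) for d in self.docs]
        self.tfs = [Counter(d) for d in self.docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.postings = defaultdict(set)  # term -> ids of sentences containing it
        for i, doc in enumerate(self.docs):
            for term in doc:
                self.postings[term].add(i)

    def idf(self, term):
        n, df = len(self.docs), len(self.postings.get(term, ()))
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def bm25(self, query_terms, i):
        """Okapi BM25 score of sentence i for the given query terms."""
        tf, dl = self.tfs[i], len(self.docs[i])
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            denom = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            score += self.idf(t) * tf[t] * (self.k1 + 1) / denom
        return score

    def search(self, statement, top_k=3):
        """Online phase: keep sentences sharing an entity with the query, rank by BM25."""
        query = tokenize(statement)
        candidates = set()
        for e in set(tag_entities(query)):
            candidates |= self.postings.get(e, set())
        ranked = sorted(candidates, key=lambda i: (-self.bm25(query, i), i))
        return ranked[:top_k]

    def top_entities(self, ids, n=3):
        """Summarize the most frequent entities across the retrieved sentences."""
        return Counter(e for i in ids for e in self.entities[i]).most_common(n)


corpus = [
    "Remdesivir inhibits SARS-CoV-2 replication in vitro.",
    "ACE2 is the entry receptor for SARS-CoV-2.",
    "Statins lower cholesterol levels.",
]
index = EvidenceIndex(corpus)
hits = index.search("Does remdesivir inhibit SARS-CoV-2?")
for i in hits:
    print(corpus[i])  # the remdesivir sentence ranks first
print(index.top_entities(hits))
```

Doing the tokenization, entity tagging, and postings construction in the constructor means a query only needs a few dictionary lookups plus scoring of the handful of candidate sentences that share an entity with it, which is the motivation for pre-computing annotations offline. The sentence about statins is never scored because it shares no entity with the query.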
