‘HypothesisFinder:’ A Strategy for the Detection of Speculative Statements in Scientific Text

Speculative statements that communicate experimental findings are common in scientific articles, where they provide an impetus for further investigation of a topic. Automated recognition of such statements has gained interest in recent years, because systematic analysis of them could turn speculative thoughts into testable hypotheses. We describe a pattern-matching approach for detecting speculative statements in scientific text that uses a dictionary of speculative patterns to classify sentences as hypothetical. To demonstrate its practical utility, we applied the approach to the domain of Alzheimer's disease and showed that it captures a wide spectrum of scientific speculations about the disease. Subsequent exploration of the derived hypothetical knowledge yields a coherent overview of emerging knowledge niches and can thus add value to ongoing research activities.
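
The dictionary-based classification described above can be illustrated with a minimal sketch. The pattern list below is hypothetical and chosen only for illustration; it is not the dictionary used in this work, and the function names `matched_patterns` and `is_speculative` are likewise assumptions. A sentence is flagged as hypothetical if at least one pattern matches.

```python
import re

# Hypothetical speculative patterns, for illustration only.
# These are NOT the patterns from the paper's dictionary.
SPECULATIVE_PATTERNS = [
    r"\bmay\b",
    r"\bmight\b",
    r"\bcould\b",
    r"\bpossibl[ey]\b",
    r"\bwe (?:hypothesi[sz]e|speculate|propose)\b",
    r"\bsuggest(?:s|ed|ing)? that\b",
    r"\b(?:is|are|remains?) (?:likely|unclear|unknown)\b",
]

# Compile once so repeated sentence checks stay cheap.
COMPILED = [re.compile(p, re.IGNORECASE) for p in SPECULATIVE_PATTERNS]


def matched_patterns(sentence):
    """Return the source strings of all patterns found in the sentence."""
    return [p.pattern for p in COMPILED if p.search(sentence)]


def is_speculative(sentence):
    """Classify a sentence as hypothetical if any pattern matches."""
    return bool(matched_patterns(sentence))


if __name__ == "__main__":
    examples = [
        "Amyloid-beta may contribute to synaptic loss.",
        "We measured tau levels in 40 patients.",
        "These results suggest that APOE4 alters clearance.",
    ]
    for s in examples:
        print(is_speculative(s), "|", s)
```

A plain keyword match such as this one will over-trigger on non-speculative uses of words like "may" or "could"; a real dictionary would presumably need more specific, context-aware patterns.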
