Mining arguments from cancer documents using Natural Language Processing and ontologies

In the medical domain, the continuous stream of scientific research contains contradictory results supported by arguments and counter-arguments. Because medical expertise exists at different levels, some human agents have difficulty coping with the sheer number of studies, and also with understanding the reasons and pieces of evidence put forward by the proponents and opponents of a debated topic. To better understand the arguments supporting new findings relative to the current state of the art in the medical domain, we need tools able to identify arguments in scientific papers. Our work aims to fill this technological gap. We rely on the well-known interleaving of domain knowledge with natural language processing. To formalise the existing medical knowledge, we rely on ontologies. To structure the argumentation model, we also use the expressivity and reasoning capabilities of Description Logics. To perform argumentation mining, we formalise various linguistic patterns in a rule-based language. We tested our solution against a corpus of scientific papers related to breast cancer. The experiments show an F-measure between 0.71 and 0.86 for identifying the conclusions of an argument and between 0.65 and 0.86 for identifying its premises.
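
As a rough illustration of what rule-based identification of premises and conclusions can look like, the following Python sketch labels sentences using discourse cue phrases and computes an F-measure from raw counts. It is not the system described above: the cue phrases, the function names `label_sentence` and `f_measure`, and the choice of the balanced F-measure (F1) are all assumptions made for illustration, and the sketch uses plain regular expressions rather than an ontology or a dedicated rule language.

```python
import re

# Hypothetical cue phrases chosen for illustration only;
# the source does not list its actual linguistic patterns.
CONCLUSION_CUES = re.compile(
    r"\b(therefore|thus|we conclude that|these results suggest)\b",
    re.IGNORECASE,
)
PREMISE_CUES = re.compile(
    r"\b(because|since|given that|as shown by)\b",
    re.IGNORECASE,
)


def label_sentence(sentence: str) -> str:
    """Return 'conclusion', 'premise', or 'none' for one sentence.

    Conclusion cues are checked first, so a sentence containing both
    kinds of cue is labelled as a conclusion.
    """
    if CONCLUSION_CUES.search(sentence):
        return "conclusion"
    if PREMISE_CUES.search(sentence):
        return "premise"
    return "none"


def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumes the balanced (F1) variant; the source does not say which
    weighting it reports.
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

Surface cues of this kind are ambiguous (for example, "since" can be temporal), which is one reason an approach like the one described combines such patterns with domain knowledge.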
