Developing an Arabic Infectious Disease Ontology to Include Non-Standard Terminology

Building ontologies is a crucial part of the semantic web endeavour. In recent years, research interest in supporting languages such as Arabic in NLP has grown rapidly, but there has been very little research on medical ontologies for Arabic. We present a new Arabic ontology in the infectious disease domain to support applications such as monitoring the spread of infectious diseases via social media. The ontology integrates the scientific vocabulary of infectious diseases with its informal equivalents. We build the ontology using ontology learning strategies combined with manual checking. We applied three statistical methods to extract terms from selected Arabic articles on infectious diseases: TF-IDF, C-value, and YAKE. We also conducted a study, consulting around 100 individuals, to discover the informal Arabic terms related to infectious diseases. The relations between infectious disease concepts are currently created manually; in future work, we will extract them automatically. We report two complementary experiments to evaluate the ontology: a quantitative evaluation of the term extraction results and a qualitative evaluation by a domain expert.
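
As an illustration of the first of the three term extraction methods, the following is a minimal sketch of TF-IDF ranking in plain Python. It is not the pipeline used in this work: the function name `tfidf_top_terms`, the whitespace tokenisation, the unsmoothed IDF formula, and the three toy sentences are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_top_terms(docs, top_k=3):
    """Return the top_k terms of each tokenised document, ranked by TF-IDF.

    Hypothetical helper for illustration only. Uses relative term frequency
    and an unsmoothed IDF of log(N / df), so a term that occurs in every
    document scores zero.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    ranked = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for stability.
        ranked.append(sorted(scores, key=lambda t: (-scores[t], t))[:top_k])
    return ranked


# Toy corpus (an assumption, not data from the study): three short Arabic
# sentences about measles, cholera, and malaria, split on whitespace.
docs = [
    "الحصبة مرض معد يسببه فيروس".split(),
    "الكوليرا مرض معد تسببه بكتيريا".split(),
    "الملاريا مرض ينقله البعوض".split(),
]
ranked = tfidf_top_terms(docs)
```

In this toy corpus the word for "disease" appears in every sentence, so its IDF is zero and it drops out of the ranking, while disease names that occur in a single sentence rise to the top. A realistic setup would add Arabic-specific normalisation and multi-word candidate terms, which C-value is designed to handle.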
