CFinder: An intelligent key concept finder from text for ontology development

Key concept extraction is a major step in ontology learning, which aims to build an ontology by identifying relevant domain concepts and their semantic relationships in a text corpus. The success of ontology development using key concept extraction relies strongly on how relevant the identified key concepts are to the domain. If the identified key concepts are not closely relevant to the domain, the constructed ontology will not be able to represent the domain knowledge correctly and fully. In this paper, we propose a novel method for key concept extraction named CFinder. Given a text corpus in the target domain, CFinder first extracts noun phrases as candidate key concepts, using linguistic patterns based on Part-Of-Speech (POS) tags. To calculate the weight (or importance) of each candidate within the domain, CFinder combines statistical knowledge about the candidate with domain-specific knowledge indicating its relative importance within the domain. The calculated weights are further enhanced by considering the inner structural pattern of the candidates. The effectiveness of CFinder is evaluated using a recently developed ontology for the domain of 'emergency management for mass gatherings', and CFinder is compared against state-of-the-art key concept extraction methods, including Text2Onto, KP-Miner and Moki. The comparative evaluation results show that CFinder statistically significantly outperforms all three methods in terms of F-measure and average precision.
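The abstract describes the pipeline only at a high level, so the following Python sketch is purely illustrative and is not CFinder's actual algorithm. It assumes input sentences have already been tagged with Penn Treebank POS tags by some external tagger, uses the common pattern of adjective/noun sequences ending in a noun to extract candidates, and substitutes a hypothetical scoring scheme for the paper's weighting formulas. That scheme uses log frequency as the statistical signal, overlap with a user-supplied set of domain terms as the domain signal, and a bonus for phrases nested inside longer candidates as the structural signal. The names `extract_candidates`, `weight_candidates`, `domain_terms`, `alpha` and `beta` are hypothetical.

```python
from collections import Counter
from math import log

# Penn Treebank tags, assumed to come from an external POS tagger.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}


def extract_candidates(tagged_sentence):
    """Return noun phrases matching the pattern (JJ|NN)* NN.

    Maximal runs of adjective/noun tokens are collected, and trailing
    adjectives are trimmed so that every phrase ends with a noun.
    """
    phrases, run = [], []
    # A sentinel token with a non-matching tag flushes the final run.
    for word, tag in list(tagged_sentence) + [("", "<END>")]:
        if tag in NOUN_TAGS or tag in ADJ_TAGS:
            run.append((word.lower(), tag))
            continue
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if run:
            phrases.append(" ".join(w for w, _ in run))
        run = []
    return phrases


def weight_candidates(tagged_sentences, domain_terms, alpha=0.5, beta=0.5):
    """Rank candidates by a hypothetical combined weight (not CFinder's formula).

    statistical: log(1 + corpus frequency of the phrase)
    domain:      fraction of the phrase's words that appear in domain_terms
    structural:  multiplier that grows with the number of longer candidates
                 containing this phrase as an inner part
    """
    freq = Counter(c for s in tagged_sentences for c in extract_candidates(s))
    scores = {}
    for cand, f in freq.items():
        words = cand.split()
        statistical = log(1 + f)
        domain = sum(w in domain_terms for w in words) / len(words)
        # Count other candidates that contain this phrase on word boundaries.
        nested = sum(1 for other in freq
                     if other != cand and f" {cand} " in f" {other} ")
        scores[cand] = (alpha * statistical + beta * domain) * (1 + log(1 + nested))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        [("Emergency", "NN"), ("management", "NN"), ("requires", "VBZ"),
         ("rapid", "JJ"), ("triage", "NN"), (".", ".")],
        [("Mass", "JJ"), ("gatherings", "NNS"), ("need", "VBP"),
         ("emergency", "NN"), ("management", "NN"), ("plans", "NNS")],
    ]
    for phrase, score in weight_candidates(corpus, {"emergency", "triage"}):
        print(f"{score:.3f}  {phrase}")
```

With this toy corpus, "emergency management" ranks first because it contains a domain term and also occurs inside the longer candidate "emergency management plans". Multiplying by a nesting factor is one simple way to express the idea that an inner phrase shared by several candidates is likely to be a core concept. The real method may combine these signals quite differently.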