Domain Specific Ontology Extractor For Indian Languages

We present a k-partite graph learning algorithm for ontology extraction from unstructured text. The algorithm divides the initial set of terms into partitions based on the information content of the terms and then constructs the ontology by detecting subsumption relations between terms in different partitions. This approach not only reduces the amount of computation required for ontology construction but also provides an additional level of term filtering. Experiments are conducted for Hindi and English, and performance is evaluated by comparing the resulting ontology with a manually constructed ontology for the Health domain. We observe that our approach significantly improves precision. The proposed approach does not require sophisticated NLP tools such as NER systems and parsers and can be easily adopted for any language.
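The partition-then-link idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how information content is estimated, how partition boundaries are chosen, or how subsumption is detected. The sketch assumes information content is the negative log of a term's document frequency ratio, that partitions are equal-width bins over that value, and that a document co-occurrence test with a hypothetical `threshold` stands in for subsumption detection. All function names are illustrative.

```python
import math
from collections import defaultdict


def information_content(doc_terms):
    """Assumed estimate: IC(t) = -log(df(t) / N), with df = document frequency."""
    n_docs = len(doc_terms)
    df = defaultdict(int)
    for terms in doc_terms:
        for t in set(terms):
            df[t] += 1
    return {t: -math.log(c / n_docs) for t, c in df.items()}


def partition_by_ic(ic, k):
    """Split terms into k equal-width IC bins; partition 0 holds the most general terms."""
    lo, hi = min(ic.values()), max(ic.values())
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all IC values are equal
    parts = [[] for _ in range(k)]
    for t, v in ic.items():
        idx = min(int((v - lo) / width), k - 1)
        parts[idx].append(t)
    return parts


def subsumes(parent, child, postings, threshold=0.8):
    """Assumed test: parent occurs in at least `threshold` of the child's documents."""
    both = len(postings[parent] & postings[child])
    return both / len(postings[child]) >= threshold


def build_edges(doc_terms, k=3, threshold=0.8):
    """Return (parent, child) edges, testing only pairs from different partitions."""
    postings = defaultdict(set)
    for i, terms in enumerate(doc_terms):
        for t in terms:
            postings[t].add(i)
    parts = partition_by_ic(information_content(doc_terms), k)
    edges = []
    for i in range(k):
        for j in range(i + 1, k):  # general partition -> more specific partition
            for p in parts[i]:
                for c in parts[j]:
                    if subsumes(p, c, postings, threshold):
                        edges.append((p, c))
    return sorted(edges)


# Toy corpus (invented for illustration): each document is a list of candidate terms.
docs = [
    ["disease", "malaria", "fever"],
    ["disease", "malaria"],
    ["disease", "diabetes"],
    ["disease", "fever"],
]
edges = build_edges(docs, k=3)
```

Because candidate pairs are drawn only from different partitions, terms with similar information content (here "malaria" and "fever") are never compared with each other, which is where the reduction in computation comes from in this sketch.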
