Extracting Occupational Therapy Concepts to Develop Domain Ontology

The growth of unstructured data on the World Wide Web has generated significant interest in extracting information from text, emails, web pages, reports and research papers in their raw form. More importantly, extracting information for a specific domain from distributed corpora on the World Wide Web is a vital step towards corpus annotation. This paper describes a method of annotation, based on Occupational Therapy (OT) concepts, for building a domain ontology using Natural Language Processing (NLP) technology. We used Java Annotation Patterns Engine (JAPE) grammar, which supports regular expression matching, to annotate OT concepts in the GATE Developer tool. This speeds up the otherwise time-consuming development of the ontology, which matters for domain experts facing time constraints and high workloads. The rules provide significant results: pattern matching of OT concepts based on the lookup list produced 403 correct concepts, and accuracy was generally high. Using NLP techniques is a good approach to reducing the domain expert's work, and the results can be evaluated.

Keywords: Ontology; Information extraction; Regular expression; Natural Language Processing.
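The lookup-list matching described above can be illustrated with a minimal sketch. This is not the authors' JAPE grammar and does not use GATE: it is a plain Python analogue, and the term list, the `OTConcept` annotation type, and the function names are all hypothetical, chosen only to show the idea of annotating text spans that match entries in a lookup list.

```python
import re

# Hypothetical lookup list; the paper's actual OT gazetteer is not reproduced here.
OT_LOOKUP = [
    "occupational therapy",
    "activities of daily living",
    "fine motor skills",
    "sensory integration",
]


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any lookup term."""
    # Longest terms first, so a multi-word concept wins over a shorter overlap.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def annotate(text, terms):
    """Return one annotation (offsets, matched text, type) per lookup match."""
    pattern = build_pattern(terms)
    return [
        {
            "start": match.start(),
            "end": match.end(),
            "text": match.group(0),
            "type": "OTConcept",  # assumed annotation type name
        }
        for match in pattern.finditer(text)
    ]


sample = "Occupational therapy often targets fine motor skills."
annotations = annotate(sample, OT_LOOKUP)
for ann in annotations:
    print(ann)
```

Each annotation records character offsets rather than only the matched string, mirroring how annotation tools typically attach concepts to spans of the source text so that a domain expert can review them in context.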
