Automatic Construction of Lightweight Domain Ontologies for Chemical Engineering Risk Management

Curtin University of Technology, Department of Chemical Engineering, GPO Box U1987, Perth, WA 6001, Australia; tel. +61-8-9266 7581, fax. +61-8-92662681, email: {N.Balliu,H.Wu,M.O.Tade}@curtin.edu.au

The need for domain ontologies in mission-critical applications such as risk management and hazard identification is becoming increasingly pressing. Most research on ontology learning conducted in academia remains impractical for real-world applications. One of the main problems is the dependence on non-incremental, rare knowledge and textual resources, and on manually-crafted patterns and rules. This paper reports work in progress aiming to address such undesirable dependencies during ontology construction. Initial experiments using a working prototype of the system revealed promising potential for automatically constructing high-quality domain ontologies using real-world texts.

1. Introduction

Hazard identification is a crucial aspect of risk management. The identification of hazards is the prerequisite step to the analysis and treatment of risks. As such, clear definitions of the types of risks and of the processes involved in hazard avoidance and treatment are necessary. Unambiguous definitions enable effective communication, which is crucial in passing on experience and expertise to trainees and students dealing with dangerous chemicals and products. However, very often such knowledge is embedded in the minds of domain experts, or scattered across various formats, e.g. operation notes, online resources, scientific publications or technical reports. An integrated knowledge structure known as an ontology is therefore becoming necessary for describing the concepts and processes, to ease information sharing and reuse. Some possible applications of domain ontologies include conceptual document retrieval and decision support systems. The importance of ontologies to knowledge-based applications has prompted an increase in efforts to construct and maintain such knowledge structures.
Generally, there are two ways of constructing ontologies, namely, manual crafting and automatic discovery.

Manual construction and maintenance of ontologies is often criticised for being labour-intensive, biased and static. Such a manual process typically requires multiple domain experts to identify the key concepts and processes, and then collaborate with knowledge engineers for effective digital representation. The neutrality and representativeness of manually-crafted ontologies are also disputable when the domain experts are unable to reach consensus during the knowledge engineering process. New changes to the domain are often ignored and cannot be incorporated
