BNOSA: A Bayesian network and ontology based semantic annotation framework

Abstract This paper presents a semantic annotation framework capable of extracting relevant information from unstructured, ungrammatical and incoherent data sources. The framework, named BNOSA, uses an ontology to conceptualize a problem domain and to extract data from the given corpora, and Bayesian networks to resolve conflicts and to predict missing data. The framework is extensible: it can dynamically extract data from any problem domain, given a pre-defined ontology and a corresponding Bayesian network. Experiments were conducted to analyze the performance of BNOSA on several problem domains. The corpora used in the experiments come from selling–purchasing websites where product information is entered by ordinary web users in a structure-free format. The results show that BNOSA performs reasonably well at locating the data of interest using the context keywords provided as part of the domain ontology. When more than one value is extracted for an attribute, or when the value is missing, Bayesian networks identify the most appropriate value for that attribute.
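The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the BNOSA implementation: the ontology entries, the context keywords, the price threshold and the conditional probability table are all hypothetical values chosen for illustration, and the "network" is reduced to a single assumed dependency of price band on brand.

```python
import re

# Hypothetical mini-ontology: each attribute lists context keywords
# and a pattern describing what a value looks like (assumed, not from the paper).
ONTOLOGY = {
    "brand": {"keywords": ["brand", "make"], "pattern": r"[A-Za-z]+"},
    "price": {"keywords": ["price", "asking"], "pattern": r"\d+"},
}

# Hypothetical conditional probability table P(price_band | brand).
CPT = {
    "nokia": {"low": 0.7, "high": 0.3},
    "apple": {"low": 0.1, "high": 0.9},
}


def band(price):
    """Discretize a price into a band; the threshold is an arbitrary assumption."""
    return "low" if price < 20000 else "high"


def extract(text, attribute):
    """Return every value found directly after a context keyword of the attribute."""
    spec = ONTOLOGY[attribute]
    found = []
    for keyword in spec["keywords"]:
        # Keyword, up to three separator characters, then the value pattern.
        regex = re.compile(
            r"\b" + re.escape(keyword) + r"\b\W{0,3}(" + spec["pattern"] + r")",
            re.IGNORECASE,
        )
        found.extend(match.group(1) for match in regex.finditer(text))
    return found


def resolve_price(candidates, brand):
    """Pick the candidate price whose band is most probable given the brand."""
    dist = CPT[brand.lower()]
    return max(candidates, key=lambda p: dist[band(int(p))])


def predict_band(brand):
    """With no extracted price, fall back to the most probable band."""
    dist = CPT[brand.lower()]
    return max(dist, key=dist.get)


ad = "Make: Nokia, price 5000 only. asking 45000 for urgent sale"
brand = extract(ad, "brand")[0]
prices = extract(ad, "price")
best_price = resolve_price(prices, brand)
```

In this sketch the advertisement yields two conflicting price candidates, and the table is used to keep the one that is more plausible for the extracted brand; `predict_band` covers the case where no price is found at all. A full Bayesian network would condition on several attributes at once rather than on a single parent.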
