Annotation for Query Result Records based on Domain-Specific Ontology

The World Wide Web holds a large collection of data scattered across deep web databases and web pages in unstructured or semi-structured formats. Recently evolving, customer-friendly web applications need special data extraction mechanisms that draw the required data out of these deep web sources according to the end user's query and populate the output page dynamically, as quickly as possible. Many existing web data extraction methods are based on supervised learning (wrapper induction). In the past few years, researchers have focused on automatic web data extraction methods based on similarity measures. Among these automatic methods, the existing Combining Tag and Value Similarity method fails to identify which attribute each column of the query result table represents. This paper proposes a novel approach to data extraction and label assignment, called Annotation for Query Result Records based on domain-specific ontology. First, a domain ontology is constructed using information from the query interfaces and query result pages obtained from the web. Next, using this domain ontology, a meaningful label is automatically assigned to each column of the extracted query result records.
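The two steps above (build a domain ontology, then label each extracted column) can be illustrated with a minimal sketch. The abstract does not specify the ontology representation or the matching rule, so everything here is an assumption: the ontology is modeled as a set of hypothetical `OntologyAttribute` concepts, each holding known instance values (e.g. options harvested from query-interface fields or sample result values) and an optional value pattern. A column receives the attribute whose values or pattern match the largest fraction of its cells, provided that fraction reaches a hypothetical `threshold`. This is a simple value-matching heuristic, not the paper's actual algorithm.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


# Hypothetical ontology concept (e.g. "Author"); an assumption, not the paper's model.
@dataclass
class OntologyAttribute:
    name: str
    values: set[str] = field(default_factory=set)  # known instance values (lowercased)
    pattern: re.Pattern | None = None              # optional format of values


def build_ontology(interface_fields, result_samples, patterns):
    """Merge query-interface options, sample result values and value patterns."""
    ontology: dict[str, OntologyAttribute] = {}
    for label, options in interface_fields.items():
        ontology[label] = OntologyAttribute(label, {o.lower() for o in options})
    for label, samples in result_samples.items():
        attr = ontology.setdefault(label, OntologyAttribute(label))
        attr.values.update(s.lower() for s in samples)
    for label, regex in patterns.items():
        attr = ontology.setdefault(label, OntologyAttribute(label))
        attr.pattern = re.compile(regex)
    return ontology


def match_score(attr, column):
    """Fraction of cells matching a known value or the attribute's pattern."""
    hits = 0
    for cell in column:
        text = cell.strip()
        if text.lower() in attr.values or (attr.pattern and attr.pattern.fullmatch(text)):
            hits += 1
    return hits / len(column) if column else 0.0


def assign_labels(records, ontology, threshold=0.5):
    """Label each column of the extracted records with the best-matching attribute."""
    labels = []
    for column in zip(*records):  # transpose rows into columns
        best_name, best_score = None, 0.0
        for attr in ontology.values():
            score = match_score(attr, column)
            if score > best_score:
                best_name, best_score = attr.name, score
        labels.append(best_name if best_score >= threshold else "UNKNOWN")
    return labels


# Hypothetical book-search domain used only for illustration.
interface = {"Format": ["Hardcover", "Paperback", "eBook"]}
samples = {"Author": ["J. K. Rowling", "George Orwell"]}
patterns = {"Price": r"\$\d+(\.\d{2})?", "ISBN": r"\d{13}"}
onto = build_ontology(interface, samples, patterns)

records = [
    ("George Orwell", "Paperback", "$9.99", "9780451524935"),
    ("J. K. Rowling", "Hardcover", "$24.50", "9780590353427"),
]
print(assign_labels(records, onto))  # ['Author', 'Format', 'Price', 'ISBN']
```

Columns that match no ontology concept well enough are left as `UNKNOWN` rather than forced onto the closest attribute; a fuller system would likely combine several evidence sources (labels, values, formats) with learned weights instead of a single fixed threshold.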
