Research on Extracting the Schema of Query Interfaces

The main way to obtain Deep Web data is to fill in the query interface that a page provides and then submit the resulting query request to the Deep Web server. An important step in accessing Deep Web resources is therefore to analyse the query requests of the Deep Web server effectively. However, query interfaces are designed under different schemas and use different languages, which makes high-precision extraction of the query interface schema difficult. To improve the accuracy of schema extraction and to interpret query interfaces at the semantic level, this paper proposes a new definition of the query interface schema and designs a schema extraction method based on the visual information of the query interface and on page information. The experiments use the TEL-8 data set from UIUC. The results show that the method reaches over 90% accuracy across the tested domains, and more than 95% in some of them, which indicates good feasibility and practicability.
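To make the notion of a query interface schema more concrete, the sketch below reads an HTML form and lists each input control together with its label. It is only an illustration and not the method proposed in the paper: it relies solely on the page markup (the `for` attribute of `<label>` elements) and ignores visual layout information entirely. The names `FormSchemaParser` and `extract_schema`, and the shape of the returned dictionaries, are assumptions made for this example.

```python
from html.parser import HTMLParser


class FormSchemaParser(HTMLParser):
    """Collects form controls and <label for="..."> text from one HTML page."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.fields = []        # one dict per form control, in page order
        self.labels = {}        # control id -> label text
        self._label_for = None  # id targeted by the <label> currently open
        self._label_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._label_for = attrs.get("for")
            self._label_text = []
        elif tag in self.FIELD_TAGS:
            self.fields.append({
                "name": attrs.get("name"),
                "id": attrs.get("id"),
                # <input> defaults to type "text"; other controls use the tag name
                "type": attrs.get("type", "text") if tag == "input" else tag,
                "options": [],
            })
        elif tag == "option" and self.fields and self.fields[-1]["type"] == "select":
            self.fields[-1]["options"].append(attrs.get("value"))

    def handle_data(self, data):
        # Only text inside a <label for="..."> is recorded.
        if self._label_for is not None:
            self._label_text.append(data)

    def handle_endtag(self, tag):
        if tag == "label" and self._label_for is not None:
            text = " ".join("".join(self._label_text).split())
            self.labels[self._label_for] = text
            self._label_for = None


def extract_schema(html):
    """Return a list of {label, name, id, type, options} dicts for the page."""
    parser = FormSchemaParser()
    parser.feed(html)
    parser.close()
    return [{"label": parser.labels.get(f["id"]), **f} for f in parser.fields]
```

Real query interfaces often place labels in plain text or table cells with no `for` attribute, in which case this sketch returns `None` as the label. That gap is one reason an extraction method might draw on visual information as well as page markup.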
