Multi-objective optimization integration of query interfaces for the Deep Web based on attribute constraints

Abstract: To efficiently query and retrieve the rich and useful information hidden in the Deep Web, extensive research on domain-specific Deep Web Data Integration Systems (DWDIS) has been carried out in recent years. In DWDIS, large-scale automatic integration of the query interfaces of domain-specific Web Databases (WDBs) remains a serious challenge because of the scale of the problem and the great diversity of the WDBs' query interfaces. To address this challenge, we first define the constraint matrix, which accurately describes three types of constraints (hierarchical constraints, group constraints and precedence constraints) together with the strengths of the attributes of a query interface. We then prove that the schema tree of a query interface corresponds to exactly one constraint matrix, and vice versa. Next, we transform the problem of integrating domain-specific query interfaces into the problem of integrating their constraint matrices, and formulate it as a multi-objective optimization model. To solve this model effectively, we design strategies for extending and merging constraint matrices, and we propose a method for automatically detecting and filtering abnormal data (noise) in the query interfaces. Building on these components, we develop a novel and efficient algorithm suited to large-scale automatic integration of domain-specific query interfaces. Finally, we evaluate the proposed algorithm experimentally on a real query interface data set. Our theoretical analysis and experimental results show that the proposed algorithm outperforms existing state-of-the-art algorithms for integrating domain-specific query interfaces.
