MrCoM: A Cost Model for Range Query Translation in Deep Web Data Integration

Due to the autonomy of Web databases, a major challenge for query translation in a deep Web data integration system is the lack of cost models at the global level. In this paper, we propose MrCoM, a multiple-regression cost model built through statistical analysis, for global range queries that involve numeric range attributes. Using the MrCoM, the query translation strategy for a new global range query can be inferred. We also propose a pre-processing-based stepwise algorithm (PSA) that selects the significant independent variables to include in the MrCoM. Experimental results show that the MrCoM fits well and that query strategy selection is highly accurate.
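The abstract does not list the model's independent variables or the candidate translation strategies. The following is therefore only a minimal sketch, under explicit assumptions, of how a multiple-regression cost model can drive strategy selection. It fits one ordinary-least-squares model per strategy, scores fit with R², and picks the strategy with the lowest predicted cost for a new query. The two strategies (`single_query`, `split_query`), the two features (range width as a fraction of the attribute domain, and estimated result count), the helper names (`fit_ols`, `predict`, `r_squared`, `choose_strategy`), and the noise-free training data are all hypothetical illustrations, not taken from the paper. The PSA variable-selection step is not reproduced.

```python
from typing import Dict, List, Sequence


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented system [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("singular design matrix")
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Predicted cost b0 + sum(bi * xi)."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def r_squared(beta: Sequence[float], X: List[List[float]], y: List[float]) -> float:
    """Coefficient of determination, used here as a simple goodness-of-fit measure."""
    mean = sum(y) / len(y)
    ss_res = sum((t - predict(beta, x)) ** 2 for x, t in zip(X, y))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def choose_strategy(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Return the (hypothetical) strategy whose model predicts the lowest cost."""
    return min(models, key=lambda s: predict(models[s], x))


# Synthetic, noise-free training data (illustrative only, not from the paper).
# Hypothetical features: [range width as a fraction of the domain, estimated result count].
X_train = [[0.1, 5], [0.2, 20], [0.4, 30], [0.5, 60], [0.8, 70], [0.9, 100]]
TRUE_BETA = {
    "single_query": [1.0, 8.0, 0.05],  # hypothetical strategy A
    "split_query": [3.0, 2.0, 0.02],   # hypothetical strategy B
}
costs = {s: [predict(b, x) for x in X_train] for s, b in TRUE_BETA.items()}
models = {s: fit_ols(X_train, y) for s, y in costs.items()}

if __name__ == "__main__":
    for s, beta in models.items():
        print(s, [round(b, 3) for b in beta], round(r_squared(beta, X_train, costs[s]), 4))
    print(choose_strategy(models, [0.1, 10]))  # narrow query -> single_query
    print(choose_strategy(models, [0.9, 80]))  # wide query -> split_query
```

Fitting one model per strategy keeps each regression small. With real, noisy observations, R² would fall below 1, and a variable-selection step such as the paper's PSA would decide which features enter each model.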
