Querying E-Catalogs Using Content Summaries

With the rapid development of e-services on the Web, an increasing number of e-catalogs are becoming accessible to users. Many of these e-catalogs provide information about similar types of products and services. To reduce users' information-searching effort, data integration systems have been developed that integrate e-catalogs providing similar types of information. Users can then query those e-catalogs through a mediator with a uniform query interface. The conventional approach to answering a query received by a mediator is to select e-catalogs purely on the basis of their query capabilities, i.e., their query interface specifications. However, an e-catalog that is capable of answering a query does not necessarily hold relevant answers to it. Querying e-catalogs that return no answers wastes resources. To avoid this, in this paper we propose using a catalog content summary as a filter. The relevant e-catalogs for a given query are selected based not only on their query capabilities but also on how relevant their content is to the query. A multi-attribute content (MAC) summary is proposed to describe the content of an e-catalog. With the MAC summary, an e-catalog is selected to answer a query only if it is likely to have answers to that query. The MAC summary can be constructed and updated using the answers returned by e-catalogs, so the e-catalogs need not be cooperative. We evaluated the MAC summary on 50 e-catalogs, and the experimental results were promising.
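The abstract does not define the internal structure of the MAC summary. The following Python sketch therefore only illustrates the general idea of combining a capability check with a content filter, and it rests on several assumptions. None of the names below come from the paper.

- A hypothetical `ContentSummary` records, per attribute, the categorical values and numeric ranges observed in answers a catalog has already returned.
- A hypothetical `select_catalogs` keeps a catalog only if two conditions hold:
  - its query interface accepts every queried attribute (capability);
  - its summary does not rule out a match (content relevance).
- Queries are assumed to be conjunctions of equality conditions on categorical attributes and closed ranges on numeric attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union

Value = Union[str, float]
# Assumed query form: attribute -> exact string value, or (low, high) numeric range.
Query = Dict[str, Union[str, Tuple[float, float]]]


@dataclass
class ContentSummary:
    """Hypothetical stand-in for a MAC summary (not the paper's definition).

    Categorical attributes keep the set of values seen in returned answers;
    numeric attributes keep the [min, max] range seen so far.
    """
    categorical: Dict[str, Set[str]] = field(default_factory=dict)
    numeric: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def update(self, answer: Dict[str, Value]) -> None:
        # Fold one returned record into the summary; built only from answers,
        # so the catalog itself does not need to cooperate.
        for attr, val in answer.items():
            if isinstance(val, (int, float)):
                lo, hi = self.numeric.get(attr, (val, val))
                self.numeric[attr] = (min(lo, val), max(hi, val))
            else:
                self.categorical.setdefault(attr, set()).add(val)

    def is_empty(self) -> bool:
        return not self.categorical and not self.numeric

    def may_answer(self, query: Query) -> bool:
        # With no observed answers there is no evidence, so do not filter.
        if self.is_empty():
            return True
        for attr, cond in query.items():
            if isinstance(cond, tuple):
                seen = self.numeric.get(attr)
                if seen is None:
                    return False  # attribute never observed in answers
                lo, hi = cond
                if hi < seen[0] or lo > seen[1]:
                    return False  # query range misses the observed range
            elif cond not in self.categorical.get(attr, set()):
                return False  # value never observed for this attribute
        return True


@dataclass
class Catalog:
    name: str
    capabilities: Set[str]  # attributes the query interface accepts
    summary: ContentSummary = field(default_factory=ContentSummary)


def select_catalogs(catalogs: List[Catalog], query: Query) -> List[str]:
    # Capability check first, then the content-summary filter.
    return [c.name for c in catalogs
            if set(query) <= c.capabilities and c.summary.may_answer(query)]


books = Catalog("books", {"category", "price"})
for rec in [{"category": "novel", "price": 12.0},
            {"category": "poetry", "price": 30.0}]:
    books.summary.update(rec)
music = Catalog("music", {"category", "price"})
music.summary.update({"category": "jazz", "price": 15.0})
fresh = Catalog("fresh", {"category"})  # no answers observed yet

query = {"category": "novel", "price": (10.0, 20.0)}
print(select_catalogs([books, music, fresh], query))  # ['books']
```

The example query selects only `books`:

- `music` has never returned a novel.
- `fresh` cannot accept a price condition.

This sketch treats attributes independently. It can therefore still select a catalog whose observed records never combine the requested values; a summary that records attribute combinations would be stricter. Because the summary is built from sampled answers, it can also miss values the catalog does hold. In that case the filter wrongly drops a relevant catalog.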
