Query relaxation using malleable schemas

Unlike classical databases and IR systems, real-world information systems increasingly have to manage and store information whose structure is vague and diverse, which current technology cannot yet handle adequately. Current object-relational database systems require clear and unified data schemas, whereas IR systems usually ignore structural information altogether. The recently introduced malleable schemas offer a way to deal with vagueness, ambiguity, and diversity by incorporating imprecise and overlapping definitions of data structures. In this paper, we propose a novel query relaxation scheme that exploits malleable schemas to query vaguely structured information effectively, so that users can find the best-matching information. Our scheme uses duplicates in differently described data sets to discover the correlations within a malleable schema, and then uses these correlations to relax users' queries appropriately. In addition, it ranks the results of the relaxed query by their respective probability of satisfying the intent of the original query. We have implemented the scheme and conducted extensive experiments with real-world data to confirm its performance and practicality.
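
To make the idea concrete, the following is a minimal sketch of how duplicates described under different attribute names could yield correlations that drive relaxation and ranking. It is an illustration under stated assumptions, not the scheme evaluated in the paper: the function names `attribute_correlations` and `relax`, the toy records, the 0.5 threshold, and the simple value-agreement estimate are all hypothetical.

```python
from collections import defaultdict

def attribute_correlations(duplicate_pairs):
    """Estimate, for each attribute pair (a, b), how often a and b carry the
    same value when both appear in two records known to be duplicates.
    (Hypothetical estimator, chosen only for illustration.)"""
    agree = defaultdict(int)
    seen = defaultdict(int)
    for rec_a, rec_b in duplicate_pairs:
        for attr_a, val_a in rec_a.items():
            for attr_b, val_b in rec_b.items():
                seen[(attr_a, attr_b)] += 1
                if val_a == val_b:
                    agree[(attr_a, attr_b)] += 1
    return {pair: agree[pair] / seen[pair] for pair in seen}

def relax(query_attr, correlations, threshold=0.5):
    """Return alternative attributes for query_attr whose estimated
    correlation reaches the threshold, ranked highest first."""
    candidates = [
        (attr_b, p)
        for (attr_a, attr_b), p in correlations.items()
        if attr_a == query_attr and p >= threshold
    ]
    return sorted(candidates, key=lambda item: -item[1])

# Toy duplicates: the same real-world items described under two vocabularies.
pairs = [
    ({"author": "Knuth", "title": "TAOCP"},
     {"writer": "Knuth", "name": "TAOCP"}),
    ({"author": "Codd", "title": "Relational Model"},
     {"writer": "Codd", "name": "Relational Model"}),
    ({"author": "Gray", "title": "Transactions"},
     {"creator": "Gray", "name": "Transactions"}),
]

corr = attribute_correlations(pairs)
alternatives = relax("author", corr)
print(alternatives)
```

In this toy setting a query on `author` would be relaxed to also search `writer` and `creator`, while `name` is excluded because its values never agree with `author` in the duplicates. A realistic system would need approximate value matching and a principled probability model rather than exact equality counts.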
