Answering Imprecise Queries over Autonomous Web Databases

Current approaches for answering queries with imprecise constraints require user-specific distance metrics and importance measures for the attributes of interest: metrics that are hard to elicit from lay users. We present AIMQ, a domain- and user-independent approach for answering imprecise queries over autonomous Web databases. We develop methods for query relaxation that use approximate functional dependencies, and we present an approach for automatically estimating the similarity between values of categorical attributes. We report experimental results demonstrating the robustness, efficiency, and effectiveness of AIMQ, along with the results of a preliminary user study showing that AIMQ achieves high precision.
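The abstract names two data-driven ingredients: approximate functional dependencies (AFDs) that guide relaxation, and similarity between categorical values. The Python sketch below illustrates one plausible way to compute each from a sample of tuples. It makes two assumptions. First, AFD error is measured as the fraction of tuples that would have to be removed for the dependency to hold exactly (the g3 measure). Second, value similarity is a Jaccard score over bags of co-occurring attribute values. The function names and the car data are hypothetical, and these measures are not necessarily the ones AIMQ itself uses.

```python
from collections import Counter, defaultdict


def afd_error(rows, lhs, rhs):
    """Assumed g3-style error of lhs -> rhs: the fraction of rows that must be
    removed for the dependency to hold exactly."""
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1
    # In each lhs group, keep the most frequent rhs value; the others violate the FD.
    kept = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return 1 - kept / len(rows)


def cooccurrence_bag(rows, attr, value):
    """Hypothetical helper: a bag of (attribute, value) pairs co-occurring with attr=value."""
    bag = Counter()
    for row in rows:
        if row[attr] == value:
            for other, v in row.items():
                if other != attr:
                    bag[(other, v)] += 1
    return bag


def value_similarity(rows, attr, v1, v2):
    """Assumed measure: bag Jaccard (sum of mins over sum of maxes) of co-occurrence bags."""
    b1 = cooccurrence_bag(rows, attr, v1)
    b2 = cooccurrence_bag(rows, attr, v2)
    union = sum((b1 | b2).values())
    return sum((b1 & b2).values()) / union if union else 0.0


# Hypothetical sample of tuples from an autonomous car database.
cars = [
    {"Make": "Toyota", "Model": "Camry", "Type": "Sedan"},
    {"Make": "Toyota", "Model": "Camry", "Type": "Sedan"},
    {"Make": "Honda", "Model": "Accord", "Type": "Sedan"},
    {"Make": "Honda", "Model": "Civic", "Type": "Sedan"},
    {"Make": "Toyota", "Model": "Camry", "Type": "Coupe"},
]

if __name__ == "__main__":
    print(afd_error(cars, ["Model"], "Make"))                 # 0.0: Model determines Make here
    print(value_similarity(cars, "Make", "Toyota", "Honda"))  # 0.25
```

On this sample, `Model -> Make` holds exactly, while `Model -> Type` has an error of 0.2 because one Camry tuple is a coupe. "Toyota" and "Honda" share only the `Type=Sedan` context, which gives a similarity of 2/8.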
