Keyword search across databases and documents

Given the continuous growth of databases and the abundance of diverse files in modern IT environments, there is a pressing need to integrate keyword search across heterogeneous information sources. One case in which such integration is needed arises when a collection of documents (e.g. word processing documents, spreadsheets and text files) is derived directly from a central database, and both repositories are then updated independently. Because the documents and the database are only loosely connected, hidden relationships between them are difficult to find. The problem becomes harder still when database integration techniques must be extended to handle semi-structured data (i.e. documents). Our research focuses on exploiting a relational database system to integrate and explore complex interrelationships between a database and a collection of potentially related documents. Specifically, we address the discovery and ranking of keyword links (relationships) at different granularity levels between a database schema and a collection of documents. We adapt, extend, and combine information retrieval techniques and integrate them into the DBMS. On this basis, we provide algorithms for efficient exploration of the relationships discovered between a collection of documents and a DBMS. We show experimentally that our system can discover, query and rank complex relationships between a database and its surrounding documents.
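To make the idea of keyword links at different granularity levels concrete, the following is a minimal sketch and not the system's actual algorithm. It assumes that a link exists whenever a table or column name appears verbatim as a token in a document, and that links are ranked by raw term frequency. The function name `discover_links`, the helper tables `doc_term` and `schema_term`, and the use of SQLite are all illustrative assumptions. The matching and ranking are done with a SQL join, in the spirit of performing the work inside the DBMS.

```python
import re
import sqlite3
from collections import Counter


def discover_links(schema, documents):
    """Return (granularity, schema_element, document, frequency) rows,
    ordered by descending frequency.

    schema:    dict mapping table name -> list of column names (hypothetical input format)
    documents: dict mapping document name -> document text
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical helper tables: one row per (document, term) and one per schema element.
    conn.execute("CREATE TABLE doc_term (doc TEXT, term TEXT, tf INTEGER)")
    conn.execute("CREATE TABLE schema_term (granularity TEXT, element TEXT, term TEXT)")

    # Tokenize each document and store its term frequencies.
    for doc, text in documents.items():
        counts = Counter(re.findall(r"[a-z0-9_]+", text.lower()))
        conn.executemany(
            "INSERT INTO doc_term VALUES (?, ?, ?)",
            [(doc, term, tf) for term, tf in counts.items()],
        )

    # Register schema elements at two granularity levels: table and column.
    for table, columns in schema.items():
        conn.execute(
            "INSERT INTO schema_term VALUES ('table', ?, ?)", (table, table.lower())
        )
        for col in columns:
            conn.execute(
                "INSERT INTO schema_term VALUES ('column', ?, ?)",
                (f"{table}.{col}", col.lower()),
            )

    # A keyword link is an exact term match; rank by term frequency (an assumed score).
    rows = conn.execute(
        "SELECT s.granularity, s.element, d.doc, d.tf "
        "FROM schema_term s JOIN doc_term d ON d.term = s.term "
        "ORDER BY d.tf DESC, s.element, d.doc"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    schema = {"customer": ["name", "city"], "orders": ["total"]}
    documents = {
        "report.txt": "Customer city and customer name. City totals.",
        "notes.txt": "orders orders orders total",
    }
    for link in discover_links(schema, documents):
        print(link)
```

A fuller treatment would also match data values stored in the tables, not only schema names, and would likely replace raw frequency with a weighting that accounts for document length and term rarity.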
