Challenges inbuilding a DBMS Resource Advisor

The AVATAR Information Extraction System (IES) at the IBM Almaden Research Center enables highprecision, rule-based, information extraction from text-documents. Drawing from our experience we propose the use of probabilistic database techniques as the formal underpinnings of information extraction systems so as to maintain high precision while increasing recall. This involves building a framework where rule-based annotators can be mapped to queries in a database system. We use examples from AVATAR IES to describe the challenges in achieving this goal. Finally, we show that deriving precision estimates in such a database system presents a significant challenge for probabilistic database systems.

[1]  Gerhard Weikum,et al.  Self-tuning Database Technology and Information Services: from Wishful Thinking to Viable Engineering , 2002, VLDB.

[2]  R. G. G. Cattell,et al.  Recent books , 2000, IEEE Spectrum.

[3]  Hamish Cunningham Information Extraction - A User Guide , 1997, ArXiv.

[4]  David A. Ferrucci,et al.  UIMA: an architectural approach to unstructured information processing in the corporate research environment , 2004, Natural Language Engineering.

[5]  Dushyanth Narayanan End-to-end tracing considered essential , 2005 .

[6]  Nanda Kambhatla,et al.  Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations , 2004, ACL 2004.

[7]  Surajit Chaudhuri,et al.  Database Tuning Advisor for Microsoft SQL Server 2005 , 2004, VLDB.

[8]  Norbert Fuhr,et al.  A probabilistic relational algebra for the integration of information retrieval and database systems , 1997, TOIS.

[9]  Daniel C. Zilio,et al.  DB2 advisor: an optimizer smart enough to recommend its own indexes , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[10]  Dan Suciu,et al.  Efficient query evaluation on probabilistic databases , 2004, The VLDB Journal.

[11]  Anastasia Ailamaki,et al.  Continuous resource monitoring for self-predicting DBMS , 2005, 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems.

[12]  Raghu Ramakrishnan,et al.  Community Information Management , 2006, IEEE Data Eng. Bull..

[13]  Wendy G. Lehnert,et al.  Using Decision Trees for Coreference Resolution , 1995, IJCAI.

[14]  Jim Gray,et al.  Benchmark Handbook: For Database and Transaction Processing Systems , 1992 .

[15]  Surajit Chaudhuri,et al.  Database tuning advisor for microsoft SQL server 2005: demo , 2005, SIGMOD '05.

[16]  Richard Mortier,et al.  Using Magpie for Request Extraction and Workload Modelling , 2004, OSDI.