Avatar Information Extraction System

The AVATAR Information Extraction System (IES) at the IBM Almaden Research Center enables high-precision, rule-based information extraction from text documents. Drawing from our experience, we propose the use of probabilistic database techniques as the formal underpinnings of information extraction systems so as to maintain high precision while increasing recall. This involves building a framework where rule-based annotators can be mapped to queries in a database system. We use examples from AVATAR IES to describe the challenges in achieving this goal. Finally, we show that deriving precision estimates in such a database system presents a significant challenge for probabilistic database systems.