Ontology Suitability for Uncertain Extraction of Information from Multi-Record Web Documents

Ontology-based data extraction from multi-record Web documents works well, but only if the ontology is suitable for the Web document. How do we know whether the ontology is suitable? To answer this question, we present an approach based on three heuristics: density, schema, and grouping. We encode the first heuristic as a density function and use probabilistic models for the second and third. We argue that these heuristics, together with our computational models for them, correctly determine the suitability of a Web document for a given ontology.