Multi-source Automatic Annotation for Deep Web

A large number of Web pages that are returned only in response to search-form queries are not indexed by most search engines today. The set of such pages is referred to as the deep Web. Since the results returned by Web databases seldom carry proper annotations, it is necessary to assign meaningful labels to them. This paper presents an automatic annotation framework that uses multiple annotators to annotate results from different aspects. In particular, the search engine-based annotator extends question-answering techniques commonly used in the AI community: it constructs validation queries and poses them to a search engine. It then finds the most appropriate terms for annotating the data units by calculating the similarities between terms and instances. The information needed for annotation can be acquired automatically, without the support of a domain ontology. Experiments over four real-world domains indicate that the proposed approach is highly effective.
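
As a rough illustration of the search engine-based annotator, the following Python sketch builds validation queries that pair a candidate term with a data instance and picks the term with the highest average similarity. It is only a sketch under stated assumptions: the abstract does not give the similarity formula, so the hit-count ratio used here is assumed, and the names `validation_query`, `similarity`, `annotate`, and `fake_hits`, along with the toy hit counts, are hypothetical stand-ins for a real search engine interface.

```python
from typing import Callable, Iterable, List

# Hypothetical interface: maps a query string to a number of search hits.
HitCounter = Callable[[str], int]


def validation_query(term: str, instance: str) -> str:
    """Build a validation query pairing a candidate label with a data value."""
    return f'"{term}" "{instance}"'


def similarity(term: str, instance: str, hits: HitCounter) -> float:
    """Assumed score: co-occurrence hits divided by the instance's own hits."""
    instance_hits = hits(f'"{instance}"')
    if instance_hits == 0:
        return 0.0
    return hits(validation_query(term, instance)) / instance_hits


def annotate(instances: Iterable[str], terms: List[str], hits: HitCounter) -> str:
    """Return the candidate term with the highest mean similarity."""
    values = list(instances)
    if not values or not terms:
        raise ValueError("need at least one instance and one candidate term")

    def mean_score(term: str) -> float:
        return sum(similarity(term, v, hits) for v in values) / len(values)

    return max(terms, key=mean_score)


# Toy hit counts standing in for a real search engine (invented for illustration).
FAKE_HITS = {
    '"J. K. Rowling"': 1000,
    '"Stephen King"': 2000,
    '"author" "J. K. Rowling"': 600,
    '"author" "Stephen King"': 1000,
    '"publisher" "J. K. Rowling"': 100,
    '"publisher" "Stephen King"': 300,
}


def fake_hits(query: str) -> int:
    """Look up a query in the toy table; unknown queries get zero hits."""
    return FAKE_HITS.get(query, 0)


label = annotate(["J. K. Rowling", "Stephen King"], ["author", "publisher"], fake_hits)
```

Passing the hit counter in as a function keeps the scoring logic independent of any particular search engine, so the toy table can be swapped for a real lookup without changing the annotator.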