An Effective Schema Extraction Algorithm on the Deep Web

The Deep Web, a complex entity that contains information from a variety of source types, has attracted considerable attention in recent years. To unlock its vast content, effective approaches for extracting, indexing, and searching the query interfaces of dynamic Web pages need careful study. Building on our previously proposed grouping patterns and pre-clustering algorithm, this paper presents an effective schema extraction algorithm. Three metrics, (LCA) precision, (LCA) recall, and (LCA) F1, are used to evaluate the performance of the schema extraction algorithm. The experimental results indicate that our algorithm clearly improves the performance of schema extraction for query interfaces on the Deep Web and avoids inconsistencies between the subsets produced by the pre-clustering algorithm and those produced by the schema extraction algorithm.
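As an illustration of how the three evaluation metrics relate to one another, the following is a minimal sketch that computes precision, recall, and F1 over two sets of items. It assumes the standard set-based definitions. The LCA-specific rule for deciding when an extracted item counts as correct is not stated in this abstract, so the sketch simply treats an item as correct when it appears in both sets. The function name `precision_recall_f1` and the field pairs in the usage lines are hypothetical.

```python
def precision_recall_f1(extracted, reference):
    """Return (precision, recall, F1) for two collections of hashable items.

    Assumption: an extracted item is correct iff it also occurs in the
    reference collection. An LCA-based metric would use its own
    correctness criterion in place of this set intersection.
    """
    extracted, reference = set(extracted), set(reference)
    correct = len(extracted & reference)

    # Guard against empty inputs to avoid division by zero.
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0

    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical field pairs from a flight-search query interface.
extracted_pairs = {("from", "to"), ("adults", "children"), ("from", "adults")}
reference_pairs = {("from", "to"), ("adults", "children"), ("depart", "return")}
p, r, f = precision_recall_f1(extracted_pairs, reference_pairs)
```

Here two of the three extracted pairs also occur in the reference set, so precision, recall, and F1 all equal 2/3.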