Enhanced Deep Web Data Sources Classification Model Based on World Knowledge

【Abstract】The bag-of-words method used in Deep Web source classification has many limitations. This paper proposes an enhanced Deep Web source classification model based on world knowledge. The model builds feature mappings through topic analysis of external knowledge, constructs an auxiliary classifier based on domain concepts, and enriches the feature set of Deep Web forms. Experiments are performed using Wikipedia as the external knowledge source, and the results show that the method is effective and scalable.
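
As a rough illustration of the feature-enrichment idea, the following minimal Python sketch adds concept-level features to the bag-of-words representation of a form. It is not the paper's implementation: the `CONCEPT_TERMS` table, the function names, and the `min_hits` threshold are all hypothetical, and the hand-written term sets merely stand in for feature mappings that would be derived from topic analysis of Wikipedia.

```python
import re
from collections import Counter

# Hypothetical concept-to-term mapping (an assumption for illustration);
# in the described model this role is played by feature mappings obtained
# from topic analysis of external knowledge such as Wikipedia.
CONCEPT_TERMS = {
    "books": {"author", "title", "isbn", "publisher"},
    "flights": {"departure", "arrival", "airline", "passengers"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def concept_scores(tokens):
    """Simplified stand-in for the auxiliary classifier:
    count how many distinct terms of each concept occur in the form."""
    present = set(tokens)
    return {c: len(terms & present) for c, terms in CONCEPT_TERMS.items()}


def enrich(form_text, min_hits=2):
    """Return bag-of-words counts plus concept features for one form."""
    tokens = tokenize(form_text)
    features = Counter(tokens)
    for concept, hits in concept_scores(tokens).items():
        # Only add a concept feature when enough of its terms are present.
        if hits >= min_hits:
            features["CONCEPT:" + concept] = hits
    return features


features = enrich("Search by Author, Title or ISBN")
```

Here the form text contributes its ordinary word counts, and because three of the hypothetical "books" terms appear, an extra `CONCEPT:books` feature is added. A downstream classifier could then use both kinds of features.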