Automatic Classification of Deep Web Databases Based on Centroid and WordNet

The Deep Web contains a significant amount of visited information. To make full use of this information, it must be organized by domain, so Deep Web databases need to be classified by domain automatically. In this paper, we propose a new Deep Web database classification framework that adds semantic information to the feature vectors and the centroid vector. The framework extracts the synsets of terms from WordNet and replaces the terms with their corresponding synsets in the feature vectors and the centroid vector, which reduces the dimensionality of the vectors. Finally, the semantic feature vectors are highlighted using the semantic centroid vector, and the highlighted semantic feature vectors are classified by a classification algorithm. Experiments show that experiment 3, which combines experiment 1 and experiment 2, effectively improves the classification accuracy of Deep Web databases.
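
The pipeline summarized above (synset replacement, centroid construction, highlighting, classification) can be illustrated with a minimal Python sketch. It is not the implementation used in the paper and rests on several assumptions: `TOY_SYNSETS` is a hypothetical hand-written stand-in for a WordNet lookup, the highlighting rule (scaling each weight by one plus the centroid weight) is an assumed formula since the abstract does not specify one, and the nearest-centroid cosine classifier is only one possible choice of classification algorithm.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet lookup: term -> synset identifier.
# A real system would query WordNet instead of this toy table.
TOY_SYNSETS = {
    "car": "car.n.01",
    "automobile": "car.n.01",
    "auto": "car.n.01",
    "book": "book.n.01",
    "volume": "book.n.01",
}


def to_semantic_vector(terms):
    """Replace each term by its synset id (when known) and count occurrences.

    Synonyms collapse onto one dimension, which reduces dimensionality.
    """
    return Counter(TOY_SYNSETS.get(t, t) for t in terms)


def centroid(vectors):
    """Average a list of sparse semantic vectors into one centroid vector."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}


def highlight(vector, cent):
    """Assumed highlighting rule: boost weights of dimensions in the centroid."""
    return {k: w * (1.0 + cent.get(k, 0.0)) for k, w in vector.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(terms, centroids):
    """Pick the domain whose centroid is closest to the highlighted vector."""
    vec = to_semantic_vector(terms)
    scores = {d: cosine(highlight(vec, c), c) for d, c in centroids.items()}
    return max(scores, key=scores.get)


# Toy training data: two domains, two databases each (illustrative only).
centroids = {
    "autos": centroid([
        to_semantic_vector(["car", "price"]),
        to_semantic_vector(["automobile", "dealer"]),
    ]),
    "books": centroid([
        to_semantic_vector(["book", "author"]),
        to_semantic_vector(["volume", "publisher"]),
    ]),
}

predicted = classify(["auto", "dealer"], centroids)
```

In this toy setting, "car", "automobile", and "auto" share a single dimension, so a query interface that uses "auto" still matches a centroid built from databases that use "car" or "automobile".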