Specific website subject recognition based on the hybrid vector space model

Internet resource search for websites on a specific subject was realized using a hybrid vector space model developed to describe website subject features. Rather than representing websites by the tree and graph structures formed by their links, the model captures the content and structure similarities among websites on the same subject through the text information attached to those links. A characteristic vector model of the website subject was built on the vector space model by extracting text information that reflects both the content features and the structural features of a website. Websites on the manufacturing subject were then recognized by identifying each website's subject with a centroid-based classification algorithm. The results indicate that the model is suitable for describing website subject features, for focused crawling, and for website classification, and that it improves both the accuracy and the efficiency of website subject recognition and classification.
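The combination of a hybrid content/structure vector with centroid-based classification can be illustrated with a minimal sketch. The assumptions here go beyond the abstract. Structure features are taken to be plain text drawn from a site's links (for example anchor text or titles). Both parts use simple normalized term frequencies rather than any particular weighting scheme. The two parts are kept in separate dimensions and mixed with a hypothetical `structure_weight` parameter. Every function name is illustrative and is not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens only; a real system would add stemming/stop words.
    return re.findall(r"[a-z]+", text.lower())


def tf_vector(tokens, prefix):
    # Normalized term frequencies; the prefix keeps content and structure
    # terms in separate dimensions of the hybrid vector.
    counts = Counter(tokens)
    total = sum(counts.values()) or 1
    return {f"{prefix}:{t}": c / total for t, c in counts.items()}


def hybrid_vector(content_text, structure_text, structure_weight=0.5):
    # Hypothetical hybrid model: weighted union of a content vector and a
    # structure vector built from link-related text (an assumption).
    vec = {k: (1 - structure_weight) * v
           for k, v in tf_vector(tokenize(content_text), "content").items()}
    for k, v in tf_vector(tokenize(structure_text), "struct").items():
        vec[k] = structure_weight * v
    return vec


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    # Component-wise mean of sparse vectors.
    total = {}
    for vec in vectors:
        for k, v in vec.items():
            total[k] = total.get(k, 0.0) + v
    return {k: v / len(vectors) for k, v in total.items()}


def train(labeled):
    # One centroid per subject label.
    groups = {}
    for vec, label in labeled:
        groups.setdefault(label, []).append(vec)
    return {label: centroid(vs) for label, vs in groups.items()}


def classify(vec, centroids):
    # Assign the label whose centroid is most cosine-similar
    # (ties, including all-zero similarity, go to the first label).
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    # Toy, made-up training data for illustration only.
    training = [
        (hybrid_vector("cnc machining lathe milling parts",
                       "products machining services"), "manufacturing"),
        (hybrid_vector("injection molding factory production line",
                       "factory equipment"), "manufacturing"),
        (hybrid_vector("recipes cooking baking dessert", "recipes blog"), "other"),
    ]
    centroids = train(training)
    page = hybrid_vector("precision milling and lathe machining", "machining equipment")
    print(classify(page, centroids))  # manufacturing
```

Keeping content and structure terms in distinct dimensions lets a shared word (such as "machining") contribute separately as page content and as link text. The weighting between the two is a tunable assumption, not a value reported in the source.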