Paper Information - Enhanced Deep Web Data Sources Classification Model Based on World Knowledge

【Abstract】The bag-of-words method used in Deep Web source classification has many limitations. This paper proposes a novel enhanced classification model for Deep Web sources based on world knowledge. The model builds feature mappings through topic analysis of external knowledge, constructs an auxiliary classifier based on domain concepts, and enriches the feature set of Deep Web forms. Experiments are performed using Wikipedia as the external knowledge source, and the results verify that the method is effective and scalable.

Fang Wei | Cui Zhi-ming | Zhao Pengpeng | Sun Zhen-qiang | Huang Li
