Learning to Integrate Web Taxonomies with Fine-Grained Relations: A Case Study Using Maximum Entropy Model

Web taxonomy integration is an emerging issue on the Internet, and many research areas, such as personalization, web search, and electronic markets, would benefit from better taxonomy integration techniques. The integration task is to transfer documents from a source web taxonomy to a target web taxonomy. Most current techniques improve integration performance by exploiting the relations between corresponding categories in the source and target taxonomies. However, these techniques may not be effective when the concepts of corresponding categories overlap only partially. In this paper, we present an effective approach that integrates taxonomies and alleviates the partial-overlap problem by considering fine-grained relations with a Maximum Entropy Model. The experimental results show that the proposed approach achieves higher classification accuracy than previous approaches.
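To make the task concrete, the following is a minimal sketch of a maximum entropy (multinomial logistic) classifier that assigns a document to a target category using both its words and its source category. It is not the paper's implementation: the feature scheme, the toy categories (`Cars`, `Racing`, `Autos`, `Sports`), the training settings, and all function names are assumptions made for illustration, and the fine-grained relations studied in the paper are represented here only by a single source-category indicator feature.

```python
import math
from collections import defaultdict

def features(words, source_category):
    # Hypothetical feature scheme: bag of words plus one indicator
    # for the category the document had in the source taxonomy.
    feats = {f"word={w}": 1.0 for w in words}
    feats[f"src={source_category}"] = 1.0
    return feats

def predict_proba(weights, feats, labels):
    # p(y | x) is proportional to exp(sum_i w[y, f_i] * x_i).
    scores = {y: sum(weights[(y, f)] * v for f, v in feats.items())
              for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(data, labels, epochs=200, lr=0.1, l2=0.01):
    # Stochastic gradient ascent on the L2-regularized log-likelihood.
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in data:
            probs = predict_proba(weights, feats, labels)
            for y in labels:
                err = (1.0 if y == gold else 0.0) - probs[y]
                for f, v in feats.items():
                    weights[(y, f)] += lr * (err * v - l2 * weights[(y, f)])
    return weights

def classify(weights, words, source_category, labels):
    probs = predict_proba(weights, features(words, source_category), labels)
    return max(probs, key=probs.get)

# Toy data (assumed): source category "Racing" only partially overlaps
# target category "Sports", since one "Racing" document belongs in "Autos".
LABELS = ["Autos", "Sports"]
TRAIN = [
    (features(["engine", "sedan"], "Cars"), "Autos"),
    (features(["engine", "dealer"], "Cars"), "Autos"),
    (features(["race", "driver"], "Racing"), "Sports"),
    (features(["race", "champion"], "Racing"), "Sports"),
    (features(["engine", "tuning"], "Racing"), "Autos"),
]
WEIGHTS = train(TRAIN, LABELS)
print(classify(WEIGHTS, ["sedan", "dealer"], "Cars", LABELS))
```

Treating the source category as one feature among many, rather than as a hard mapping, lets the model weigh it against the document's words, which is one simple way to cope with categories that overlap only partially.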
