Hierarchical Text Categorization Using Neural Networks

This paper presents the design and evaluation of a text categorization method based on the Hierarchical Mixture of Experts model. This model uses a divide and conquer principle to define smaller categorization problems based on a predefined hierarchical structure. The final classifier is a hierarchical array of neural networks. The method is evaluated using the UMLS Metathesaurus as the underlying hierarchical structure, and the OHSUMED test set of MEDLINE records. Comparisons with an optimized version of the traditional Rocchio's algorithm adapted for text categorization, as well as flat neural network classifiers are provided. The results show that the use of the hierarchical structure improves text categorization performance with respect to an equivalent flat model. The optimized Rocchio algorithm achieves a performance comparable with that of the hierarchical neural networks.

[1]  Stan Matwin,et al.  A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization , 2001 .

[2]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[3]  Hiroshi Motoda,et al.  Feature Extraction, Construction and Selection , 1998 .

[4]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[5]  Daphne Koller,et al.  Using machine learning to improve information access , 1998 .

[6]  Thorsten Joachims,et al.  Estimating the Generalization Performance of an SVM Efficiently , 2000, ICML.

[7]  Wai Lam,et al.  Automatic Text Categorization and Its Application to Text Retrieval , 1999, IEEE Trans. Knowl. Data Eng..

[8]  Andrew McCallum,et al.  A comparison of event models for naive bayes text classification , 1998, AAAI 1998.

[9]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[10]  Chris Buckley,et al.  OHSUMED: an interactive retrieval evaluation and new large test collection for research , 1994, SIGIR '94.

[11]  Gerard Salton,et al.  The SMART Retrieval System—Experiments in Automatic Document Processing , 1971 .

[12]  Steve R. Waterhouse,et al.  Classification using hierarchical mixtures of experts , 1994, Proceedings of IEEE Workshop on Neural Networks for Signal Processing.

[13]  Wai Lam,et al.  Using a generalized instance set for automatic text categorization , 1998, SIGIR '98.

[14]  Yiming Yang,et al.  An Evaluation of Statistical Approaches to Text Categorization , 1999, Information Retrieval.

[15]  Amit Singhal,et al.  Pivoted document length normalization , 1996, SIGIR 1996.

[16]  Hwee Tou Ng,et al.  Feature selection, perceptron learning, and a usability case study for text categorization , 1997, SIGIR '97.

[17]  Hinrich Schütze,et al.  A comparison of classifiers and document representations for the routing problem , 1995, SIGIR '95.

[18]  P. McCullagh,et al.  Generalized Linear Models , 1984 .

[19]  David D. Lewis,et al.  An evaluation of phrasal and clustered representations on a text categorization task , 1992, SIGIR '92.

[20]  Dunja Mladenic,et al.  Machine Learning on non-homogeneous, distributed text data , 1998 .

[21]  Tom M. Mitchell,et al.  Improving Text Classification by Shrinkage in a Hierarchy of Classes , 1998, ICML.

[22]  Yoram Singer,et al.  Boosting and Rocchio applied to text filtering , 1998, SIGIR '98.

[23]  Andreas S. Weigend,et al.  A neural network approach to topic spotting , 1995 .

[24]  Florence d'Alché-Buc,et al.  Trio Learning: A New Strategy for Building Hybrid Neural Trees , 1994, Int. J. Neural Syst..

[25]  Gerard Salton,et al.  Optimization of relevance feedback weights , 1995, SIGIR '95.

[26]  James P. Callan,et al.  Training algorithms for linear text classifiers , 1996, SIGIR '96.

[27]  Andreas S. Weigend,et al.  Exploiting Hierarchy in Text Categorization , 1999, Information Retrieval.

[28]  Yoram Singer,et al.  Context-sensitive learning methods for text categorization , 1996, SIGIR '96.

[29]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[30]  Steven R. Waterhouse,et al.  Classification and Regression using Mixtures of Experts , 1997 .

[31]  Daphne Koller,et al.  Hierarchically Classifying Documents Using Very Few Words , 1997, ICML.

[32]  Ron Kohavi,et al.  Irrelevant Features and the Subset Selection Problem , 1994, ICML.

[33]  Isabelle Moulinier,et al.  Applying an existing machine learning algorithm to text categorization , 1995, Learning for Natural Language Processing.

[34]  Yves Chauvin,et al.  Backpropagation: the basic theory , 1995 .

[35]  Maria Simi,et al.  Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization , 2000, ECDL.

[36]  Y Yang,et al.  An evaluation of statistical approaches to MEDLINE indexing. , 1996, Proceedings : a conference of the American Medical Informatics Association. AMIA Fall Symposium.

[37]  Hiroshi Motoda,et al.  Feature Extraction, Construction and Selection: A Data Mining Perspective , 1998 .

[38]  David D. Lewis,et al.  A comparison of two learning algorithms for text categorization , 1994 .

[39]  Sholom M. Weiss,et al.  Automated learning of decision rules for text categorization , 1994, TOIS.

[40]  Jihoon Yang,et al.  Feature Subset Selection Using a Genetic Algorithm , 1998, IEEE Intell. Syst..

[41]  Chris Buckley,et al.  Learning routing queries in a query zone , 1997, SIGIR '97.

[42]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[43]  John Scott Bridle,et al.  Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition , 1989, NATO Neurocomputing.

[44]  P. McCullagh,et al.  Generalized Linear Models , 1992 .