We summarize our experiments and results on using information fusion to automatically classify free-text documents into a given number of categories. We attempt to characterize this information fusion work in terms of the Joint Directors of Laboratories (JDL) scheme. The text used in the experiments is taken from the Reuters-22173 collection, which not only comes pre-analyzed but also facilitates training of the neural networks and evaluation of the classification decisions. We use different kinds of feature extractors to derive information from documents, and use neural networks for both learning and fusion. We compare the effectiveness of individual feature extractors in classifying the text with that of information fusion over different combinations of feature extractors. The results indicate that information fusion almost always performs better than the individual feature extractors, and that certain combinations appear to perform better than others. The effect of additional parameters varies and remains to be investigated.
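To make the fusion idea concrete, the following is a minimal sketch, not the system used in the experiments. It assumes two hypothetical keyword-based feature extractors, four invented toy documents, and a single sigmoid unit (the simplest possible neural network) as the fusion stage. All function names, keyword sets, and training settings are illustrative assumptions.

```python
import math

def keyword_score(doc, keywords):
    """Hypothetical extractor: fraction of tokens found in a keyword set."""
    tokens = doc.lower().split()
    if not tokens:
        return 0.0
    return sum(t in keywords for t in tokens) / len(tokens)

def extract(doc):
    """Run every extractor and return the score vector to be fused."""
    grain = keyword_score(doc, {"wheat", "corn", "grain"})
    trade = keyword_score(doc, {"export", "tariff", "trade"})
    return [grain, trade]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_fusion(samples, labels, epochs=2000, lr=1.0):
    """Fit one sigmoid unit on extractor outputs by stochastic gradient descent."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def fuse(doc, w, b):
    """Fused probability that the document belongs to the target category."""
    x = extract(doc)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Invented toy data: label 1 means the document is about grain.
docs = [
    "wheat and corn prices rise",
    "grain harvest wheat",
    "export tariff talks",
    "trade tariff dispute export",
]
labels = [1, 1, 0, 0]
w, b = train_fusion([extract(d) for d in docs], labels)

p_grain = fuse("corn and wheat shipments", w, b)
p_other = fuse("tariff on export", w, b)
```

Comparing an individual extractor with a fused combination, as described above, would amount to training the same unit on one score versus on the full score vector and evaluating both on held-out documents.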