For researchers interested in articles on a specific topic, current document search techniques, which rely primarily on keyword matching, are insufficient: they tend to return too many "hits", most of which are not truly relevant. An individualized text filtering system that can select or recommend useful articles would save researchers a great deal of time, especially in a field such as bioinformatics, where numerous articles are published daily. Machine learning techniques such as text classification may meet this need. This paper describes preliminary work on developing such a text filtering system. A Support Vector Machine (SVM) is used to classify articles from the Journal of Bacteriology according to whether they address issues related to "gene function". Preliminary results, along with the problems and difficulties encountered, are discussed.
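The classification step described above can be sketched as a linear SVM over bag-of-words features. This is a minimal illustration under stated assumptions, not the authors' implementation. It trains the SVM with the Pegasos stochastic sub-gradient method, swapped in so the example needs only the Python standard library. It uses binary word-presence features and labels a handful of invented sentences as relevant (+1) or not relevant (-1) to gene function. The function names, the regularization parameter `lam`, and the toy data are all hypothetical.

```python
import random
import re

def tokenize(text):
    # Lowercase, then keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocab(texts):
    # Map each distinct token in the training texts to a feature index.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    # Binary bag-of-words as a sparse dict {feature_index: 1.0};
    # tokens never seen in training are ignored.
    return {vocab[tok]: 1.0 for tok in set(tokenize(text)) if tok in vocab}

def dot(w, x):
    # Sparse dot product between weight dict w and feature dict x.
    return sum(w.get(j, 0.0) * v for j, v in x.items())

def train_pegasos(X, y, lam=0.01, epochs=50, seed=0):
    # Pegasos: stochastic sub-gradient descent on the primal linear-SVM
    # objective  lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>).
    # No bias term, as in the basic formulation.
    rng = random.Random(seed)
    w, t, n = {}, 0, len(X)
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)          # step size 1/(lam * t)
            margin = y[i] * dot(w, X[i])   # evaluated before the update
            for j in w:                    # gradient step for the regularizer
                w[j] *= 1.0 - eta * lam
            if margin < 1:                 # hinge loss active: add eta * y_i * x_i
                for j, v in X[i].items():
                    w[j] = w.get(j, 0.0) + eta * y[i] * v
    return w

def predict(w, x):
    # +1 = "addresses gene function", -1 = "does not".
    return 1 if dot(w, x) >= 0 else -1

# Hypothetical toy sentences standing in for article text (not real data).
train = [
    ("the xyzA gene encodes a protein required for cell division", 1),
    ("deletion of the gene abolished enzyme function in mutant strains", 1),
    ("we identify the function of a putative transporter gene", 1),
    ("growth rates were measured in minimal medium at 37 degrees", -1),
    ("a survey of sampling sites and culture conditions", -1),
    ("bacterial counts in soil samples across seasons", -1),
]
vocab = build_vocab(text for text, _ in train)
X = [featurize(text, vocab) for text, _ in train]
y = [label for _, label in train]
w = train_pegasos(X, y)

print(predict(w, featurize("this gene encodes a transporter with unknown function", vocab)))  # 1
print(predict(w, featurize("growth of cultures in soil samples", vocab)))  # -1
```

A practical filtering system would train on far more labeled articles (full text, or titles and abstracts) and would typically rely on a mature SVM solver and richer term weighting such as TF-IDF; the toy data here only demonstrates the mechanics of learning a separating weight vector over words.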