Hunting for truly relevant articles in bioinformatics literature: a preliminary study

For researchers who want to read articles on a specific topic, current document search techniques, which rely primarily on keyword matching, are insufficient. They tend to return too many "hits", most of which are not truly relevant. An individualized text filtering system that can select and recommend useful articles would be a tremendous time-saver for researchers, especially in bioinformatics, a field in which numerous articles are published daily. Machine learning techniques such as text classification may meet this need. This paper describes preliminary work on developing such a text filtering system. A Support Vector Machine (SVM) is used to classify articles from the Journal of Bacteriology according to whether they address issues related to "gene function". Preliminary results, along with the problems and difficulties encountered, are discussed.
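
To make the idea of SVM-based text filtering concrete, the following is a minimal sketch and not the system described in this paper. It assumes a bag-of-words representation with L2-normalized term frequencies and a linear SVM without a bias term, trained by plain subgradient descent on the regularized hinge loss. The stop-word list, the six example titles, their labels, and all function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOPWORDS = {"a", "the", "of", "for", "in", "from", "and", "to", "is"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def build_vocab(docs):
    """Map each distinct token in the training documents to an index."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Sparse, L2-normalized term-frequency vector: {index: weight}."""
    counts = Counter(t for t in tokenize(text) if t in vocab)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {vocab[t]: c / norm for t, c in counts.items()}

def score(w, x):
    """Decision value w . x for a sparse vector x."""
    return sum(w[i] * v for i, v in x.items())

def train_linear_svm(X, y, dim, lam=0.01, eta=0.1, epochs=50):
    """Subgradient descent on lam/2 * ||w||^2 + hinge loss (no bias term)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, label in zip(X, y):
            violated = label * score(w, x) < 1.0
            # Shrink all weights (gradient of the L2 penalty).
            w = [wi * (1.0 - eta * lam) for wi in w]
            # Move toward the example only if its margin is violated.
            if violated:
                for i, v in x.items():
                    w[i] += eta * label * v
    return w

def predict(w, vocab, text):
    """Return +1 (relevant to 'gene function') or -1 (not relevant)."""
    return 1 if score(w, vectorize(text, vocab)) > 0 else -1

# Hypothetical toy training set: +1 = addresses gene function, -1 = does not.
train_docs = [
    "The gene encodes a protein required for cell division",
    "Mutant analysis reveals the function of the regulatory gene",
    "Function of a gene product in membrane transport",
    "Genome sequencing statistics for the bacterial strain",
    "A new method for culturing bacteria in the laboratory",
    "Phylogenetic tree of isolates from soil samples",
]
train_labels = [1, 1, 1, -1, -1, -1]

vocab = build_vocab(train_docs)
X = [vectorize(d, vocab) for d in train_docs]
w = train_linear_svm(X, train_labels, dim=len(vocab))

print(predict(w, vocab, "function of a gene involved in transport"))
print(predict(w, vocab, "method for culturing soil bacteria"))
```

A filtering system along these lines would pass each incoming article through `predict` and recommend only those labeled +1. With real abstracts, class vocabularies overlap heavily, so feature weighting, a bias term, and tuning of the regularization strength would all need attention.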