Classifying Biomedical Articles using Web Resources: application to KDD Cup 02

This paper presents a novel approach to text classification of biomedical literature that uses information extracted from related web resources. Our method builds a representation of each article from information extracted from public online databases; this representation is then used by traditional statistical text classification algorithms. We validated the approach by implementing the proposed method and testing it on the KDD Cup 2002 challenge (bio-text task). Results show that searching online databases for additional data can effectively improve the performance of text classification systems for biomedical literature.
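The abstract does not name the databases, the entity-extraction step, or the classifier, so the following is only a minimal sketch of the general idea under loudly stated assumptions. `HYPOTHETICAL_DB` is an invented, hard-coded stand-in for records retrieved from public online databases. Gene mentions are assumed to be already identified. The labels `positive`/`negative` and the toy articles are made up. A multinomial Naive Bayes classifier is used as one example of a "traditional statistical text classification algorithm"; it is not necessarily the one the authors used.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for records retrieved from public online databases.
# Keys are gene symbols; values are annotation text. Hard-coded so the sketch runs offline.
HYPOTHETICAL_DB = {
    "hb": "hunchback transcription factor embryo segmentation expression",
    "eve": "even-skipped homeobox expression pattern embryo stripes",
    "adh": "alcohol dehydrogenase enzyme metabolism",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def enrich(article_text, gene_mentions, db=HYPOTHETICAL_DB):
    """Bag of words for the article, extended with database text for each mentioned gene."""
    tokens = tokenize(article_text)
    for gene in gene_mentions:
        # Unknown genes contribute nothing; a real system would query a web resource here.
        tokens.extend(tokenize(db.get(gene.lower(), "")))
    return Counter(tokens)


class MultinomialNB:
    """Multinomial Naive Bayes with add-one smoothing over bag-of-words Counters."""

    def fit(self, bags, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for bag, label in zip(bags, labels):
            self.word_counts[label].update(bag)  # add this document's word counts
            self.vocab.update(bag)  # record every word seen in training
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        self.n_docs = len(labels)
        return self

    def log_score(self, bag, c):
        """Log prior plus smoothed log likelihood; words unseen in training are ignored."""
        v = len(self.vocab)
        score = math.log(self.doc_counts[c] / self.n_docs)
        for word, n in bag.items():
            if word in self.vocab:
                score += n * math.log((self.word_counts[c][word] + 1) / (self.totals[c] + v))
        return score

    def predict(self, bag):
        """Return the class with the highest log score."""
        return max(self.classes, key=lambda c: self.log_score(bag, c))


# Toy, invented training articles with hypothetical labels.
train = [
    (enrich("we report the expression of hb in the embryo", ["hb"]), "positive"),
    (enrich("expression pattern of eve transcripts", ["eve"]), "positive"),
    (enrich("adh enzyme kinetics in adult flies", ["adh"]), "negative"),
    (enrich("metabolism of alcohol in larvae", []), "negative"),
]
model = MultinomialNB().fit([b for b, _ in train], [y for _, y in train])

print(model.predict(enrich("a new study of eve", ["eve"])))  # positive
print(model.predict(enrich("adh activity", ["adh"])))  # negative
```

The short query article "a new study of eve" carries little signal on its own. After enrichment it also contains the database terms for `eve` (such as "expression" and "embryo"), which is the kind of extra evidence the abstract argues external resources can supply.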
