A Platform of Biomedical Literature Mining for Categorization of Cancer Related Abstracts

In this paper, we develop a platform for categorizing cancer-related abstracts using support vector machine (SVM) text categorization with a one-against-all (OAA) learning algorithm for classification decisions. The corpus was drawn from the PubMed database website. From PubMed literature on six topics (breast cancer, cervical cancer, gastric cancer, lung cancer, rectal cancer and esophageal cancer), we randomly selected 6,000 medical abstracts to implement the system and run the experiments. The experimental results show that the platform has potential for categorizing cancer-related literature texts across multiple cancer types.
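As an illustration of the one-against-all scheme, the following minimal Python sketch trains one linear SVM per cancer topic and assigns a text to the topic whose classifier returns the highest score. It is not the implementation used in this work. The training texts are invented toy examples, and the bag-of-words features, the subgradient-descent solver, its hyperparameters, the omission of a bias term, and all function names are assumptions made for illustration.

```python
from collections import Counter

# Invented toy texts (NOT taken from the PubMed corpus used in the paper).
TRAINING_DATA = [
    ("mammography screening detects breast tumor", "breast"),
    ("breast carcinoma hormone receptor therapy", "breast"),
    ("smoking raises lung tumor risk", "lung"),
    ("lung carcinoma bronchoscopy biopsy", "lung"),
    ("helicobacter infection precedes gastric tumor", "gastric"),
    ("gastric carcinoma endoscopy biopsy", "gastric"),
]


def bag_of_words(text):
    """Sparse term-count vector from lowercased whitespace tokens."""
    return Counter(text.lower().split())


def score(weights, features):
    """Linear decision value w . x (bias term omitted for brevity)."""
    return sum(weights.get(term, 0.0) * count for term, count in features.items())


def train_binary_svm(samples, labels, epochs=50, lam=0.01, lr=0.1):
    """Linear SVM fitted by subgradient descent on the L2-regularized
    hinge loss. Labels must be +1 or -1. Hyperparameters are assumed."""
    weights = {}
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            margin = label * score(weights, features)
            # Gradient step on the regularizer: shrink every weight.
            for term in weights:
                weights[term] *= 1.0 - lr * lam
            # Hinge-loss subgradient step only when the margin is violated.
            if margin < 1.0:
                for term, count in features.items():
                    weights[term] = weights.get(term, 0.0) + lr * label * count
    return weights


def train_one_against_all(data):
    """Train one binary SVM per category: that category vs. all the rest."""
    samples = [bag_of_words(text) for text, _ in data]
    categories = [category for _, category in data]
    models = {}
    for target in sorted(set(categories)):
        labels = [1 if category == target else -1 for category in categories]
        models[target] = train_binary_svm(samples, labels)
    return models


def predict(models, text):
    """Pick the category whose binary SVM gives the largest decision value."""
    features = bag_of_words(text)
    return max(models, key=lambda category: score(models[category], features))


models = train_one_against_all(TRAINING_DATA)
print(predict(models, "breast mammography screening"))
print(predict(models, "smoking lung risk"))
print(predict(models, "helicobacter gastric infection"))
```

With six topics, as in the corpus described above, the same procedure would train six binary classifiers. Taking the maximum decision value is one common way to resolve the OAA decision; other tie-breaking or calibration rules are possible.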
