Biomedical Article Classification Using an Agent-Based Model of T-Cell Cross-Regulation

We propose a novel bio-inspired solution for biomedical article classification. Our method draws on an existing model of T-cell cross-regulation in the vertebrate immune system (IS), a complex adaptive system in which millions of cells interact to distinguish between harmless and harmful intruders. Analogously, automatic biomedical article classification assumes that the interaction and co-occurrence of thousands of words in text can be used to identify conceptually related classes of articles: at a minimum, two classes containing the relevant and irrelevant articles for a given concept (e.g. articles with protein-protein interaction information). Our agent-based method for document classification extends the existing analytical model of Carneiro et al. [1] by using agent-based modeling to handle many distinct T-cell features (epitopes) and their collective dynamics simultaneously. We have already extended this model to develop a bio-inspired spam-detection system [2, 3]. Here we develop our agent-based model further and test it on a dataset of publicly available full-text biomedical articles provided by the BioCreative challenge [4]. We study several new parameter configurations, which lead to encouraging results comparable to those of state-of-the-art classifiers. These results help us understand both T-cell cross-regulation and its applicability to document classification in general. We therefore show that our bio-inspired algorithm is a promising novel method for biomedical article classification and for binary document classification in general.

[1] Ana Gabriela Maguitman, et al. Uncovering Protein-Protein Interactions in the Bibliome, 2007.

[2] Abdul Sattar, et al. AI 2006: Advances in Artificial Intelligence, 19th Australian Joint Conference on Artificial Intelligence, Hobart, Australia, December 4-8, 2006, Proceedings, 2006, Australian Conference on Artificial Intelligence.

[3] Stan Szpakowicz, et al. Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation, 2006, Australian Conference on Artificial Intelligence.

[4] Phil Husbands, et al. Artificial Life IX: Proceedings of the Ninth International Conference on the Simulation and Synthesis of Living Systems, 2004.

[5] Steve Cayzer, et al. An Immune-based Approach to Document Classification, 2003, IIS.

[6] Luis Mateus Rocha, et al. Adaptive Spam Detection Inspired by the Immune System, 2008, ALIFE.

[7] Thorsten Joachims, et al. Learning to classify text using support vector machines - methods, theory and algorithms, 2002, The Kluwer international series in engineering and computer science.

[8] Ronald W. Davis, et al. Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray, 1995, Science.

[9] Luis Mateus Rocha, et al. Adaptive Spam Detection Inspired by a Cross-Regulation Model of Immune Dynamics: A Study of Concept Drift, 2008, ICARIS.

[10] N. Sepúlveda. How is the T-cell repertoire shaped, 2011.

[11] Karin M. Verspoor, et al. Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks, 2008, Genome Biology.

[12] Martin F. Porter, et al. An algorithm for suffix stripping, 1997, Program.

[13] K. Cohen, et al. Biomedical language processing: what's beyond PubMed?, 2006, Molecular cell.

[14] Vangelis Metsis, et al. Spam Filtering with Naive Bayes - Which Naive Bayes?, 2006, CEAS.

[15] P. Bork, et al. Literature mining for the biologist: from information retrieval to biological discovery, 2006, Nature Reviews Genetics.

[16] Jonathan Timmis, et al. Artificial immune systems - today and tomorrow, 2007, Natural Computing.

[17] C. van den Dool, et al. When three is not a crowd: a Crossregulation Model of the dynamics and repertoire selection of regulatory CD4+ T cells, 2007, Immunological reviews.

[18] Eugene W. Myers, et al. Whole-genome DNA sequencing, 1999, Comput. Sci. Eng.

[19] Ronen Feldman, et al. Book Reviews: The Text Mining Handbook: Advanced Approaches to Analyzing Unstructured Data by Ronen Feldman and James Sanger, 2008, CL.

[20] Jasleen Kaur, et al. Classification of Protein-Protein Interaction Full-Text Documents Using Text and Citation Network Features, 2010, TCBB.