We participated (as Team 81) in two subtasks of the Protein-Protein Interaction (PPI) task of the BioCreative III Challenge: the Article Classification Task (ACT) and the Interaction Method Task (IMT). For the ACT, we extensively tested available Named Entity Recognition (NER) tools and used the most promising ones to extend the Variable Trigonometric Threshold (VTT) linear classifier that we had used successfully in BioCreative II and II.5. Our main goal was to exploit available NER tools to improve the classification of documents relevant to protein-protein interaction. For comparison, we also used a Support Vector Machine (SVM) classifier on NER features. For the IMT, we experimented with a primarily statistical approach rather than a deeper natural language processing strategy. In short, we combined classifiers, simple pattern matching, and statistically based ranking of candidate matches. We also report on our efforts to integrate the sentence classifier from our IMT method into our ACT pipeline.

Article Classification Task

We participated both in the online component of the Challenge, serving the VTT algorithm from our own annotation server via the BioCreative MetaServer platform, and in the offline component. We used three distinct classifiers: (1) the lightweight VTT linear classifier, which uses word-pair textual features and protein counts extracted with the ABNER tool [1], and which we successfully introduced in the abstract classification task of BioCreative II [2] and applied to full text in BioCreative II.5 [3]; (2) a novel version of VTT that includes various NER features as well as various
References

[1] Karin M. Verspoor, et al. Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks. Genome Biology, 2008.
[2] Hagit Shatkay, et al. Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users. Bioinformatics, 2008.
[3] Luis Mateus Rocha, et al. Classification of Protein-Protein Interaction Full-Text Documents Using Text and Citation Network Features. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2010.
[4] Jasleen Kaur, et al. Classification of Protein-Protein Interaction Full-Text Documents Using Text and Citation Network Features. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2010.
[5] Burr Settles, et al. ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. 2005.