MODE: multiobjective differential evolution for feature selection and classifier ensemble

In this paper, we propose a multiobjective differential evolution (MODE)-based feature selection and ensemble learning approach for entity extraction in biomedical texts. The first step of the algorithm addresses the problem of automatic feature selection in a machine learning framework, namely the conditional random field (CRF). The final Pareto optimal front obtained as the output of the feature selection module contains a set of solutions, each of which represents a particular feature representation and hence a distinct classifier. In the second step of the algorithm, we combine a subset of these classifiers using a MODE-based ensemble technique. Our experiments on three benchmark datasets, namely GENIA, GENETAG and AIMed, yield F-measure values of 76.75%, 94.15% and 91.91%, respectively. Comparisons with existing systems show that the proposed algorithm achieves performance on par with the state of the art. These results also indicate that the method is general in nature and therefore performs well on datasets from several domains. The key contribution of this work is the development of MODE-based generalized feature selection and ensemble learning techniques for extracting entities from biomedical texts of several domains.
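The two building blocks named above, a differential evolution step and a Pareto optimal front, can be illustrated with a minimal sketch. It is not the implementation used in the paper: the abstract does not state the objective functions, the solution encoding, or the DE variant, so the sketch assumes two objectives to be maximized (for example recall and precision), a real-valued vector in [0, 1] per candidate feature thresholded at 0.5, and the classical DE/rand/1/bin scheme. The names `dominates`, `pareto_front`, `de_trial` and `to_mask` are hypothetical.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the indices of the non-dominated objective vectors in scores."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def de_trial(population, target_idx, f=0.5, cr=0.9, rng=random):
    """Build one DE/rand/1/bin trial vector for population[target_idx].

    Assumption: each individual is a list of reals in [0, 1], one per
    candidate feature, and the population holds at least four individuals.
    """
    target = population[target_idx]
    others = [i for i in range(len(population)) if i != target_idx]
    r1, r2, r3 = rng.sample(others, 3)
    a, b, c = population[r1], population[r2], population[r3]
    forced = rng.randrange(len(target))  # at least one gene comes from the mutant
    trial = []
    for k in range(len(target)):
        if rng.random() < cr or k == forced:
            mutant = a[k] + f * (b[k] - c[k])
            trial.append(min(1.0, max(0.0, mutant)))  # clip back into [0, 1]
        else:
            trial.append(target[k])
    return trial


def to_mask(vector, threshold=0.5):
    """Decode a real-valued individual into a feature on/off mask (assumed encoding)."""
    return [v >= threshold for v in vector]


if __name__ == "__main__":
    # Made-up (recall, precision) pairs for four candidate feature subsets.
    scores = [(0.70, 0.80), (0.75, 0.78), (0.60, 0.70), (0.72, 0.79)]
    print(pareto_front(scores))  # [0, 1, 3]
```

In a complete system each trial vector would be decoded with `to_mask`, a CRF would be trained on the selected features, and its scores on held-out data would decide whether the trial enters the next generation. The classifiers on the final front would then be the candidates for the ensemble step.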
