Simulated annealing based classifier ensemble techniques: Application to part of speech tagging

Part-of-Speech (PoS) tagging is an important pipelined module for almost all Natural Language Processing (NLP) application areas. In this paper we formulate PoS tagging within the frameworks of single and multi-objective optimization techniques. At the very first step we propose a classifier ensemble technique for PoS tagging using the concept of single objective optimization (SOO) that exploits the search capability of simulated annealing (SA). Thereafter we devise a method based on multiobjective optimization (MOO) to solve the same problem, and for this a recently developed multiobjective simulated annealing based technique, AMOSA, is used. The characteristic features of AMOSA are its concepts of the amount of domination and archive in simulated annealing, and situation specific acceptance probabilities. We use Conditional Random Field (CRF) and Support Vector Machine (SVM) as the underlying classification methods that make use of a diverse set of features, mostly based on local contexts and orthographic constructs. We evaluate our proposed approaches for two Indian languages, namely Bengali and Hindi. Evaluation results of the single objective version shows the overall accuracy of 88.92% for Bengali and 87.67% for Hindi. The MOO based ensemble yields the overall accuracies of 90.45% and 89.88% for Bengali and Hindi, respectively.

[1]  Thorsten Brants,et al.  TnT – A Statistical Part-of-Speech Tagger , 2000, ANLP.

[2]  D. A. Preece,et al.  An introduction to the statistical analysis of data , 1979 .

[3]  Christopher D. Manning,et al.  Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger , 2000, EMNLP.

[4]  Yuji Matsumoto,et al.  Chunking with Support Vector Machines , 2001, NAACL.

[5]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[6]  Sivaji Bandyopadhyay,et al.  Maximum Entropy Based Bengali Part of Speech Tagging , 2008 .

[7]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[8]  Bernard Mérialdo,et al.  Tagging English Text with a Probabilistic Model , 1994, CL.

[9]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[10]  R. K. Ursem Multi-objective Optimization using Evolutionary Algorithms , 2009 .

[11]  Eric Brill,et al.  Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging , 1995, CL.

[12]  Slav Petrov,et al.  Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections , 2011, ACL.

[13]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[14]  Lawrence. Davis,et al.  Handbook Of Genetic Algorithms , 1990 .

[15]  Zbigniew Michalewicz,et al.  Genetic algorithms + data structures = evolution programs (2nd, extended ed.) , 1994 .

[16]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[17]  ABOUT IIT BOMBAY & , 2022 .

[18]  Zbigniew Michalewicz,et al.  Genetic Algorithms + Data Structures = Evolution Programs , 1992, Artificial Intelligence.

[19]  Pushpak Bhattacharyya,et al.  Morphological Richness Offsets Resource Demand - Experiences in Constructing a POS Tagger for Hindi , 2006, ACL.

[20]  Tong Zhang,et al.  Named Entity Recognition through Classifier Combination , 2003, CoNLL.

[21]  Douglas H. Norrie,et al.  Agent-Based Systems for Intelligent Manufacturing: A State-of-the-Art Survey , 1999, Knowledge and Information Systems.

[22]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[23]  Carlos A. Coello Coello,et al.  A Comprehensive Survey of Evolutionary-Based Multiobjective Optimization Techniques , 1999, Knowledge and Information Systems.

[24]  Sudeshna Sarkar,et al.  Part of Speech Tagging for Bengali with Hidden Markov Model , 2006 .

[25]  Christopher D. Manning Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? , 2011, CICLing.

[26]  Chris Murphy,et al.  Dominance-Based Multiobjective Simulated Annealing , 2008, IEEE Transactions on Evolutionary Computation.

[27]  Sivaji Bandyopadhyay,et al.  Web-Based Bengali News Corpus for Lexicon Development and POS Tagging , 2008, Polibits.

[28]  N. Metropolis,et al.  Equation of State Calculations by Fast Computing Machines , 1953, Resonance.

[29]  Donald Geman,et al.  Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images , 1984 .

[30]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[31]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[32]  Sathish Pammi Tagging and Chunking using Decision Forests , 2022 .

[33]  Penelope Sibun,et al.  A Practical Part-of-Speech Tagger , 1992, ANLP.

[34]  Avinesh Pvs,et al.  Part-Of-Speech Tagging and Chunking using Conditional Random Fields and Transformation Based Learning , 2006 .

[35]  Anders Søgaard,et al.  Simple Semi-Supervised Training of Part-Of-Speech Taggers , 2010, ACL.

[36]  John D. Lafferty,et al.  Decision Tree Models Applied to the Labeling of Text with Parts-of-Speech , 1992, HLT.

[37]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.

[38]  Ujjwal Maulik,et al.  A Simulated Annealing-Based Multiobjective Optimization Algorithm: AMOSA , 2008, IEEE Transactions on Evolutionary Computation.

[39]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[40]  Sivaji Bandyopadhyay,et al.  Voted NER System using Appropriate Unlabeled Data , 2009, NEWS@IJCNLP.

[41]  John F. Kolen,et al.  Backpropagation is Sensitive to Initial Conditions , 1990, Complex Syst..

[42]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[43]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[44]  Masahiko Haruno,et al.  Feature Selection in SVM Text Categorization , 1999, AAAI/IAAI.

[45]  Kevin J. Cherkauer Human Expert-level Performance on a Scientiic Image Analysis Task by a System Using Combined Artiicial Neural Networks , 1996 .

[46]  Slav Petrov,et al.  Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models , 2010, EMNLP.

[47]  D. E. Goldberg,et al.  Genetic Algorithms in Search , 1989 .

[48]  B. Suman,et al.  A survey of simulated annealing as a tool for single and multiobjective optimization , 2006, J. Oper. Res. Soc..

[49]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[50]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[51]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[52]  Stochastic Relaxation , 2014, Computer Vision, A Reference Guide.