Local ensemble learning from imbalanced and noisy data for word sense disambiguation

Abstract

Natural Language Processing plays a key role in human-computer interaction, allowing computers to understand and analyze human language. One of its more challenging sub-domains is word sense disambiguation: the task of automatically identifying the intended sense (or concept) of an ambiguous word from the context in which it is used. This requires feature extraction that captures the relevant properties of the data, together with a dedicated machine learning solution that can accurately assign the appropriate sense. The resulting pattern classification problem is highly challenging, as it involves high-dimensional, multi-class imbalanced data that may additionally be corrupted by class label noise. To address these issues, we propose a local ensemble learning solution. It uses a one-class decomposition of the multi-class problem, assigning an ensemble of one-class classifiers to each class distribution. Each classifier is trained on a low-dimensional subset of features, and a kernel feature space transformation is applied to obtain a more compact representation. Instance weighting is used to filter out potentially noisy instances and to reduce overlap among classes. Finally, a two-level classifier fusion technique reconstructs the original multi-class problem. Our results show that the proposed approach is robust to both skewed multi-class distributions and class label noise, making it a useful tool for this task.
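
The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the one-class learner, the feature-subset scheme, the kernel transformation, or the weighting rule, so each is replaced here by a simple, hypothetical stand-in. Feature subsets are drawn as random subspaces. Each one-class model is a weighted Parzen-style density estimate with a Gaussian (RBF) kernel. An instance's weight is the fraction of its k nearest neighbours that share its label, so isolated, likely mislabelled points receive weight 0. The two fusion levels are an average within each class ensemble, followed by an argmax over normalised class supports. All names (`neighbour_weights`, `SubspaceKernelOCC`, `LocalOneClassEnsemble`) and parameters (`n_members`, `subspace_dim`, `gamma`, `k`) are illustrative.

```python
# Hypothetical sketch: random subspaces, kNN-based instance weights and
# Parzen-style kernel one-class scorers are assumptions, not the paper's
# exact components.
import math
import random


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel: similarity in an implicit kernel feature space."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def neighbour_weights(X, y, k=3):
    """Weight = share of the k nearest neighbours (Euclidean) with the same
    label. Likely mislabelled points get 0 and are filtered out; points in
    class-overlap regions are down-weighted."""
    weights = []
    for i, xi in enumerate(X):
        nearest = sorted(
            ((sum((p - q) ** 2 for p, q in zip(xi, xj)), y[j])
             for j, xj in enumerate(X) if j != i),
            key=lambda t: t[0],
        )[:k]
        weights.append(sum(1 for _, label in nearest if label == y[i]) / k)
    return weights


class SubspaceKernelOCC:
    """One ensemble member: weighted kernel density of a single class,
    computed on a low-dimensional subset of the features."""

    def __init__(self, X, w, features, gamma):
        self.features, self.gamma = features, gamma
        # Zero-weight (filtered) instances are dropped entirely.
        self.points = [([x[f] for f in features], wi) for x, wi in zip(X, w) if wi > 0]
        self.total = sum(wi for _, wi in self.points) or 1.0

    def score(self, x):
        z = [x[f] for f in self.features]
        return sum(wi * rbf(z, p, self.gamma) for p, wi in self.points) / self.total


class LocalOneClassEnsemble:
    """One-class decomposition: each class gets its own ensemble of subspace
    one-class models, and predictions use two-level fusion."""

    def __init__(self, n_members=10, subspace_dim=2, gamma=1.0, k=3, seed=0):
        self.n_members, self.subspace_dim = n_members, subspace_dim
        self.gamma, self.k, self.seed = gamma, k, seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        w = neighbour_weights(X, y, self.k)
        n_feat = len(X[0])
        dim = min(self.subspace_dim, n_feat)
        self.models, self.scale = {}, {}
        for c in sorted(set(y)):
            Xc = [x for x, label in zip(X, y) if label == c]
            wc = [wi for wi, label in zip(w, y) if label == c]
            self.models[c] = [
                SubspaceKernelOCC(Xc, wc, rng.sample(range(n_feat), dim), self.gamma)
                for _ in range(self.n_members)
            ]
            # Normalise by the mean support of the class's retained training
            # points so dense and sparse classes are compared on a similar scale.
            kept = [x for x, wi in zip(Xc, wc) if wi > 0] or Xc
            self.scale[c] = sum(self._support(c, x) for x in kept) / len(kept) or 1.0
        return self

    def _support(self, c, x):
        # Level 1 fusion: average the scores of class c's ensemble members.
        return sum(m.score(x) for m in self.models[c]) / len(self.models[c])

    def predict(self, x):
        # Level 2 fusion: reconstruct the multi-class decision by picking the
        # class with the highest normalised support.
        return max(self.models, key=lambda c: self._support(c, x) / self.scale[c])


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0.0, 0.3) for _ in range(4)] for _ in range(12)]  # majority class
    X += [[rng.gauss(5.0, 0.3) for _ in range(4)] for _ in range(4)]  # minority class
    X.append([0.0, 0.0, 0.0, 0.0])  # mislabelled point inside the majority cluster
    y = ["a"] * 12 + ["b"] * 5
    model = LocalOneClassEnsemble().fit(X, y)
    print(model.predict([0.1, -0.1, 0.0, 0.05]), model.predict([5.0, 5.1, 4.9, 5.0]))
```

Each class is modelled only by its own ensemble, and each density is averaged over that class's retained instances. As a result, the four-point minority class is not outvoted by the twelve-point majority class. The zero-weighted noisy instance at the origin is also dropped before the minority-class models are built, so it cannot pull class "b" toward the majority cluster. The quadratic-time neighbour search is acceptable for a toy example, but realistic word sense disambiguation feature sets would need an index structure or approximate search.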
