Improving Term Extraction by System Combination Using Boosting

Term extraction is the task of automatically detecting, in textual corpora, lexical units that designate concepts in thematically restricted domains (e.g. medicine). Current term extraction systems integrate linguistic and statistical cues to detect terms. The best results have been obtained by combining several simple base term extractors [14]. This paper shows that such a combination can be improved further by posing an additional learning problem: how to find the best combination of base term extractors. Empirical results obtained with AdaBoost in the metalearning step show that the resulting ensemble outperforms every individual extractor as well as simple voting schemes, obtaining significantly better accuracy figures at all levels of recall.
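The contrast between a simple voting scheme and a learned (metalearned) combination can be illustrated with a minimal Python sketch. It assumes that each candidate term is represented by the scores of several base extractors and uses plain discrete AdaBoost over decision stumps as the metalearner. The boosting variant and features used in the paper may differ. All function names, the per-extractor voting thresholds, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical setup: each candidate term is a vector of scores, one per base
# term extractor; labels are +1 (term) or -1 (non-term).
Stump = Tuple[int, float, int]  # (extractor index, threshold, polarity)


def majority_vote(x: Sequence[float], thresholds: Sequence[float]) -> int:
    """Simple voting baseline: each extractor votes +1 if its score passes its threshold."""
    votes = sum(1 if s >= t else -1 for s, t in zip(x, thresholds))
    return 1 if votes > 0 else -1


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol


def train_adaboost(X: List[Sequence[float]], y: List[int],
                   rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost over decision stumps, used here as the metalearner."""
    n = len(X)
    w = [1.0 / n] * n  # uniform initial weights over training candidates
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best, best_err = None, float("inf")
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    s = (j, thr, pol)
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(s, xi) != yi)
                    if err < best_err:
                        best, best_err = s, err
        if best_err >= 0.5:
            break  # no stump beats chance on the reweighted sample
        eps = max(best_err, 1e-10)  # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, best))
        if best_err == 0:
            break  # a perfect stump leaves nothing to reweight
        # Up-weight misclassified candidates, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def ensemble_score(model: List[Tuple[float, Stump]], x: Sequence[float]) -> float:
    """Real-valued confidence; sorting candidates by it gives a ranked term list."""
    return sum(alpha * stump_predict(s, x) for alpha, s in model)


def ensemble_predict(model: List[Tuple[float, Stump]], x: Sequence[float],
                     threshold: float = 0.0) -> int:
    return 1 if ensemble_score(model, x) >= threshold else -1


# Toy data (invented for illustration): extractor 0 is reliable, 1 and 2 are noisy.
X = [(0.9, 0.1, 0.2), (0.8, 0.2, 0.1), (0.7, 0.9, 0.1),
     (0.2, 0.9, 0.8), (0.1, 0.8, 0.9), (0.3, 0.1, 0.2)]
y = [1, 1, 1, -1, -1, -1]

if __name__ == "__main__":
    vote_acc = sum(majority_vote(x, (0.5, 0.5, 0.5)) == t for x, t in zip(X, y)) / len(y)
    model = train_adaboost(X, y)
    boost_acc = sum(ensemble_predict(model, x) == t for x, t in zip(X, y)) / len(y)
    print(f"majority vote: {vote_acc:.2f}, AdaBoost: {boost_acc:.2f}")
```

Run as a script, it prints `majority vote: 0.33, AdaBoost: 1.00`. These are training accuracies on invented data. They illustrate only that a learned combination can down-weight unreliable extractors and do not reproduce the paper's results. Because `ensemble_score` is real-valued, moving `threshold` trades precision against recall.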

[1]  Yoram Singer, et al.  Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.

[2]  Sophia Ananiadou, et al.  A Methodology for Automatic Term Recognition, 1994, COLING.

[3]  Slava M. Katz, et al.  Technical terminology: some linguistic properties and an algorithm for identification in text, 1995, Natural Language Engineering.

[4]  Jiri Matas, et al.  On Combining Classifiers, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[5]  Horacio Rodríguez, et al.  Improving term extraction by combining different techniques, 2001.

[6]  Lluís Màrquez i Villodre, et al.  Boosting Applied to Word Sense Disambiguation, 2000, ECML.

[7]  Lluís Màrquez i Villodre, et al.  Boosting Applied to Word Sense Disambiguation, 2000, ArXiv.

[8]  Piek Vossen, et al.  EuroWordNet: A multilingual database with lexical semantic networks, 1998, Springer Netherlands.

[9]  Kyo Kageura, et al.  Methods of Automatic Term Recognition: A Review, 1996.

[10]  Bernardo Magnini, et al.  Integrating Subject Field Codes into WordNet, 2000, LREC.

[11]  D. Bourigault.  Lexter : un Logiciel d'EXtraction de TERminologie : application à l'acquisition des connaissances à partir de textes, 1994.

[13]  David H. Wolpert, et al.  Stacked generalization, 1992, Neural Networks.

[14]  Xavier Carreras, et al.  Boosting trees for clause splitting, 2001, CoNLL.

[15]  D. Maynard.  Term recognition using combined knowledge sources, 1999.

[16]  B. Daille.  Approche mixte pour l'extraction de terminologie : statistique lexicale et filtres linguistiques, 1994.

[17]  Yoram Singer, et al.  Boosting Applied to Tagging and PP Attachment, 1999, EMNLP.