Paper Information - Support Vector Machines to Weight Voters in a Voting System of Entity Extractors


Support vector machines (SVMs) are used to combine the outputs of multiple entity extractors into a composite entity extraction system. The composite system has a significantly higher F-measure than any of its component systems. Compared with a standard voting technique for combining the results of multiple entity extractors, the SVM approach produces comparable precision and recall but tends to use fewer of the component extractors, which makes it computationally more efficient, a critical property in practical applications. In this paper, we present experimental results comparing a standard voting technique with an SVM approach, each aggregating four entity extractors. We also describe our plans to integrate agent-based technology into our experimental testbed, where we examine the evolution of composite techniques as part of the analysis stream. Because much of the improvement comes from tuning the algorithms to the data stream with a human in the loop, we are considering the merits of employing cognitive agents strategically embedded in the data-processing workflow. As we tune the algorithms for better performance on the data streams, we envision agents learning the patterns of those streams and applying the appropriate tuning to ensure optimality.
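The contrast between plain voting and SVM-weighted voting described in the abstract can be illustrated with a small sketch. It is not the paper's implementation. It assumes each candidate span is encoded as a 0/1 vector recording which of four extractors proposed it, and it trains a linear SVM by batch subgradient descent on the regularised hinge loss. The toy data, the encoding, and every function name here are hypothetical.

```python
from typing import List, Sequence, Tuple


def majority_vote(votes: Sequence[int], threshold: int = 2) -> bool:
    """Baseline: accept a span if at least `threshold` extractors proposed it."""
    return sum(votes) >= threshold


def train_linear_svm(X: List[Sequence[int]], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_vote(votes: Sequence[int], w: Sequence[float], b: float) -> bool:
    """Weighted vote: accept a span if the learned linear score is positive."""
    return sum(wj * vj for wj, vj in zip(w, votes)) + b > 0


def f_measure(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one row per candidate span, one column per extractor.
# Extractor 0 is reliable here; the other three are noisy by construction.
X = [(1, 0, 0, 0), (1, 1, 0, 0), (1, 0, 1, 1),
     (0, 1, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1)]
y = [1, 1, 1, -1, -1, -1]  # +1 = span is a true entity, -1 = spurious
gold = [label == 1 for label in y]

w, b = train_linear_svm(X, y)
vote_f = f_measure([majority_vote(x) for x in X], gold)
svm_f = f_measure([svm_vote(x, w, b) for x in X], gold)
print("learned voter weights:", [round(wj, 2) for wj in w])
print("majority-vote F:", round(vote_f, 3), "SVM-weighted F:", round(svm_f, 3))
```

In this sketch the learned weights play the role of per-voter weights: an extractor whose weight stays near zero contributes little to the decision and could in principle be skipped at prediction time. That is one possible reading of the efficiency claim above, not a confirmed description of the authors' system. The scores are computed on the training data only, so they illustrate the mechanics rather than any generalisation result.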
