Using Support Vector Machines for Terrorism Information Extraction

Information extraction (IE) is of great importance in many applications including web intelligence, search engines, text understanding, etc. To extract information from text documents, most IE systems rely on a set of extraction patterns. Each extraction pattern is defined based on the syntactic and/or semantic constraints on the positions of desired entities within natural language sentences. The IE systems also provide a set of pattern templates that determines the kind of syntactic and semantic constraints to be considered. In this paper, we argue that such pattern templates restricts the kind of extraction patterns that can be learned by IE systems. To allow a wider range of context information to be considered in learning extraction patterns, we first propose to model the content and context information of a candidate entity to be extracted as a set of features. A classification model is then built for each category of entities using Support Vector Machines (SVM). We have conducted IE experiments to evaluate our proposed method on a text collection in the terrorism domain. From the preliminary experimental results, we conclude that our proposed method can deliver reasonable accuracies.

[1]  Andrew McCallum,et al.  Information Extraction with HMMs and Shrinkage , 1999 .

[2]  Ellen Riloff,et al.  Automatically Generating Extraction Patterns from Untagged Text , 1996, AAAI/IAAI, Vol. 2.

[3]  Marti A. Hearst,et al.  Adaptive Sentence Boundary Disambiguation , 1994, ANLP.

[4]  Ellen Riloff,et al.  An Empirical Study of Automated Dictionary Construction for Information Extraction in Three Domains , 1996, Artif. Intell..

[5]  Susan T. Dumais,et al.  Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.

[6]  Daniel Dominic Sleator,et al.  Parsing English with a Link Grammar , 1995, IWPT.

[7]  Ellen Riloff,et al.  Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping , 1999, AAAI/IAAI.

[8]  Ion Muslea,et al.  Extraction Patterns for Information Extraction Tasks: A Survey , 1999 .

[9]  Stephen Soderland,et al.  Learning Information Extraction Rules for Semi-Structured and Free Text , 1999, Machine Learning.

[10]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[11]  Vibhu O. Mittal,et al.  Applying Machine Learning for High‐Performance Named‐Entity Extraction , 2000, Comput. Intell..

[12]  Andrew McCallum,et al.  Information Extraction with HMM Structures Learned by Stochastic Optimization , 2000, AAAI/IAAI.

[13]  Dan I. Moldovan,et al.  Acquisition of Linguistic Patterns , 1995 .

[14]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[15]  Dan I. Moldovan,et al.  Acquisition of Linguistic Patterns for Knowledge-Based Information Extraction , 1995, IEEE Trans. Knowl. Data Eng..

[16]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[17]  Dayne Freitag,et al.  Information Extraction from HTML: Application of a General Machine Learning Approach , 1998, AAAI/IAAI.