Application of SVM in Citation Information Extraction

Support vector machines (SVMs) are effective binary classifiers. Hidden Markov models (HMMs) make only limited use of the structural features of text in information extraction. To exploit these features more fully, this paper proposes a multi-class SVM classification method based on Markov properties for extracting information from a citation database. The proposed model uses symbol characteristics as features and builds a binary tree from the transition probabilities. Experiments show that the proposed method outperforms both HMM and basic SVM methods.
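The abstract does not give implementation details. The Python sketch below only illustrates two ingredients it names: per-token symbol-characteristic features, and first-order (Markov) transition probabilities between citation fields, estimated from labeled training citations. The field labels (`author`, `title`, `venue`, `year`), the specific feature set, and the add-one smoothing are assumptions chosen for illustration. They are not the authors' choices, and the functions `symbol_features` and `transition_probs` are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical citation field labels; the paper's actual label set is not stated here.
LABELS = ["author", "title", "venue", "year"]


def symbol_features(token: str) -> list[int]:
    """Binary symbol-characteristic features for one token (illustrative feature set)."""
    core = token.strip(".,:;")  # token without surrounding punctuation
    return [
        int(token[:1].isupper()),                 # starts with a capital letter
        int(any(c.isdigit() for c in token)),     # contains a digit
        int(core.isdigit() and len(core) == 4),   # looks like a year, e.g. "2004."
        int(token.endswith(",")),                 # ends with a comma
        int(token.endswith(".")),                 # ends with a period
        int(":" in token),                        # contains a colon
    ]


def transition_probs(sequences, labels=LABELS, alpha=1.0):
    """Estimate P(next label | current label) from label sequences, with add-alpha smoothing."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in labels:
        total = sum(counts[prev].values()) + alpha * len(labels)
        probs[prev] = {nxt: (counts[prev][nxt] + alpha) / total for nxt in labels}
    return probs


if __name__ == "__main__":
    train = [
        ["author", "author", "title", "title", "venue", "year"],
        ["author", "title", "title", "year"],
    ]
    P = transition_probs(train)
    print(round(P["author"]["title"], 3))  # 0.429
    print(symbol_features("2004."))        # [0, 1, 1, 0, 1, 0]
```

According to the abstract, transition probabilities of this kind are used to build the binary tree for the multi-class SVM. The exact construction is not described here, so this sketch stops at estimating the probabilities and extracting features.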

Jiguang Liang | Robert Layton | Wei Wang