Named Entity Recognition Based on a Machine Learning Model

This study proposes a novel unified model for named entity recognition in the recruitment information found on Web pages. The model provides a simple statistical framework that incorporates a wide variety of linguistic knowledge and statistical models in a unified way. In our approach, Multi-Rules are first built to give a better representation of named entities, emphasizing their specific semantics and term space. An optimization algorithm for the hierarchically structured DSTCRFs is then applied to extract the structural attributes of named entities from the recruitment knowledge and to improve training efficiency. Experimental results show that accuracy improves significantly and that the complexity of sample training decreases.
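As a rough illustration of how hand-written rules can enrich the token representation before a statistical sequence model is trained, the following minimal Python sketch turns each token of a recruitment sentence into a set of rule-based features. It is not the Multi-Rules or the DSTCRFs of this study: the rule names, the keyword lists, and the functions `token_features` and `featurize` are all hypothetical and chosen only for illustration.

```python
# Hypothetical rules (assumptions, not taken from the paper): each maps a
# feature name to a predicate over a single token.
RULES = {
    "is_capitalized": lambda tok: tok[:1].isupper(),
    "has_digit": lambda tok: any(ch.isdigit() for ch in tok),
    "org_suffix": lambda tok: tok.lower().rstrip(".,") in {"inc", "ltd", "corp", "co"},
    "job_keyword": lambda tok: tok.lower() in {"engineer", "manager", "analyst", "developer"},
}


def token_features(tokens, i):
    """Return the rule features that fire for tokens[i] and for its left neighbour."""
    feats = {name for name, rule in RULES.items() if rule(tokens[i])}
    if i > 0:
        # Context features: rules that fire on the previous token.
        feats |= {"prev:" + name for name, rule in RULES.items() if rule(tokens[i - 1])}
    return feats


def featurize(sentence):
    """Split on whitespace and pair every token with its sorted feature list."""
    tokens = sentence.split()
    return [(tok, sorted(token_features(tokens, i))) for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    for tok, feats in featurize("Acme Ltd. seeks a software engineer"):
        print(tok, feats)
```

Feature lists of this kind could then serve as observations for a conditional random field or a similar sequence labeler; that training step is omitted here.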
