Selecting features of linear-chain conditional random fields via greedy stage-wise algorithms

This paper presents two embedded feature selection algorithms for linear-chain conditional random fields (CRFs), named GFSA_LCRF and PGFSA_LCRF. At each iteration, GFSA_LCRF selects the single candidate feature whose incorporation into the CRF most improves the CRF's conditional log-likelihood. For time efficiency, only the weight of the new feature is optimized to maximize the log-likelihood, rather than the weights of all features in the CRF. The process is repeated until incorporating new features no longer improves the log-likelihood noticeably. PGFSA_LCRF speeds up GFSA_LCRF by adopting pseudo-likelihood as the evaluation criterion for iteratively selecting features. Furthermore, at certain iterations it scans all candidate features and forms a small set of promising features, which subsequent iterations then use to further improve speed. Experiments on two real-world problems show that CRFs with significantly fewer features, selected by our algorithms, achieve competitive performance while requiring significantly shorter testing time.
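
The greedy stage-wise loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `greedy_stagewise_select`, `best_single_weight`, `min_gain`, and `toy_objective` are hypothetical, the objective is an arbitrary caller-supplied function standing in for the CRF conditional log-likelihood, and the one-dimensional golden-section search is an assumed choice of optimizer (it relies on the objective being concave in each single weight, which holds for CRF log-likelihood). The pseudo-likelihood criterion and candidate shortlist of PGFSA_LCRF are not shown.

```python
import math
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

Weights = Dict[Hashable, float]


def best_single_weight(objective: Callable[[Weights], float], weights: Weights,
                       feature: Hashable, lo: float = -10.0, hi: float = 10.0,
                       iters: int = 80) -> Tuple[float, float]:
    """Maximize the objective over the weight of `feature` only.

    All already-selected weights stay frozen. Golden-section search on
    [lo, hi]; assumes the objective is unimodal in this single weight.
    """
    inv_phi = (math.sqrt(5) - 1) / 2

    def value(w: float) -> float:
        trial = dict(weights)
        trial[feature] = w
        return objective(trial)

    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = value(c), value(d)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = value(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = value(d)
    w = (a + b) / 2
    return w, value(w)


def greedy_stagewise_select(objective: Callable[[Weights], float],
                            candidates: Iterable[Hashable],
                            min_gain: float = 1e-3,
                            max_features: Optional[int] = None) -> Weights:
    """Add one feature per iteration: the one with the largest objective gain.

    Stops when the best achievable gain falls below `min_gain`
    (a hypothetical stand-in for "no noticeable improvement").
    """
    weights: Weights = {}
    current = objective(weights)
    remaining = list(candidates)
    while remaining and (max_features is None or len(weights) < max_features):
        best = None
        for f in remaining:
            w, val = best_single_weight(objective, weights, f)
            if best is None or val - current > best[0]:
                best = (val - current, f, w, val)
        gain, f, w, val = best
        if gain < min_gain:
            break
        weights[f] = w  # earlier weights are not re-optimized
        current = val
        remaining.remove(f)
    return weights


def toy_objective(weights: Weights) -> float:
    """Concave toy stand-in for a log-likelihood (not a CRF)."""
    targets = {"a": 2.0, "b": 0.5, "c": 0.0}
    return -sum((weights.get(f, 0.0) - t) ** 2 for f, t in targets.items())


selected = greedy_stagewise_select(toy_objective, ["a", "b", "c"])
print(list(selected))  # ['a', 'b']
```

On the toy objective, feature "c" is never selected because giving it a nonzero weight cannot raise the objective, which mirrors the stopping rule. In a real CRF each call to `objective` would require inference over the training sequences, which is why the abstract's restriction to a single new weight per candidate matters for running time.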
