Rule Extraction from a Trained Conditional Random Field Model

Conditional Random Fields (CRFs) have proven highly successful for sequence labeling problems such as part-of-speech (POS) tagging and segmentation. However, a trained CRF acts like a black box, providing no insight into what it has learned. We propose a system for extracting rules from a CRF to make the model more comprehensible. As case studies, we perform experiments on POS tagging and chunking in English. We test the quality of the extracted rule base by implementing a majority-voting rule-based tagger, which achieves a maximum precision of 93.9% for POS tagging and 77% for chunking. The extracted rules conform to our linguistic knowledge of English. We also give a quantitative comparison of our approach with the PART decision list learner and the C4.5 decision tree learner. Comprehensibility of statistical models is our guiding principle.
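
To make the idea of a majority-voting rule-based tagger concrete, the following is a minimal sketch, not the system described above. The abstract does not specify the form of the extracted rules, so the sketch assumes each rule is a (feature, tag) pair, that every rule whose feature fires on a token casts one vote, and that the tag with the most votes wins. The rule list `RULES`, the feature names, and the helpers `features` and `tag` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical rule base (illustrative only, not taken from the paper):
# each rule says "if this feature fires on the token, vote for this tag".
RULES = [
    ("word=the", "DT"),
    ("prev_word=the", "NN"),
    ("suffix=ed", "VBD"),
    ("suffix=ly", "RB"),
    ("suffix=ing", "VBG"),
    ("suffix=ng", "VBG"),
]


def features(words, i):
    """Return the set of assumed features that fire on the token at position i."""
    word = words[i].lower()
    feats = {f"word={word}"}
    # Suffixes of length 1 to 3, only when shorter than the whole word.
    for n in (1, 2, 3):
        if len(word) > n:
            feats.add(f"suffix={word[-n:]}")
    # Previous word, with a start-of-sentence marker for the first token.
    prev = words[i - 1].lower() if i > 0 else "<s>"
    feats.add(f"prev_word={prev}")
    return feats


def tag(words, rules, default="NN"):
    """Tag each word with the tag that receives the most rule votes."""
    tags = []
    for i in range(len(words)):
        feats = features(words, i)
        # Every rule whose feature fires casts one vote for its tag.
        votes = Counter(t for f, t in rules if f in feats)
        # Fall back to an assumed default tag when no rule fires.
        tags.append(votes.most_common(1)[0][0] if votes else default)
    return tags


if __name__ == "__main__":
    print(tag(["the", "dog", "barked", "loudly"], RULES))
    print(tag(["the", "running"], RULES))
```

In the second call, the token "running" triggers two rules voting for VBG and one voting for NN, so the majority vote selects VBG. A real rule base extracted from a CRF would likely carry weights or confidences per rule; unweighted voting is a simplifying assumption here.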
