Sequential Labeling Using Deep-Structured Conditional Random Fields

We present the deep-structured conditional random field (CRF), a multi-layer CRF model in which each higher layer's input observation sequence consists of the previous layer's observation sequence and the resulting frame-level marginal probabilities. Because features can be constructed on the previous layer's output (belief), this structure can closely approximate long-range state dependencies using only linear-chain or zeroth-order CRFs. The final layer is trained to maximize the log-likelihood of the state (label) sequence, whereas each lower layer is optimized to maximize the frame-level marginal probabilities. In the deep-structured CRF, both parameter estimation and state sequence inference are carried out efficiently, layer by layer, from bottom to top. We evaluate the deep-structured CRF on two natural language processing tasks: search query tagging and advertisement field segmentation. The experimental results show that the deep-structured CRF achieves word labeling accuracies significantly higher than the best results previously reported on these tasks using the same labeled training set.
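The layer-by-layer inference described above can be illustrated with a minimal sketch. The sketch rests on the following assumptions:

- Each layer is a linear-chain CRF whose per-frame label scores are linear in its input features.
- Frame-level marginals are computed with the standard forward-backward algorithm in log space.
- Following the abstract literally, each layer's input is the previous layer's input with that layer's marginals appended.

The names `chain_marginals`, `Layer`, and `deep_crf_marginals` are hypothetical. The weights are set by hand, and training is not shown. Per the abstract, training would maximize frame-level marginals for the lower layers and sequence log-likelihood for the top layer.

```python
import math
from typing import List, Sequence

Vector = List[float]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def chain_marginals(unary: List[Vector], trans: List[Vector]) -> List[Vector]:
    """Frame-level marginals p(y_t = k | x) of a linear-chain CRF (forward-backward).

    unary[t][k]: score of label k at frame t.
    trans[i][j]: score of moving from label i to label j.
    """
    T, K = len(unary), len(unary[0])
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[0.0] * K for _ in range(T)]  # beta[T-1] stays 0 (log 1)
    alpha[0] = list(unary[0])
    for t in range(1, T):  # forward pass
        for j in range(K):
            alpha[t][j] = unary[t][j] + logsumexp(
                [alpha[t - 1][i] + trans[i][j] for i in range(K)])
    for t in range(T - 2, -1, -1):  # backward pass
        for i in range(K):
            beta[t][i] = logsumexp(
                [trans[i][j] + unary[t + 1][j] + beta[t + 1][j] for j in range(K)])
    log_z = logsumexp(alpha[T - 1])  # log partition function
    return [[math.exp(alpha[t][k] + beta[t][k] - log_z) for k in range(K)]
            for t in range(T)]


class Layer:
    """One hypothetical CRF layer with hand-set parameters."""

    def __init__(self, weights: List[Vector], trans: List[Vector]):
        self.weights = weights  # weights[k][d]: weight of feature d for label k
        self.trans = trans      # K x K transition scores (all zeros = zeroth-order CRF)

    def marginals(self, feats: List[Vector]) -> List[Vector]:
        unary = [[sum(w * f for w, f in zip(wk, x)) for wk in self.weights]
                 for x in feats]
        return chain_marginals(unary, self.trans)


def deep_crf_marginals(layers: List[Layer], observations: List[Vector]) -> List[Vector]:
    """Bottom-to-top inference: each layer sees the previous layer's input plus its marginals."""
    feats = [list(x) for x in observations]
    marg: List[Vector] = []
    for layer in layers:
        marg = layer.marginals(feats)
        feats = [x + m for x, m in zip(feats, marg)]  # append beliefs as new features
    return marg


if __name__ == "__main__":
    obs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 frames, 2 features
    l1 = Layer([[1.0, -1.0], [-1.0, 1.0]], [[0.5, -0.5], [-0.5, 0.5]])
    # Layer 2 input has 2 + 2 = 4 features: observations plus layer-1 marginals.
    l2 = Layer([[0.2, 0.1, 2.0, -2.0], [0.1, 0.2, -2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]])
    for row in deep_crf_marginals([l1, l2], obs):
        print([round(p, 3) for p in row])
```

With an all-zero transition matrix, a layer reduces to a zeroth-order CRF, and its marginals are simply a per-frame softmax of the unary scores. Because the input width grows by K features per layer, each layer's weight vectors must be sized accordingly.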
