Hidden-Unit Conditional Random Fields

The paper explores a generalization of conditional random elds (CRFs) in which binary stochastic hidden units appear between the data and the labels. Hidden-unit CRFs are potentially more powerful than standard CRFs because they can represent nonlinear dependencies at each frame. The hidden units in these models also learn to discover latent distributed structure in the data that improves classication. We derive ecient algorithms for inference and learning in these models by observing that the hidden units are conditionally independent given the data and the labels. Finally, we show that hiddenunit CRFs perform well in experiments on a range of tasks, including optical character recognition, text classication, protein structure prediction, and part-of-speech tagging.

[1]  George Cybenko,et al.  Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..

[2]  Yoshua Bengio,et al.  Classification using discriminative restricted Boltzmann machines , 2008, ICML '08.

[3]  Thorsten Brants,et al.  TnT – A Statistical Part-of-Speech Tagger , 2000, ANLP.

[4]  Geoffrey E. Hinton,et al.  Gated Softmax Classification , 2010, NIPS.

[5]  William W. Cohen,et al.  Semi-Markov Conditional Random Fields for Information Extraction , 2004, NIPS.

[6]  George Karypis,et al.  YASSPP: Better kernels and coding schemes lead to improvements in protein secondary structure prediction , 2006, Proteins.

[7]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[8]  Michael I. Jordan,et al.  Loopy Belief Propagation for Approximate Inference: An Empirical Study , 1999, UAI.

[9]  Nicolas Le Roux,et al.  Representational Power of Restricted Boltzmann Machines and Deep Belief Networks , 2008, Neural Computation.

[10]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[11]  Zoubin Ghahramani,et al.  Conditional Graphical Models , 2007 .

[12]  Yaoqi Zhou,et al.  Achieving 80% ten‐fold cross‐validated accuracy for secondary structure prediction by large‐scale training , 2006, Proteins.

[13]  G J Barton,et al.  Evaluation and improvement of multiple sequence methods for protein secondary structure prediction , 1999, Proteins.

[14]  Lawrence K. Saul,et al.  Matrix updates for perceptron training of continuous density hidden Markov models , 2009, ICML '09.

[15]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.

[16]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[17]  Miguel Á. Carreira-Perpiñán,et al.  Multiscale conditional random fields for image labeling , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[18]  Thierry Artières,et al.  Neural conditional random fields , 2010, AISTATS.

[19]  Jian Peng,et al.  Conditional Neural Fields , 2009, NIPS.

[20]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[21]  Jaime G. Carbonell,et al.  Segmentation Conditional Random Fields (SCRFs): A New Approach for Protein Fold Recognition , 2005, RECOMB.

[22]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[23]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[24]  Xiaojin Zhu,et al.  Kernel conditional random fields: representation and clique selection , 2004, ICML.

[25]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[26]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[27]  P MarcusMitchell,et al.  Building a large annotated corpus of English , 1993 .

[28]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[29]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[30]  Geoffrey E. Hinton,et al.  Generative versus discriminative training of RBMs for classification of fMRI images , 2008, NIPS.

[31]  Martial Hebert,et al.  Discriminative Fields for Modeling Spatial Dependencies in Natural Images , 2003, NIPS.

[32]  Geoffrey E. Hinton,et al.  Deep Belief Networks for phone recognition , 2009 .

[33]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[34]  Andrew Gelfand,et al.  On Herding and the Perceptron Cycling Theorem , 2010, NIPS.

[35]  Fei Sha,et al.  Large Margin Training of Continuous Density Hidden Markov Models , 2009 .

[36]  Trevor Darrell,et al.  Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Geoffrey E. Hinton,et al.  Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[38]  Guilherme Hoefel,et al.  Learning a two-stage SVM/CRF sequence classifier , 2008, CIKM '08.

[39]  Eric Fosler-Lussier,et al.  Backpropagation training for multilayer conditional random field based phone recognition , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[40]  Hyunsoo Kim,et al.  Protein secondary structure prediction based on an improved support vector machines approach. , 2003, Protein engineering.

[41]  Tony Berber Sardinha Corpus linguistics: readings in a widening discipline , 2007 .

[42]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[43]  George Cybenko,et al.  Approximation by superpositions of a sigmoidal function , 1992, Math. Control. Signals Syst..

[44]  Honglak Lee,et al.  Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.

[45]  John Langford,et al.  Search-based structured prediction , 2009, Machine Learning.