Neural conditional random fields

We propose a non-linear graphical model for structured prediction. It combines the ability of deep neural networks to extract high-level features with the graphical framework of Markov networks, yielding a powerful and scalable probabilistic model that we apply to signal labeling tasks.
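
To make the combination concrete, the following is a minimal sketch of one way a neural network can feed a chain-structured Markov network. It is an illustration under stated assumptions, not the model defined in the paper: the single tanh layer, the linear-chain structure, the pairwise transition scores, and every name (`hidden_features`, `unary_scores`, `log_partition`, `random_params`, and so on) are hypothetical choices made for this example.

```python
import itertools
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hidden_features(x, W, b):
    """Assumed feature extractor: one tanh layer applied to input vector x."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def unary_scores(xs, W, b, V):
    """Per-position label scores, linear in the hidden features."""
    scores = []
    for x in xs:
        h = hidden_features(x, W, b)
        scores.append([sum(v * hi for v, hi in zip(row, h)) for row in V])
    return scores


def sequence_score(unary, trans, ys):
    """Unnormalized log-score of a labeling: unary plus transition terms."""
    s = sum(unary[t][y] for t, y in enumerate(ys))
    s += sum(trans[prev][nxt] for prev, nxt in zip(ys, ys[1:]))
    return s


def log_partition(unary, trans):
    """Forward algorithm: log of the sum of exp(score) over all labelings."""
    n_labels = len(trans)
    alpha = list(unary[0])
    for t in range(1, len(unary)):
        alpha = [unary[t][j]
                 + logsumexp([alpha[i] + trans[i][j] for i in range(n_labels)])
                 for j in range(n_labels)]
    return logsumexp(alpha)


def log_prob(xs, ys, params):
    """Conditional log-probability of labels ys given inputs xs."""
    W, b, V, trans = params
    unary = unary_scores(xs, W, b, V)
    return sequence_score(unary, trans, ys) - log_partition(unary, trans)


def random_params(n_in, n_hidden, n_labels, seed=0):
    """Random illustrative parameters (not trained)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return (mat(n_hidden, n_in), [0.0] * n_hidden,
            mat(n_labels, n_hidden), mat(n_labels, n_labels))


def brute_force_total(xs, params, n_labels):
    """Sanity check: probabilities of all labelings should sum to one."""
    return sum(math.exp(log_prob(xs, list(ys), params))
               for ys in itertools.product(range(n_labels), repeat=len(xs)))
```

In this sketch the network only supplies the per-position scores, while the chain structure supplies normalization over whole label sequences. Training would maximize `log_prob` on labeled sequences with gradients flowing through both parts; that step is omitted here.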
