Semi-supervised training for conditional random fields with pseudo auxiliary task

Conditional random fields (CRFs) have been successful in many sequence labeling tasks, but they conventionally rely on a hand-crafted feature representation of the input data. A powerful data representation can be another determining factor in performance, yet it has not attracted enough attention. We describe a novel sequence labeling framework that builds a supervised CRF and an unsupervised dynamic model on top of a shared neural network that performs a nonlinear feature transformation. Because the two learning tasks are optimized jointly, the model can also be used for transfer learning. We demonstrate the effectiveness of the proposed framework on synthetic data, and we show that it yields a significant improvement in recognition accuracy over conventional CRFs on gesture recognition tasks.
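The joint objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the network, the dynamic model, or how the two losses are weighted. The single tanh layer, the next-frame prediction used as the unsupervised auxiliary task, the weight `lam`, and all function and parameter names are assumptions made for illustration only.

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def shared_features(x, W, b):
    # Shared nonlinear transformation (assumed: one tanh layer), (T, D) -> (T, H).
    return np.tanh(x @ W + b)

def crf_nll(h, y, U, A):
    # Negative log-likelihood of labels y under a linear-chain CRF.
    # h: (T, H) features, U: (H, K) emission weights, A: (K, K) transition scores.
    emit = h @ U
    T = len(y)
    score = emit[np.arange(T), y].sum() + A[y[:-1], y[1:]].sum()
    alpha = emit[0]  # forward recursion in log space
    for t in range(1, T):
        alpha = logsumexp(alpha[:, None] + A, axis=0) + emit[t]
    return logsumexp(alpha, axis=0) - score

def dynamics_loss(h, x, V):
    # Hypothetical auxiliary task: predict frame t+1 from the features at t.
    pred = h[:-1] @ V
    return np.mean((pred - x[1:]) ** 2)

def joint_loss(params, x_lab, y_lab, x_unlab, lam=0.5):
    # Both terms pass through the same W and b, so unlabeled sequences
    # also shape the representation that the CRF consumes.
    W, b, U, A, V = params
    sup = crf_nll(shared_features(x_lab, W, b), y_lab, U, A)
    unsup = dynamics_loss(shared_features(x_unlab, W, b), x_unlab, V)
    return sup + lam * unsup
```

In this sketch, minimizing `joint_loss` with any gradient-based optimizer would update the shared parameters `W` and `b` from both the labeled and the unlabeled sequences, which is the sense in which the two tasks are trained jointly.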
