Paper Information - Learning Distributed Representations for Structured Output Prediction

Learning Distributed Representations for Structured Output Prediction

In recent years, distributed representations of inputs have led to performance gains in many applications by allowing statistical information to be shared across inputs. However, the predicted outputs (labels, and more generally structures) are still treated as discrete objects even though outputs are often not discrete units of meaning. In this paper, we present a new formulation for structured prediction where we represent individual labels in a structure as dense vectors and allow semantically similar labels to share parameters. We extend this representation to larger structures by defining compositionality using tensor products to give a natural generalization of standard structured prediction approaches. We define a learning objective for jointly learning the model parameters and the label vectors and propose an alternating minimization algorithm for learning. We show that our formulation outperforms structural SVM baselines in two tasks: multiclass document classification and part-of-speech tagging.
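The idea of scoring a label through a tensor product of its vector with the input features can be illustrated with a small sketch. This is only one plausible reading of the abstract for the multiclass case, assuming the score of label y is the dot product of a weight vector with the tensor product of the label vector and the feature vector. The names `kron`, `score`, and `predict`, and all numeric values, are hypothetical and chosen for illustration; they do not come from the paper, and the learning step (alternating minimization) is not shown.

```python
def kron(a, b):
    """Flattened tensor (Kronecker) product of two vectors."""
    return [ai * bj for ai in a for bj in b]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def score(w, label_vec, features):
    """Assumed scoring rule: w . (label_vec (x) features)."""
    return dot(w, kron(label_vec, features))


def predict(w, label_vecs, features):
    """Return the label whose vector gives the highest score."""
    return max(label_vecs, key=lambda y: score(w, label_vecs[y], features))


features = [1.0, 2.0]

# One-hot label vectors: the tensor product places the features in a
# label-specific block, so w is just the per-label weights concatenated
# (the usual multiclass linear model).
one_hot = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
w_one_hot = [1.0, 0.0,   0.0, 1.0,   -1.0, -1.0]

# Dense 2-dimensional label vectors (illustrative values): labels A and B
# have similar vectors, so they share most of the 4 weights in w_dense.
dense = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
w_dense = [1.0, 0.0, 0.0, 1.0]

if __name__ == "__main__":
    print(predict(w_one_hot, one_hot, features))
    print(predict(w_dense, dense, features))
```

With one-hot label vectors the parameter count grows with the number of labels, whereas with dense vectors of lower dimension the weight vector is shorter and similar labels receive similar scores.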

Christopher D. Manning | Vivek Srikumar
