Feature Selection in Kernel Space: A Case Study on Dependency Parsing

Given a set of basic binary features, we propose a new feature selection method based on an L1-norm SVM that explicitly selects features in the polynomial or tree kernel space induced by those basic features. Its efficiency comes from an anti-monotone property of the subgradients: the subgradient with respect to a combined feature is bounded by the subgradient with respect to each of its component features, so a feature whose subgradient is not steep enough can be pruned safely, along with every combined feature that contains it, without further consideration. We conduct experiments on English dependency parsing with a third-order graph-based parser. Benefiting from the rich features selected in the tree kernel space, our model achieves the best reported unlabeled attachment score of 93.72 without using any additional resources.
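
A minimal sketch of the pruning idea, assuming binary features and an L1-regularized linear objective; it is not the authors' implementation. The helper names (`residuals`, `lam`, `select_pairs`) are illustrative assumptions, and the bound uses a common positive/negative split of the per-feature gradient: extending a conjunction can only remove the examples it fires on, so both halves of the gradient shrink monotonically and `max(pos, neg)` bounds the gradient of every extension.

```python
from itertools import combinations

def grad_and_bound(values, residuals):
    """Gradient of the loss w.r.t. one binary feature, plus an upper
    bound that holds for every conjunction extending that feature.

    Extending a conjunction can only drop examples, so the positive and
    negative gradient masses only shrink; hence max(pos, neg) bounds
    |gradient| for all extensions (the anti-monotone property).
    """
    pos = sum(r for v, r in zip(values, residuals) if v and r > 0)
    neg = -sum(r for v, r in zip(values, residuals) if v and r < 0)
    return abs(pos - neg), max(pos, neg)

def select_pairs(base_features, residuals, lam):
    """Apriori-style search over pairwise conjunctions.

    A base feature is expanded only if its bound exceeds lam (a safe
    prune: no extension can become active); a conjunction is kept only
    if its own gradient exceeds lam.
    """
    expandable = {}
    for name, vals in base_features.items():
        _, bound = grad_and_bound(vals, residuals)
        if bound > lam:
            expandable[name] = vals
    selected = []
    for (n1, v1), (n2, v2) in combinations(expandable.items(), 2):
        conj = [a and b for a, b in zip(v1, v2)]
        grad, _ = grad_and_bound(conj, residuals)
        if grad > lam:
            selected.append(n1 + "&" + n2)
    return selected

# Toy usage with made-up per-example loss derivatives:
feats = {"f1": [1, 1, 0, 1], "f2": [1, 0, 0, 1], "f3": [0, 0, 1, 0]}
res = [0.9, -0.2, 0.8, 0.5]
print(select_pairs(feats, res, lam=1.0))  # -> ['f1&f2']; f3 is pruned early
```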
