Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization

We consider the search for a maximum likelihood assignment of hidden derivations and grammar weights for a probabilistic context-free grammar (PCFG), the problem approximately solved by "Viterbi training." We show that solving and even approximating Viterbi training for PCFGs is NP-hard. We motivate the use of uniform-at-random initialization for Viterbi EM as an optimal initializer in the absence of further information about the correct model parameters, providing an approximate bound on the log-likelihood.

Noah A. Smith | Shay B. Cohen
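
As an illustration of the procedure the abstract refers to, the following is a minimal sketch of Viterbi (hard) EM for a PCFG. It is not the authors' implementation. The toy grammar, the corpus, and all names (`BINARY`, `LEXICAL`, `random_init`, `viterbi_parse`, `viterbi_em`) are hypothetical. The sketch also assumes that "uniform-at-random initialization" means drawing each nonterminal's rule distribution uniformly from the probability simplex, which the abstract does not spell out. Each iteration picks the single best derivation per sentence (CKY), then re-estimates rule weights by relative frequency, so the joint log-likelihood of sentences and derivations never decreases. This is a local search, consistent with the abstract's point that the exact problem is NP-hard.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (not taken from the paper).
BINARY = [("S", "A", "B"), ("S", "B", "A"), ("A", "A", "A"), ("B", "B", "B")]
LEXICAL = [("A", "a"), ("A", "b"), ("B", "a"), ("B", "b")]
RULES = BINARY + LEXICAL


def normalize(weights):
    """Rescale weights so that rules sharing a left-hand side sum to 1."""
    totals = defaultdict(float)
    for rule, w in weights.items():
        totals[rule[0]] += w
    return {r: (w / totals[r[0]] if totals[r[0]] > 0 else 0.0)
            for r, w in weights.items()}


def random_init(rng):
    """Assumed reading of uniform-at-random: a flat Dirichlet per left-hand side."""
    # Normalized i.i.d. Exponential(1) draws are uniform on the simplex.
    return normalize({rule: rng.expovariate(1.0) for rule in RULES})


def viterbi_parse(words, theta):
    """Return (log-prob, rules used) for the best S-rooted derivation, or None."""
    n = len(words)
    chart = {}  # (start, end, nonterminal) -> (log-prob, list of rules)
    for i, w in enumerate(words):
        for (a, word) in LEXICAL:
            if word == w and theta[(a, word)] > 0:
                chart[(i, i + 1, a)] = (math.log(theta[(a, word)]), [(a, word)])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c) in BINARY:
                    left, right = chart.get((i, k, b)), chart.get((k, j, c))
                    if left is None or right is None or theta[(a, b, c)] <= 0:
                        continue
                    score = math.log(theta[(a, b, c)]) + left[0] + right[0]
                    if (i, j, a) not in chart or score > chart[(i, j, a)][0]:
                        chart[(i, j, a)] = (score, [(a, b, c)] + left[1] + right[1])
    return chart.get((0, n, "S"))


def viterbi_em(corpus, theta, iterations=10):
    """Alternate best-derivation search and relative-frequency re-estimation."""
    history = []
    for _ in range(iterations):
        counts = {rule: 0.0 for rule in RULES}
        total = 0.0
        for sentence in corpus:
            # Hard E-step: keep only the single best derivation.
            # Assumes every sentence is parseable by the grammar.
            score, rules = viterbi_parse(sentence, theta)
            total += score
            for rule in rules:
                counts[rule] += 1.0
        history.append(total)
        # M-step: relative-frequency estimate from the chosen derivations.
        theta = normalize(counts)
    return theta, history


rng = random.Random(0)
corpus = [list("ab"), list("aab"), list("abb"), list("ba")]
theta, history = viterbi_em(corpus, random_init(rng))
print(history)
```

Running the sketch with several seeds and keeping the run with the highest final value in `history` is one simple way to use random restarts. Whether that matches the initialization scheme analyzed in the paper is an assumption here.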
