Empirical Risk Minimization with Approximations of Probabilistic Grammars

Probabilistic grammars are generative statistical models that are useful for modeling compositional and sequential structures. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of the parameters of a fixed probabilistic grammar using the log-loss. We derive sample complexity bounds in this framework that apply to both the supervised and the unsupervised setting.
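As an illustrative sketch (not taken from the paper itself), the log-loss empirical risk minimization objective for a fixed grammar can be written as follows, assuming parameters $\theta \in \Theta$ and a training sample of $n$ derivation trees $z_1, \ldots, z_n$:

$$\hat{\theta} \;=\; \operatorname*{arg\,min}_{\theta \in \Theta} \;\frac{1}{n} \sum_{i=1}^{n} -\log p_{\theta}(z_i).$$

In the unsupervised setting the derivations are latent, so the loss would instead be evaluated on observed sentences $x_1, \ldots, x_n$, with $p_{\theta}(x_i) = \sum_{z} p_{\theta}(x_i, z)$ marginalizing over the hidden trees.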
