CS224N/Ling 237 Final Programming Project Guidelines

This project is an opportunity for you to work on an NLP system in an area of your choice! Projects will be judged on creativity in defining the problem to be investigated, the methods used, thoroughness in considering and justifying your design decisions, and the quality of your write-up, including your testing of the system, error and success analysis, and reporting of results. You will not be penalized if your system performs poorly, provided that your initial design decisions weren’t obviously unjustifiable and that you have made reasonable attempts to analyze why it failed and to examine how the system might be improved.

The final project can be a group project. Indeed, we’d strongly encourage you to work as a group, so you can attempt something larger and more interesting. The amount of work should be appropriately scaled to the size of the group (though the expected scaling is sublinear), and you should include a brief statement on the responsibilities of different members of the team. Team members will normally get the same grade, but we reserve the right to differentiate in egregious cases. In general we would like group sizes of 2 or 3. If you’re considering a bigger group, you must talk to us and convince us that a group of more than 3 is manageable given the inherent parallelizability of the task and the time available to organize and implement the system. Solo projects are allowed.

You are free (and, where appropriate, encouraged) to make use of existing code and systems as part of your project, but you should make sure their use is properly acknowledged, and make clear what additional value your project is adding.
