Linguistically Motivated Large-Scale NLP with C&C and Boxer

The statistical modelling of language, together with advances in wide-coverage grammar development, has led to high levels of robustness and efficiency in NLP systems and made linguistically motivated large-scale language processing a possibility (Matsuzaki et al., 2007; Kaplan et al., 2004). This paper describes an NLP system which is based on syntactic and semantic formalisms from theoretical linguistics, and which we have used to analyse the entire Gigaword corpus (1 billion words) in less than 5 days using only 18 processors. This combination of detail and speed of analysis represents a breakthrough in NLP technology.

[1]  Anders Søgaard,et al.  Patrick Blackburn and Johan Bos, Representation and Inference for Natural Language , 2007, Stud Logica.

[2]  Stefan Riezler,et al.  Speed and Accuracy in Shallow and Deep Stochastic Parsing , 2004, NAACL.

[3]  James R. Curran,et al.  Parsing the WSJ Using CCG and Log-Linear Models , 2004, ACL.

[4]  James R. Curran,et al.  Language Independent NER using a Maximum Entropy Tagger , 2003, CoNLL.

[5]  Francis Jeffry Pelletier,et al.  Representation and Inference for Natural Language: A First Course in Computational Semantics , 2005, Computational Linguistics.

[6]  Ted Briscoe,et al.  The Second Release of the RASP System , 2006, ACL.

[7]  James R. Curran,et al.  Formalism-Independent Parser Evaluation with CCG and DepBank , 2007, ACL.

[8]  John A. Carroll,et al.  Applied morphological processing of English , 2001, Natural Language Engineering.

[9]  Malvina Nissim,et al.  Question Answering with QED at TREC 2005 , 2005, TREC.

[10]  James R. Curran,et al.  Perceptron Training for a Wide-Coverage Lexicalized-Grammar Parser , 2007, ACL.

[11]  James R. Curran,et al.  Multi-Tagging for Lexicalized-Grammar Parsing , 2006, ACL.

[12]  Martin Kay,et al.  Syntactic Process , 1979, ACL.

[13]  Julia Hockenmaier,et al.  Data and models for statistical parsing with combinatory categorial grammar , 2003.

[14]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.

[15]  James R. Curran,et al.  Investigating GIS and Smoothing for Maximum Entropy Taggers , 2003, EACL.

[16]  Mark Steedman,et al.  Object-Extraction and Question-Parsing using CCG , 2004, EMNLP.

[17]  Jun'ichi Tsujii,et al.  Efficient HPSG Parsing with Supertagging and CFG-Filtering , 2007, IJCAI.

[18]  Johan Bos.  Towards Wide-Coverage Semantic Interpretation , 2005.

[19]  James R. Curran,et al.  The Importance of Supertagging for Wide-Coverage CCG Parsing , 2004, COLING.