Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)

Software engineering is a first-class research topic in computer science, but it has generally not been treated as such within the natural language processing community. However, the need for well-engineered NLP components is growing as NLP begins to appear outside our research community, in bioinformatics, the search industry, educational applications, and elsewhere. In addition, NLP research itself often requires a high level of software quality, for example when it involves large data sets. Simply applying standard software engineering practices to NLP often fails because of the unique characteristics of natural language as an input type. The goals of this workshop include raising awareness of the need for good software engineering practices in NLP, stimulating research on the topic, and providing a forum for sharing current work in this area.
