Natural Language Processing: Overview

The advent of the World Wide Web has greatly increased demand for software tools and appliances that process unstructured and semi-structured natural language text. Ancillary developments, such as corporate intranets, enterprise portals, and ubiquitous e-mail, have created many challenges and opportunities in application areas such as information retrieval, electronic commerce, and knowledge management. On the supply side, the development of language technology to address attendant problems such as information overload and rapid globalization has been facilitated by two technical breakthroughs. The first is conceptual: a new emphasis upon empirical approaches to language processing that rely more heavily upon corpus statistics than upon linguistic theory. The second is computational: more powerful, networked machines that are capable of processing millions of documents and performing the billions of calculations that the statistical profiling of large corpora requires. This article outlines the new application areas and describes some of the advances that have been made. The emphasis is upon showing how the technical approaches outlined elsewhere in this encyclopedia can be combined to create products and services that have genuine value.
