A STUDY ON NLP APPLICATIONS AND AMBIGUITY PROBLEMS
SHAIDAH JUSOH

Natural language processing (NLP) is regarded as one of the important areas of artificial intelligence. However, progress in NLP has been quite slow compared with other areas. The aim of this study is to conduct a systematic literature review to identify the most prominent applications, techniques, and challenging issues in NLP. For this review, I screened 587 papers retrieved from major databases such as SCOPUS and IEEE Xplore, and from the Google search engine. The search keywords included "natural language processing", "NLP applications", and "complexity of NLP applications". To keep to the scope of the study, 503 papers were excluded. Only the most prominent NLP applications, namely information extraction, question answering systems, and automated text summarization, were chosen for review. The main challenge in NLP is the complexity of natural language itself, specifically the ambiguity problems that occur at various levels of the language. This paper also addresses ambiguity problems that occur at the lexical and structural levels, and the significant techniques or approaches for solving them. Finally, the paper briefly discusses the future of NLP.
