Context Analysis for Computer-Assisted Near-Synonym Learning

Despite their similar meanings, near-synonyms may be used differently in different contexts. Second-language learners often find such differences hard to grasp in practice. This chapter introduces several context analysis techniques, namely pointwise mutual information (PMI), n-gram language models, latent semantic analysis (LSA), and independent component analysis (ICA), for verifying whether a near-synonym matches a given context. Applications can use these techniques to give learners contextual information that makes the different usages of near-synonyms easier to understand. Based on these techniques, we build a prototype computer-assisted near-synonym learning system. In experiments, we evaluate the context analysis methods on both Chinese and English sentences and compare their performance with that of several previously proposed supervised and unsupervised methods. Experimental results show that training on independent components, which contain useful contextual features with minimized term dependence, improves the classifiers' ability to discriminate among near-synonyms and thus yields better performance.
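
As a rough illustration of the PMI-based check, the sketch below scores each candidate near-synonym by its summed positive PMI with the context words and picks the highest scorer. It is a minimal sketch under stated assumptions, not the chapter's implementation: the abstract does not specify the corpus, the co-occurrence window, or the scoring rule, so sentence-level co-occurrence, the clipping of negative PMI to zero, the toy corpus, and the names `pmi_table` and `best_candidate` are all hypothetical choices made here.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Return a function pmi(w1, w2) = log2(P(w1, w2) / (P(w1) * P(w2))).

    Assumption: probabilities are estimated from sentence-level
    co-occurrence counts (each word counted once per sentence).
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        vocab = set(tokens)
        word_counts.update(vocab)
        pair_counts.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(w1, w2):
        joint = pair_counts[frozenset((w1, w2))]
        if joint == 0:
            return float("-inf")  # the pair never co-occurs in the corpus
        return math.log2(joint * n / (word_counts[w1] * word_counts[w2]))

    return pmi


def best_candidate(pmi, context, candidates):
    """Pick the candidate with the highest summed positive PMI over the context."""
    def score(candidate):
        # Assumption: negative or undefined PMI is clipped to zero.
        return sum(max(pmi(candidate, w), 0.0) for w in context)

    return max(candidates, key=score)


# Hypothetical toy corpus, for illustration only.
corpus = [
    ["strong", "tea"],
    ["strong", "tea", "cup"],
    ["powerful", "engine"],
    ["powerful", "engine", "car"],
    ["strong", "wind"],
    ["powerful", "computer"],
]
pmi = pmi_table(corpus)
choice = best_candidate(pmi, ["tea"], ["strong", "powerful"])
```

On this toy corpus, "strong" co-occurs with "tea" while "powerful" never does, so `choice` is "strong". A real system would estimate the counts from a large corpus and would likely need smoothing for rare pairs.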
