Online Computation of Mutual Information and Word Context Entropy

Mutual information (MI) is widely used in natural language processing to measure the co-occurrence strength between two words. Word context entropy is likewise a useful measure of how a word's occurrences are distributed across contexts, and it can be used to calculate word similarity. Computing either measure usually requires a large text corpus to obtain reliable estimates. However, estimates based on a static corpus may not reflect the dynamic nature of language. In this paper, we treat web documents as a text corpus and develop an efficient online calculator for both mutual information and word context entropy. The major advantage of online computation is that the web corpus is not only large enough to yield reliable estimates but also reflects the dynamic nature of language.
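
As an illustration of the two measures named above, the following minimal sketch computes both from raw counts. The abstract does not state the formulas, so this assumes the common pointwise formulation of mutual information, log2(P(x,y) / (P(x)P(y))), and the Shannon entropy of a word's context distribution. The function names, parameters, and counts are hypothetical; in an online setting the counts would presumably come from web hit counts, which this sketch does not attempt to retrieve.

```python
import math
from collections import Counter


def pointwise_mi(count_xy, count_x, count_y, total):
    """Pointwise mutual information in bits, estimated from raw counts.

    Assumed definition (not taken from the paper):
    log2( P(x, y) / (P(x) * P(y)) ), with probabilities estimated as count / total.
    """
    if min(count_xy, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2((count_xy * total) / (count_x * count_y))


def context_entropy(context_counts):
    """Shannon entropy in bits of a word's context distribution.

    context_counts maps each context word to the number of times it
    co-occurs with the target word (hypothetical input format).
    """
    total = sum(context_counts.values())
    if total <= 0:
        raise ValueError("context counts must sum to a positive number")
    return -sum(
        (c / total) * math.log2(c / total)
        for c in context_counts.values()
        if c > 0
    )


# Toy counts, made up for illustration only.
mi = pointwise_mi(count_xy=8, count_x=16, count_y=32, total=1024)
h = context_entropy(Counter({"strong": 1, "green": 1, "hot": 1, "iced": 1}))
```

With these toy counts, the pair occurs 16 times more often than independence would predict, so the score is 4 bits, and a word spread evenly over four contexts has an entropy of 2 bits. A word concentrated in a single context has an entropy of zero.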
