A General Framework of Generating Estimation Functions for Computing the Mutual Information of Terms

Computing the statistical dependence of terms in textual documents is a widely studied subject and a core problem in many areas of science. This study addresses this problem and explores estimation techniques based on the expected mutual information measure. A general framework is established for tackling a variety of estimations: (i) general forms of estimation functions are introduced; (ii) a set of constraints on the estimation functions is discussed; (iii) general forms of probability distributions are defined; (iv) general forms of the measures for calculating the mutual information of terms (MIT) are formalised; (v) properties of the MIT measures are studied; and (vi) relations between the MIT measures are revealed. Four estimation methods are proposed as examples, and the mathematical meaning of each is interpreted. The methods can be applied directly to practical problems of computing dependence values for individual term pairs. Owing to its generality, the framework is applicable to various areas, including statistical semantic analysis of textual data.
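The paper's general estimation functions are not reproduced here; as a concrete point of reference, the sketch below implements the classical expected mutual information measure (EMIM) for a term pair from a 2x2 document co-occurrence contingency table. The function name, the argument layout, and the convention 0 * log 0 = 0 are illustrative assumptions, not the paper's own formulation.

import math

def expected_mutual_information(n11, n10, n01, n00):
    # Expected mutual information (in nats) of two terms over a
    # collection of N documents, from a 2x2 contingency table:
    #   n11 = documents containing both terms
    #   n10 = documents containing term 1 only
    #   n01 = documents containing term 2 only
    #   n00 = documents containing neither term
    n = n11 + n10 + n01 + n00
    emi = 0.0
    # Each tuple holds (joint count, marginal count of term 1's state,
    # marginal count of term 2's state) for one presence/absence cell.
    for nxy, nx, ny in (
        (n11, n11 + n10, n11 + n01),  # t1 present, t2 present
        (n10, n11 + n10, n10 + n00),  # t1 present, t2 absent
        (n01, n01 + n00, n11 + n01),  # t1 absent,  t2 present
        (n00, n01 + n00, n10 + n00),  # t1 absent,  t2 absent
    ):
        if nxy > 0:  # adopt the convention 0 * log 0 = 0
            emi += (nxy / n) * math.log(nxy * n / (nx * ny))
    return emi

# Example: 1000 documents; the terms co-occur in 50 of them.
print(expected_mutual_information(50, 100, 80, 770))

A positive value indicates that the presence patterns of the two terms are statistically dependent. Under the framework described in the abstract, the proposed estimation functions would supply the probability estimates in place of the raw relative frequencies used here.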
