Distributed information-theoretic biclustering

This paper investigates the problem of distributed biclustering of memoryless sources, extending previous work on the two-source case [57] to the general setting with more than two sources. Given a set of distributed stationary memoryless sources, the encoders' goal is to find rate-limited representations of these sources such that the mutual information between two selected subsets of descriptions (each generated by distinct encoders) is maximized. This formulation differs fundamentally from conventional distributed source coding problems: rather than being removed, redundancy among descriptions must here be preserved to the greatest extent possible. We derive non-trivial outer and inner bounds on the achievable region for this problem and relate them to the CEO problem under logarithmic loss distortion. Since information-theoretic biclustering is closely related to distributed hypothesis testing against independence, our results are expected to carry over to that problem as well.
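To make the objective concrete, the two-source special case studied in [57] admits a compact statement. The following sketch uses notation of our own choosing (the block length n, encoders f_n and g_n, and rates R_1, R_2 are illustrative labels, not necessarily the paper's symbols): each encoder separately compresses its source block,

\[
f_n : \mathcal{X}^n \to \{1, \dots, 2^{nR_1}\}, \qquad
g_n : \mathcal{Y}^n \to \{1, \dots, 2^{nR_2}\},
\]

and the pair is chosen to maximize the normalized mutual information between the resulting descriptions,

\[
\frac{1}{n} \, I\bigl( f_n(X^n) ;\, g_n(Y^n) \bigr).
\]

A single-letter inner bound in the spirit of the information bottleneck [37] follows by choosing auxiliary random variables that form a Markov chain $U - X - Y - V$: any triple $(\mu, R_1, R_2)$ with

\[
\mu \le I(U;V), \qquad R_1 \ge I(X;U), \qquad R_2 \ge I(Y;V)
\]

is then achievable.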

[1] Te Sun Han, et al., "Hypothesis testing with multiterminal data compression," IEEE Trans. Inf. Theory, 1987.

[2] Martin Bossert, et al., "Canalizing Boolean Functions Maximize Mutual Information," IEEE Trans. Inf. Theory, 2012.

[3] Thomas M. Cover, et al., "Network Information Theory," 2001.

[4] Varun Jog, et al., "An information inequality for the BSSC broadcast channel," Information Theory and Applications Workshop (ITA), 2010.

[5] Alon Orlitsky, et al., "Coding for computing," IEEE Trans. Inf. Theory, 2001.

[6] J. Hartigan, "Direct Clustering of a Data Matrix," 1972.

[7] Chunguang Li, et al., "Distributed Information Theoretic Clustering," IEEE Trans. Signal Process., 2014.

[8] Hyunjoong Kim, et al., "Functional Analysis I," 2017.

[9] D. Botstein, et al., "Cluster analysis and display of genome-wide expression patterns," Proc. Natl. Acad. Sci. USA, 1998.

[10] Tsachy Weissman, et al., "Multiterminal Source Coding Under Logarithmic Loss," IEEE Trans. Inf. Theory, 2011.

[11] Alexander Kraskov, et al., "MIC: Mutual Information Based Hierarchical Clustering," arXiv:0809.1605, 2008.

[12] Udi Ben Porat, et al., "Analysis of Biological Networks: Network Modules – Clustering and Biclustering," 2006.

[13] José Carlos Príncipe, et al., "Information Theoretic Clustering," IEEE Trans. Pattern Anal. Mach. Intell., 2002.

[14] Toby Berger, et al., "The CEO problem [multiterminal source coding]," IEEE Trans. Inf. Theory, 1996.

[15] R. Schneider, "Convex Bodies: The Brunn–Minkowski Theory: Minkowski addition," 1993.

[16] Aaron B. Wagner, et al., "Distributed Rate-Distortion With Common Components," IEEE Trans. Inf. Theory, 2011.

[17] Arlindo L. Oliveira, et al., "Biclustering algorithms for biological data analysis: a survey," IEEE/ACM Trans. Comput. Biol. Bioinform., 2004.

[18] Girish N. Punj, et al., "Cluster Analysis in Marketing Research: Review and Suggestions for Application," 1983.

[19] Kim C. Border, et al., "Infinite Dimensional Analysis: A Hitchhiker’s Guide," 1994.

[20] Rudolf Ahlswede, et al., "Source coding with side information and a converse for degraded broadcast channels," IEEE Trans. Inf. Theory, 1975.

[21] Roded Sharan, et al., "Biclustering Algorithms: A Survey," 2007.

[22] R. Baker Kearfott, et al., "Introduction to Interval Analysis," 2009.

[23] Venkat Anantharam, et al., "Evaluation of Marton's Inner Bound for the General Broadcast Channel," IEEE Trans. Inf. Theory, 2009.

[24] Aaron D. Wyner, et al., "A theorem on the entropy of certain binary sequences and applications-II," IEEE Trans. Inf. Theory, 1973.

[25] Hans S. Witsenhausen, et al., "A conditional entropy bound for a pair of discrete random variables," IEEE Trans. Inf. Theory, 1975.

[26] Gerald Matz, et al., "A Tight Upper Bound on the Mutual Information of Two Boolean Functions," IEEE Information Theory Workshop (ITW), 2016.

[27] Elza Erkip, et al., "The Efficiency of Investment Information," IEEE Trans. Inf. Theory, 1998.

[28] Thomas M. Cover, et al., "Elements of Information Theory," 2005.

[29] C. E. Shannon, "Coding Theorems for a Discrete Source With a Fidelity Criterion," IRE International Convention Record, vol. 7, 1959; reprinted 1993.

[30] Thomas A. Courtade, et al., "Which Boolean Functions Maximize Mutual Information on Noisy Inputs?," IEEE Trans. Inf. Theory, 2014.

[31] Shun-ichi Amari, et al., "Statistical Inference Under Multiterminal Data Compression," IEEE Trans. Inf. Theory, 1998.

[32] Inderjit S. Dhillon, et al., "Information-theoretic co-clustering," KDD '03, 2003.

[33] D. Feng, et al., "Segmentation of dynamic PET images using cluster analysis," IEEE Nuclear Science Symposium Conference Record, 2000.

[34] Aaron D. Wyner, et al., "On source coding with side information at the decoder," IEEE Trans. Inf. Theory, 1975.

[35] W. Rudin, "Principles of Mathematical Analysis," 1964.

[36] Naftali Tishby, et al., "An Information Theoretic Tradeoff between Complexity and Accuracy," COLT, 2003.

[37] Naftali Tishby, et al., "The information bottleneck method," arXiv, 2000.

[38] Sebastian Nowozin, et al., "Information Theoretic Clustering Using Minimum Spanning Trees," DAGM/OAGM Symposium, 2012.

[39] János Körner, et al., "How to encode the modulo-two sum of binary sources (Corresp.)," IEEE Trans. Inf. Theory, 1979.

[40] Imre Csiszár, et al., "Information Theory: Coding Theorems for Discrete Memoryless Systems," Second Edition, 2011.

[41] Giuseppe Longo, et al., "The information theory approach to communications," 1977.

[42] Aaron D. Wyner, et al., "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Inf. Theory, 1976.

[43] H. Witsenhausen, "On Sequences of Pairs of Dependent Random Variables," 1975.

[44] Boris Mirkin, et al., "Mathematical Classification and Clustering," 1996.

[45] Fei Sha, et al., "Demystifying Information-Theoretic Clustering," ICML, 2013.

[46] Vivek K. Goyal, et al., "Multiple description coding: compression meets the network," IEEE Signal Process. Mag., 2001.

[47] Chandra Nair, et al., "Upper concave envelopes and auxiliary random variables," 2013.

[48] Zhen Zhang, et al., "On the CEO problem," Proc. IEEE International Symposium on Information Theory, 1994.

[49] Satoru Fujishige, "Submodular Functions and Optimization," 1991.

[50] Rudolf Ahlswede, et al., "On the connection between the entropies of input and output distributions of discrete memoryless channels," 1977.

[51] Te Sun Han, et al., "A unified achievable rate region for a general class of multiterminal source coding systems," IEEE Trans. Inf. Theory, 1980.

[52] Hans S. Witsenhausen, et al., "Entropy inequalities for discrete channels," IEEE Trans. Inf. Theory, 1974.

[53] W. Bialek, et al., "Information-based clustering," Proc. Natl. Acad. Sci. USA, 2005.

[54] Aaron D. Wyner, et al., "A theorem on the entropy of certain binary sequences and applications-I," IEEE Trans. Inf. Theory, 1973.

[55] Abbas El Gamal, et al., "Achievable rates for multiple descriptions," IEEE Trans. Inf. Theory, 1982.

[56] Rudolf Ahlswede, et al., "Hypothesis testing with communication constraints," IEEE Trans. Inf. Theory, 1986.

[57] Gerald Matz, et al., "Distributed information-theoretic biclustering of two memoryless sources," 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2015.

[58] Joseph A. O’Sullivan, et al., "Achievable Rates for Pattern Recognition," IEEE Trans. Inf. Theory, 2005.

[59] Thomas A. Courtade, et al., "Which Boolean functions are most informative?," IEEE International Symposium on Information Theory, 2013.

[60] George M. Church, et al., "Biclustering of Expression Data," ISMB, 2000.