Information Distance in Multiples

Information distance is a parameter-free similarity measure based on compression, used in pattern recognition, data mining, phylogeny, clustering, and classification. We extend the notion of information distance from pairs to multiples (finite lists) and study maximal overlap, metricity, universality, minimal overlap, additivity, and normalized information distance in this setting. The analysis uses the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program.
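The compression-based approximation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's own procedure: the names `compressed_len` and `list_distance` are hypothetical, zlib stands in for "a real-world compression program", and the estimate C(X) - min C(x), where C(X) is the compressed length of the concatenated list, is an assumed generalization of the pairwise numerator C(xy) - min(C(x), C(y)) used in compression-based distances.

```python
import zlib
from typing import Sequence


def compressed_len(data: bytes) -> int:
    """Length in bytes of zlib-compressed data: a computable stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def list_distance(items: Sequence[bytes]) -> int:
    """Rough compression-based distance for a finite list of byte strings.

    Assumption (not taken from the source): the distance of the list is
    estimated as C(concatenation of all items) minus the smallest C(item).
    """
    if not items:
        raise ValueError("items must be non-empty")
    joined = b"".join(items)
    return compressed_len(joined) - min(compressed_len(x) for x in items)
```

Under this sketch, a list of near-identical files should score much lower than a list of unrelated files, because the compressor can reuse material across items in the concatenation. The result depends on the compressor and its window size, so the values are only meaningful relative to each other.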