Similarity metric induced metrics with application in machine learning and bioinformatics

Similarity metrics and distance metrics are widely used in many research areas and applications. In this paper, for a given similarity metric, we introduce a family of distance metrics of Minkowski type. We then give general constructions for obtaining a normalized similarity metric and a normalized distance metric from a similarity metric and a distance metric. Applying these constructions to a given non-negative similarity metric and its induced family of distance metrics, we derive general normalized similarity metrics and normalized distance metrics. Finally, we briefly discuss some applications of our general similarity and distance metric formulations.
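The abstract does not spell out its constructions, so the following is only an illustrative sketch of the general idea of deriving distances from a similarity, not the paper's Minkowski-type family. It assumes the set-intersection similarity s(A, B) = |A ∩ B| and two standard ways of turning it into a distance, plus the Jaccard distance as one familiar normalized form. All function names are hypothetical.

```python
from itertools import permutations


def sim(a, b):
    """Assumed similarity: size of the intersection of two sets."""
    return len(a & b)


def dist_avg(a, b):
    """Distance from the average self-similarity: (s(a,a) + s(b,b)) / 2 - s(a,b).

    For set intersection this equals half the size of the symmetric difference.
    """
    return (sim(a, a) + sim(b, b)) / 2 - sim(a, b)


def dist_max(a, b):
    """Distance from the larger self-similarity: max(s(a,a), s(b,b)) - s(a,b)."""
    return max(sim(a, a), sim(b, b)) - sim(a, b)


def dist_norm(a, b):
    """A normalized distance in [0, 1]: the Jaccard distance 1 - |a & b| / |a | b|."""
    union = len(a | b)
    return 0.0 if union == 0 else 1 - sim(a, b) / union


def satisfies_triangle(d, items):
    """Brute-force check of d(x, z) <= d(x, y) + d(y, z) over all ordered triples."""
    return all(
        d(x, z) <= d(x, y) + d(y, z) + 1e-12
        for x, y, z in permutations(items, 3)
    )


# Toy data: character sets of short sequences (illustrative only).
samples = [frozenset(s) for s in ("ACGT", "ACG", "GT", "T", "")]

for d in (dist_avg, dist_max, dist_norm):
    print(d.__name__, satisfies_triangle(d, samples))
```

The brute-force check is only a sanity test on a handful of samples; it does not replace a proof that a given construction is a metric.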
