Information Geometry and Statistical Pattern Recognition

This paper discusses a geometry associated with U-divergence, including the ideas of U-models and two versions of U-loss functions. On the basis of this geometry, we observe that the U-divergence projection of a data distribution p onto a U-model M_U satisfies the Pythagorean relation for the triangle connecting p, q, and q*, for any q in the U-model, where q* denotes the projection of p onto M_U. This geometric consideration is applied to the problem of statistical pattern recognition. The U-Boost algorithm proposed for this application is shown to iteratively pursue the U-divergence projection onto a U-model that grows by one dimension with each iteration. In particular, the U-Boost algorithm released from the probability constraint reveals a novel statistical property beyond the notion of Fisher consistency, which helps us understand the statistical meaning of AdaBoost.
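
The Pythagorean relation mentioned above can be written out as a short sketch. The abstract itself does not define the U-divergence, so the form below is an assumption: it is the one commonly used in the U-Boost literature, with U a convex function, u = U' its derivative, and ξ = u^{-1}. The symbols D_U, ξ, and μ are introduced here for illustration only.

```latex
% Assumed definition of the U-divergence (not stated in the abstract):
% U is convex, u = U', and \xi = u^{-1}.
D_U(p, q) = \int \Big\{ U\big(\xi(q)\big) - U\big(\xi(p)\big)
            - p \,\big[ \xi(q) - \xi(p) \big] \Big\} \, d\mu

% U-divergence projection of p onto the U-model M_U:
q^{*} = \operatorname*{arg\,min}_{q \in M_U} D_U(p, q)

% Pythagorean relation for the triangle p, q^{*}, q, for any q \in M_U:
D_U(p, q) = D_U(p, q^{*}) + D_U(q^{*}, q)
```

Under this assumed form, taking U(t) = exp(t) gives ξ = log, and D_U reduces to the Kullback-Leibler divergence extended to unnormalized positive functions. That is the exponential-loss case, and so the one relevant to the abstract's remark about AdaBoost.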
