On distance measures, surrogate loss functions, and distributed detection

In this paper, we establish a correspondence between distance measures and surrogate loss functions in the context of decentralized binary hypothesis testing. This correspondence helps explicate the use of various distance measures in signal processing and quantization theory, and it explains the behavior of surrogate loss functions often used in machine learning and statistics. We then develop a notion of equivalence among distance measures and among loss functions. Finally, we investigate the statistical behavior of a nonparametric decentralized hypothesis testing algorithm that minimizes convex surrogate loss functions equivalent to the 0-1 loss.
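The loss/distance correspondence can be checked numerically on a toy example. The sketch below assumes a fixed quantizer with three messages z and made-up joint probabilities mu(z) = P(Y = +1, Z = z) and pi(z) = P(Y = -1, Z = z). For each margin loss phi it minimizes the conditional risk mu(z) phi(a) + pi(z) phi(-a) over the discriminant value a, sums over z, and compares the result with -I_f(mu, pi), where I_f(mu, pi) = sum_z pi(z) f(mu(z)/pi(z)) and each f was derived by hand for this illustration (hinge gives a variational-distance f, exponential gives a Bhattacharyya/Hellinger f, logistic gives a capacitory-discrimination-type f). All names (MU, PI, LOSSES, F_FUNCS, the helper functions) are hypothetical, and ternary search is a generic stand-in for any convex minimizer, not the paper's method.

```python
import math

# Hypothetical joint probabilities for a quantizer with three messages z.
MU = [0.30, 0.15, 0.05]  # P(Y = +1, Z = z)
PI = [0.05, 0.15, 0.30]  # P(Y = -1, Z = z)

# Convex margin-based surrogate losses phi(a), with a = y * gamma(z).
LOSSES = {
    "hinge": lambda a: max(0.0, 1.0 - a),
    "exponential": lambda a: math.exp(-a),
    "logistic": lambda a: math.log1p(math.exp(-a)),
}

# f functions paired with each loss (closed-form minimization of the
# conditional risk, worked out by hand for this illustration).
F_FUNCS = {
    "0-1": lambda u: -min(u, 1.0),
    "hinge": lambda u: -2.0 * min(u, 1.0),
    "exponential": lambda u: -2.0 * math.sqrt(u),
    "logistic": lambda u: u * math.log(u / (u + 1.0)) - math.log(u + 1.0),
}


def conditional_min(phi, m, p, lo=-20.0, hi=20.0, iters=200):
    """Minimize the convex map a -> m*phi(a) + p*phi(-a) by ternary search."""
    g = lambda a: m * phi(a) + p * phi(-a)
    for _ in range(iters):
        a1 = lo + (hi - lo) / 3.0
        a2 = hi - (hi - lo) / 3.0
        if g(a1) < g(a2):
            hi = a2  # minimum lies in [lo, a2]
        else:
            lo = a1  # minimum lies in [a1, hi]
    return g((lo + hi) / 2.0)


def optimal_phi_risk(phi, mu, pi):
    """Smallest phi-risk for the fixed quantizer: sum of per-message minima."""
    return sum(conditional_min(phi, m, p) for m, p in zip(mu, pi))


def f_divergence(f, mu, pi):
    """I_f(mu, pi) = sum_z pi(z) * f(mu(z) / pi(z)); assumes every pi(z) > 0."""
    return sum(p * f(m / p) for m, p in zip(mu, pi))


if __name__ == "__main__":
    # The 0-1 loss is not convex, so its optimal risk is computed directly.
    bayes = sum(min(m, p) for m, p in zip(MU, PI))
    print(f"{'0-1':12s} risk {bayes:.6f}  -I_f {-f_divergence(F_FUNCS['0-1'], MU, PI):.6f}")
    for name, phi in LOSSES.items():
        risk = optimal_phi_risk(phi, MU, PI)
        neg_div = -f_divergence(F_FUNCS[name], MU, PI)
        print(f"{name:12s} risk {risk:.6f}  -I_f {neg_div:.6f}")
```

Each printed pair should agree up to the search tolerance. The hinge f is exactly twice the 0-1 f, so the optimal hinge risk is twice the Bayes error for every quantizer and the two losses rank quantizers identically; this is one simple way to read "a convex surrogate equivalent to the 0-1 loss", offered here as an illustration rather than as the paper's formal definition.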
