Hierarchical Matching and Regression with Application to Photometric Redshift Estimation

Abstract

This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data are to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects, such as elliptical, spiral, active, and merging galaxies, at a wide range of redshifts. Our aim is matching and similarity-based analytics that takes account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree in which the branch structure, rather than just proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, which is related to the precision of the measurements, has a central role in this methodology. When used for regression, our approach is a method of cluster-wise regression that generalizes nearest neighbour regression. Both to exemplify this analytics approach and to demonstrate its computational benefits, we address the well-known photometric redshift or ‘photo-z’ problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.
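The idea of binning values by measurement precision and then matching on tree branches can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' implementation. It assumes scalar values already scaled to [0, 1), treats each successive decimal digit as one level of the hierarchy, and uses a longest-common-prefix (Baire-type) ultrametric with base 10, chosen here for readability. The names `digit_string`, `baire_distance` and `PrefixTreeRegressor`, as well as the toy values in the usage lines, are invented for this example and are not SDSS data.

```python
from collections import defaultdict
from decimal import Decimal
from statistics import mean


def digit_string(x: float, precision: int) -> str:
    """First `precision` decimal digits of x, truncated (binned) at that precision.

    Assumption: values have been scaled to lie in [0, 1).
    """
    if not 0 <= x < 1:
        raise ValueError("this sketch assumes values scaled to [0, 1)")
    # str(x) gives the shortest decimal form, avoiding binary rounding surprises;
    # int() on a Decimal truncates toward zero, i.e. drops the finer digits.
    scaled = int(Decimal(str(x)).scaleb(precision))
    return str(scaled).zfill(precision)


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def baire_distance(x: float, y: float, precision: int = 6) -> float:
    """Ultrametric 10**-k, where k is the number of shared leading digits
    (0.0 if the values agree at every digit up to `precision`)."""
    k = common_prefix_len(digit_string(x, precision), digit_string(y, precision))
    return 0.0 if k == precision else 10.0 ** -k


class PrefixTreeRegressor:
    """Hypothetical cluster-wise regressor on the digit tree.

    Every prefix of a digit string is a node (cluster); a query is answered by
    the mean target of the deepest node it shares with >= min_count points.
    """

    def __init__(self, precision: int = 3, min_count: int = 1):
        self.precision = precision
        self.min_count = min_count
        self.nodes = defaultdict(list)  # prefix -> targets of points in that node

    def fit(self, xs, ys):
        for x, y in zip(xs, ys):
            digits = digit_string(x, self.precision)
            for k in range(self.precision + 1):  # k = 0 is the root ""
                self.nodes[digits[:k]].append(y)
        return self

    def predict_one(self, x: float) -> float:
        digits = digit_string(x, self.precision)
        for k in range(self.precision, -1, -1):  # deepest level first
            targets = self.nodes.get(digits[:k], [])
            if len(targets) >= self.min_count:
                return mean(targets)
        raise ValueError("no node holds at least min_count training points")


# Usage on toy values (illustrative only).
model = PrefixTreeRegressor(precision=3).fit(
    [0.121, 0.125, 0.199, 0.200, 0.710], [1.0, 3.0, 5.0, 7.0, 9.0]
)
print(model.predict_one(0.123))  # 2.0, the mean of the "12" cluster
# Euclidean neighbours on different branches vs. points on a shared branch:
print(baire_distance(0.199, 0.200), baire_distance(0.121, 0.125))
```

Backing off to coarser prefixes is what makes the sketch cluster-wise. With `min_count=1`, the deepest shared node holds exactly the training points nearest to the query under the prefix ultrametric, so the prediction reduces to nearest neighbour regression with ties averaged; raising `min_count` trades locality for smoother estimates. The toy values also show why branch structure matters: 0.199 and 0.200 are close numerically but share no leading digit, whereas 0.121 and 0.125 share the branch "12".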
