A Statistical Pipeline for Identifying Physical Features that Differentiate Classes of 3D Shapes

It has been a longstanding challenge in geometric morphometrics and medical imaging to infer the physical locations (or regions) of 3D shapes that are most associated with a given response variable (e.g., class labels) without needing common predefined landmarks across the shapes, computing correspondence maps between the shapes, or requiring the shapes to be diffeomorphic to each other. In this paper, we introduce SINATRA: the first statistical pipeline for sub-image analysis that identifies the physical shape features explaining most of the variation between two classes without the aforementioned requirements. We also illustrate how the problem of 3D sub-image analysis can be mapped onto the well-studied problem of variable selection in nonlinear regression models. Here, the key insight is that tools from integral geometry and differential topology, specifically the Euler characteristic, can be used to transform a 3D mesh representation of an image or shape into a collection of vectors with minimal loss of geometric information. Crucially, this transform is invertible. The two central statistical, computational, and mathematical innovations of our method are: (1) how to perform robust variable selection in the transformed space of vectors, and (2) how to pull back the most informative features in the transformed space to physical locations or regions on the original shapes. We highlight the utility, power, and properties of our method through detailed simulation studies, which themselves are a novel contribution to 3D image analysis. Finally, we apply SINATRA to a dataset of mandibular molars from four different genera of primates and demonstrate the ability to identify unique morphological properties that summarize phylogeny.
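To make the mesh-to-vector step concrete, the following is a minimal sketch of an Euler characteristic curve computed along a single direction. It assumes a triangle mesh given as a vertex array and a face-index array. The function name `euler_characteristic_curve` and the parameter `n_steps` are hypothetical, chosen for illustration; this is not the SINATRA implementation. For each height threshold, it counts the vertices, edges, and faces lying entirely at or below that height and returns V - E + F.

```python
import numpy as np

def euler_characteristic_curve(vertices, faces, direction, n_steps=50):
    """Euler characteristic of sublevel sets of a triangle mesh along one direction.

    Hypothetical helper for illustration only (not the SINATRA code).
    vertices: (n, 3) coordinates; faces: (m, 3) vertex indices.
    """
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)

    # Collect the unique undirected edges of all triangles.
    edges = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [0, 2]]])
    edges = np.unique(np.sort(edges, axis=1), axis=0)

    # Height of each vertex along the direction; an edge or face enters the
    # sublevel set once its highest vertex has entered.
    heights = vertices @ direction
    edge_heights = heights[edges].max(axis=1)
    face_heights = heights[faces].max(axis=1)

    thresholds = np.linspace(heights.min(), heights.max(), n_steps)
    curve = np.array([
        np.sum(heights <= t) - np.sum(edge_heights <= t) + np.sum(face_heights <= t)
        for t in thresholds
    ])
    return thresholds, curve

# Usage: the boundary of a tetrahedron, swept along the z-axis.
tetra_vertices = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
tetra_faces = [[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]]
thresholds, curve = euler_characteristic_curve(
    tetra_vertices, tetra_faces, direction=[0, 0, 1], n_steps=5
)
```

Repeating this over many directions and concatenating the resulting curves gives one fixed-length vector per shape, which is the kind of representation on which variable selection can then operate.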
