Object Oriented Data Analysis

Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, the data objects are curves, for which standard Euclidean approaches, such as principal component analysis, have been very successful. Challenges in modern medical image analysis motivate the statistical analysis of populations of more complex data objects that are elements of mildly non-Euclidean spaces, such as Lie groups and symmetric spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. The notion of Object Oriented Data Analysis also affects data analysis itself, by providing a language for discussing the many choices required in modern complex data analyses. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.
