Statistics-Driven Localization of Dissimilarities in Data

The identification of dissimilar regions in spatial and temporal data is a fundamental part of data exploration. This task arises in applications such as biomedical image processing and climate data analysis. We propose a general solution for this task based on well-founded statistical tools. From a large set of candidate regions, we derive an empirical distribution of the data and perform statistical hypothesis testing to obtain p-values as measures of dissimilarity. Using these p-values, we quantify differences and rank regions on a global scale according to their dissimilarity to user-specified exemplar regions. We demonstrate our approach and its generality with two application scenarios: interactive exploration of climate data and segmentation editing in the medical domain. In both cases, our data exploration protocol unifies the interactive data analysis and guides the user toward the regions with the most relevant dissimilarity characteristics. The results of the dissimilarity analysis are conveyed in a radial tree, which spares the user an exhaustive search through all the data.
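As a rough illustration of the idea of ranking candidate regions by p-value, the following minimal sketch compares each candidate region against an exemplar region. It is not the authors' implementation: the choice of a two-sample Kolmogorov-Smirnov statistic with a permutation-based p-value is an assumption made here for illustration, and all function names, region names, and sample values are hypothetical.

```python
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled consistently.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def permutation_p_value(exemplar, candidate, n_perm=200, rng=None):
    """Estimate a p-value for the null hypothesis that both regions share one distribution."""
    rng = rng or random.Random(0)
    observed = ks_statistic(exemplar, candidate)
    pooled = list(exemplar) + list(candidate)
    n = len(exemplar)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def rank_by_dissimilarity(exemplar, candidates, n_perm=200, seed=0):
    """Return (region name, p-value) pairs, most dissimilar (smallest p-value) first."""
    rng = random.Random(seed)
    scored = [
        (name, permutation_p_value(exemplar, values, n_perm, rng))
        for name, values in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical data values sampled from an exemplar region and two candidate regions.
exemplar = [k / 50 for k in range(50)]
candidates = {
    "region_a": [k / 50 + 0.01 for k in range(50)],  # nearly the same distribution
    "region_b": [k / 50 + 2.0 for k in range(50)],   # clearly shifted distribution
}

for name, p in rank_by_dissimilarity(exemplar, candidates):
    print(f"{name}: p = {p:.4f}")
```

In this sketch, a small p-value indicates strong evidence that a candidate region differs from the exemplar, so sorting in ascending order places the most dissimilar regions first. Other two-sample tests could be substituted for `ks_statistic` without changing the ranking step.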
