Measuring Abnormality in High Dimensional Spaces with Applications in Biomechanical Gait Analysis

Accurately measuring a subject’s abnormality using high-dimensional data can support better outcomes research. Drawing on applications in instrumented gait analysis, this article demonstrates how measuring overall abnormality from data that are inherently non-independent can bias the results. A methodology is then introduced that addresses this bias and accurately measures abnormality in high-dimensional spaces. Although this methodology is in line with previous literature, it differs from it in two advantageous ways: it can be applied to datasets in which the number of observations is smaller than the number of features (variables), and it can be generalized to practically any number of domains or dimensions. Initial results show that these methods detect known, real-world differences in abnormality between subject groups where established measures could not. The methodology is freely available in the abnormality R package on CRAN.
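The bias from non-independent features, and one way around it when observations are fewer than features, can be illustrated with a small sketch. It assumes an approach of standardizing against a control group, running principal component analysis (PCA) on the controls via the SVD, and taking the Mahalanobis distance on the retained components. This is a swapped-in illustration, not necessarily the article's exact procedure. The function names `abnormality_score` and `naive_score`, the `var_explained` retention threshold, and the synthetic data are all hypothetical, and none of this is the API of the abnormality R package.

```python
import numpy as np


def abnormality_score(controls, subject, var_explained=0.99):
    """Mahalanobis distance of one subject from a control group, measured on the
    leading principal components of the standardized control data (a sketch).

    controls: (n, p) array of control observations; n may be smaller than p.
    subject:  (p,) array holding one subject's features.
    var_explained: hypothetical retention rule (fraction of control variance kept).
    """
    mu = controls.mean(axis=0)
    sd = controls.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # avoid dividing by zero for constant features
    z = (controls - mu) / sd

    # The SVD of the standardized controls yields the principal axes (rows of vt)
    # without inverting the p x p covariance matrix, which is singular when n <= p.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    eigvals = s**2 / (controls.shape[0] - 1)
    share = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(share, var_explained)) + 1

    # Retained components are uncorrelated in the control data, so the Mahalanobis
    # distance is a Euclidean distance with each axis scaled by its variance.
    scores = ((subject - mu) / sd) @ vt[:k].T
    return float(np.sqrt(np.sum(scores**2 / eigvals[:k])))


def naive_score(controls, subject):
    """Root sum of squared z-scores: implicitly treats every feature as independent."""
    mu = controls.mean(axis=0)
    sd = controls.std(axis=0, ddof=1)
    return float(np.sqrt(np.sum(((subject - mu) / sd) ** 2)))


# Hypothetical controls: 20 observations of 31 features (n < p). Features 0-29
# are noisy copies of one latent signal; feature 30 is uncorrelated with it.
rng = np.random.default_rng(0)
n = 20
latent = rng.normal(size=n)
latent = (latent - latent.mean()) / latent.std(ddof=1)
indep = rng.normal(size=n)
indep -= indep.mean()
indep -= (indep @ latent) / (latent @ latent) * latent  # remove correlation with latent
indep /= indep.std(ddof=1)
controls = np.column_stack([latent[:, None] + 0.05 * rng.normal(size=(n, 30)), indep])

mu, sd = controls.mean(axis=0), controls.std(axis=0, ddof=1)
z_a = np.r_[np.full(30, 2.0), 0.0]  # subject A: 2 SD shift in the latent signal
z_b = np.r_[np.zeros(30), 2.0]      # subject B: 2 SD shift in the independent feature
subject_a, subject_b = mu + sd * z_a, mu + sd * z_b

print(naive_score(controls, subject_a), naive_score(controls, subject_b))
print(abnormality_score(controls, subject_a), abnormality_score(controls, subject_b))
```

Both synthetic subjects deviate by 2 SD along one underlying direction. The naive score rates subject A about 5.5 times as abnormal as subject B (sqrt(120) versus 2), because the same deviation is counted once per redundant feature. The PCA-based score rates the two roughly equally. The variance threshold is only one possible retention rule; stopping rules such as the broken-stick model, the Kaiser criterion, or parallel analysis could be substituted for it.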
