High Dimensional Big Data and Pattern Analysis: A Tutorial

Sensors and actuators embedded in physical objects and linked through wired and wireless networks, a paradigm known as the "Internet of Things," are churning out huge volumes of data (McKinsey Quarterly report, 2010). This phenomenon has led to the archiving of mammoth amounts of data in areas ranging from scientific simulations in the physical sciences and bioinformatics to social media and a plethora of others. It is predicted that more than 30 billion devices, with 200 billion intermittent connections, will be connected by 2020. The creation and archiving of these massive amounts of data have spawned a multitude of industries. Data management and upstream analytics are aided by data compression and dimensionality reduction. This review focuses on some foundational methods of dimensionality reduction, examining several of the main algorithms in extensive detail, and points the reader to emerging next-generation methods that seek to identify structure within high-dimensional data that is not captured by second-order statistics.
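To make "dimensionality reduction based on second-order statistics" concrete, the following is a minimal sketch of principal component analysis (PCA). PCA is an assumed, representative example; the abstract does not name specific algorithms. The sketch computes the sample covariance matrix, extracts leading eigenvectors by power iteration with deflation, and projects the data onto them. Power iteration is used only to keep the example dependency-free; a practical implementation would typically use an SVD routine from a numerical library. All function names are hypothetical.

```python
import math
import random


def covariance_matrix(data):
    """Sample covariance (second-order statistics) of row-wise observations."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, means


def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def power_iteration(m, iters=500, seed=0):
    """Leading eigenpair of a symmetric positive semi-definite matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(m))]  # positive start vector
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    return eigval, v


def pca(data, k):
    """Hypothetical helper: top-k principal components via deflated power iteration."""
    cov, means = covariance_matrix(data)
    d = len(cov)
    m = [row[:] for row in cov]
    components, variances = [], []
    for c in range(k):
        lam, v = power_iteration(m, seed=c)
        components.append(v)
        variances.append(lam)
        # Deflation: subtract the found direction so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components, variances, means


def project(data, components, means):
    """Coordinates of each centered observation along the chosen components."""
    return [[sum((x - mu) * w for x, mu, w in zip(row, means, comp))
             for comp in components] for row in data]


if __name__ == "__main__":
    # Illustrative 3-D data lying close to the line (t, 2t, 0).
    data = [[t, 2.0 * t, 0.1 * (-1) ** i] for i, t in enumerate(range(-5, 6))]
    comps, variances, means = pca(data, k=2)
    print("leading direction:", comps[0])
    print("explained variances:", variances)
```

On this toy data the first component points along (1, 2, 0)/√5 and carries almost all of the variance, so each 3-D point can be summarized by a single coordinate. Because PCA sees only the covariance matrix, any structure that lives beyond second-order statistics is invisible to it, which is the gap the abstract's "next-generation methods" aim to address.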
