Directional Statistics in Machine Learning: a Brief Review

The modern data analyst must cope with data encoded in various forms: vectors, matrices, strings, graphs, and more. Consequently, statistical and machine learning models tailored to different data encodings are important. We focus on data encoded as normalized vectors, for which "direction" matters more than magnitude. Specifically, we consider high-dimensional vectors that lie either on the surface of the unit hypersphere or on the real projective plane. For such data, we briefly review the mathematical models most common in machine learning, and we outline some technical aspects, software, applications, and open mathematical challenges.
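To make the notion of "direction over magnitude" concrete, the following minimal sketch (not taken from the paper) projects vectors onto the unit hypersphere and compares them by cosine similarity. The function names `normalize`, `cosine_similarity`, and `axial_similarity` are illustrative assumptions. The sign-invariant variant is one simple way to treat x and -x as the same point, as is appropriate for axial data on projective space.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length so that only its direction remains."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("the zero vector has no direction")
    return [x / norm for x in v]

def cosine_similarity(u, v):
    """Inner product of the normalized vectors; lies in [-1, 1]."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def axial_similarity(u, v):
    """Sign-invariant similarity (hypothetical helper): x and -x are identified."""
    return abs(cosine_similarity(u, v))

a = [3.0, 4.0]
b = [30.0, 40.0]   # same direction as a, ten times the magnitude
c = [-3.0, -4.0]   # opposite direction, but the same axis as a

sim_ab = cosine_similarity(a, b)    # magnitude is ignored
sim_ac = cosine_similarity(a, c)    # direction is reversed
axial_ac = axial_similarity(a, c)   # axis is unchanged
```

Here `a` and `b` are indistinguishable after normalization, while `a` and `c` are antipodal on the sphere yet coincide as axes.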
