Clustering of periodic multichannel timeseries data with application to plasma fluctuations

Abstract: A data mining algorithm for periodic data has been developed and used to extract distinct plasma fluctuations from multichannel oscillatory timeseries data. The technique uses the Expectation Maximisation algorithm to solve for the maximum likelihood estimates and cluster assignments of a mixture of multivariate independent von Mises distributions (EM-VMM). On several artificial datasets and on real experimental data, the algorithm shows significant benefits over both a periodic k-means algorithm and clustering with non-periodic techniques. Additionally, a new technique for identifying interesting features in multichannel oscillatory timeseries data is described (STFT-clustering). STFT-clustering identifies the coincidence of spectral features over most channels of a multichannel array using the averaged short-time Fourier transform (STFT) of the signals. These features are then filtered using clustering to remove noise. The method is particularly good at identifying weaker features and complements existing methods of feature extraction. Results are presented from applying the STFT-clustering and EM-VMM algorithms to the extraction and clustering of plasma wave modes in timeseries data from a helical magnetic probe array on the H-1NF heliac.
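The EM-VMM idea in the abstract can be illustrated with a minimal sketch, assuming each observation is a vector of phase angles (one per channel, in radians) and each cluster is a product of independent von Mises densities. The function names (`vm_log_pdf`, `fit_em_vmm`), the optional `init_mu` argument, the random initialisation, and the closed-form approximation used for the concentration update are all assumptions made for illustration; they are not taken from the paper and may differ from the authors' implementation.

```python
import numpy as np
from scipy.special import i0e, logsumexp


def vm_log_pdf(theta, mu, kappa):
    """Log density of independent von Mises channels, summed over channels.

    theta: (n, d) angles in radians; mu, kappa: (k, d). Returns an (n, k) array.
    """
    diff = theta[:, None, :] - mu[None, :, :]
    # log I0(kappa) = log(i0e(kappa)) + kappa for kappa >= 0; avoids overflow.
    log_norm = np.log(2.0 * np.pi) + np.log(i0e(kappa)) + kappa
    return np.sum(kappa[None] * np.cos(diff) - log_norm[None], axis=2)


def fit_em_vmm(theta, n_clusters, n_iter=100, init_mu=None, seed=0):
    """Fit a mixture of independent von Mises distributions with EM (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = theta.shape
    if init_mu is None:
        # Assumption: start the means at randomly chosen data points.
        init_mu = theta[rng.choice(n, n_clusters, replace=False)]
    mu = np.array(init_mu, dtype=float)
    kappa = np.ones((n_clusters, d))
    weights = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step: responsibilities, normalised in log space for stability.
        log_r = np.log(weights)[None, :] + vm_log_pdf(theta, mu, kappa)
        log_r -= logsumexp(log_r, axis=1, keepdims=True)
        resp = np.exp(log_r)

        # M-step: weighted circular mean and mean resultant length per channel.
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        c = resp.T @ np.cos(theta)  # (k, d)
        s = resp.T @ np.sin(theta)  # (k, d)
        mu = np.arctan2(s, c)
        rbar = np.clip(np.hypot(c, s) / nk[:, None], 1e-6, 1.0 - 1e-6)
        # Assumption: closed-form approximation to the solution of
        # I1(kappa) / I0(kappa) = rbar, capped to keep the update finite.
        kappa = np.minimum(rbar * (2.0 - rbar**2) / (1.0 - rbar**2), 500.0)
        weights = nk / nk.sum()

    return weights, mu, kappa, resp
```

Hard cluster assignments follow from `resp.argmax(axis=1)`. Because the likelihood depends on the angles only through `cos(theta - mu)`, a phase near `+pi` and one near `-pi` are treated as neighbours, which is the property a non-periodic clustering method lacks.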
