Discovering the Trading Pattern of Financial Market Participants: Comparison of Two Co-Clustering Methods

Co-clustering is rapidly becoming a powerful data analysis technique in varied fields, such as gene expression analysis, data and web mining, and market basket analysis. In this paper, two co-clustering methods, one based on the smooth plaid model (SPM) and the other on parallel factor decomposition with sparse latent factors (SLF-PARAFAC), are applied to a synthetic data set and to an investor transaction-level data set from the China Financial Futures Exchange. We compare the two methodologies. Both SLF-PARAFAC and SPM are efficient, robust, and well suited for discovering trading ecosystems in modern financial markets. We identify differences in the temporal patterns of various trader types. The results help to develop a thorough understanding of trading behaviors and to detect patterns and irregularities.
