Relationship discovery in public opinion and actual behavior for social media stock data space

With the rise of social media data, the cyber world now closely parallels the real world. The trajectory of a hot event is reflected in social media through the Public Opinion Data Space (OS) and the Actual Behavior Data Space (BS). However, the relationships among the various mechanisms within each space, or between the two spaces, are often unknown. Traditional methods address this by inferring relationships from a statistical similarity analysis of the time sequences of dynamic elements. In particular, research on clustering data objects with nonlinear correlations is rare, so we propose the Matrix Similarity Clustering Algorithm (MSCA), which is based on random matrix theory and combined with a sliding window technique, to cluster multidimensional time sequences by similarity. The method is effective at detecting trend relationships among time sequences with multiple dynamic elements. In addition, we construct a knowledge map to analyse the relationships in OS and BS.
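The abstract does not give the details of MSCA, so the following is only a minimal sketch of the two ingredients it names: a sliding window over multidimensional time sequences and a random matrix theory test on each window's correlation matrix. It assumes Pearson correlation (which captures linear dependence only) and uses the Marchenko-Pastur upper edge as the noise threshold. The function names `mp_upper_bound`, `sliding_correlations`, and `significant_modes` are hypothetical, and the synthetic data is for illustration only.

```python
import numpy as np


def mp_upper_bound(n_series, window):
    """Marchenko-Pastur upper eigenvalue edge for the correlation matrix
    of n_series uncorrelated sequences observed over `window` time steps."""
    q = n_series / window
    return (1.0 + np.sqrt(q)) ** 2


def sliding_correlations(data, window, step):
    """Yield (start, corr) for each window of a (T, N) array.

    Each column of `data` is one time sequence; corr is its N x N
    Pearson correlation matrix inside the window."""
    total = data.shape[0]
    for start in range(0, total - window + 1, step):
        yield start, np.corrcoef(data[start:start + window], rowvar=False)


def significant_modes(corr, window):
    """Count eigenvalues above the Marchenko-Pastur edge, i.e. modes that
    pure noise of the same shape would be unlikely to produce."""
    n_series = corr.shape[0]
    eigvals = np.linalg.eigvalsh(corr)  # corr is symmetric
    return int(np.sum(eigvals > mp_upper_bound(n_series, window)))


if __name__ == "__main__":
    # Synthetic example (an assumption): 8 sequences sharing one common trend.
    rng = np.random.default_rng(0)
    common = rng.standard_normal(400)
    data = 0.9 * common[:, None] + 0.3 * rng.standard_normal((400, 8))
    for start, corr in sliding_correlations(data, window=100, step=50):
        print(start, significant_modes(corr, window=100))
```

Windows in which at least one eigenvalue exceeds the threshold indicate sequences that move together beyond what noise would explain; a clustering step over the corresponding eigenvectors or over the window-to-window correlation matrices could follow, but that step is left out here because the abstract does not specify it.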
