Discovering shakers from evolving entities via cascading graph inference

In an interconnected and dynamic world, the evolution of one entity may cause a series of significant value changes in others. For example, the currency inflation of Thailand caused currency slumps in other Asian countries, which eventually led to the financial crisis of 1997. We call such high-impact entities shakers. To discover shakers, we first introduce the concept of a cascading graph, which captures the causal relationships among evolving entities over a period of time, and then infer shakers from the graph. In a cascading graph, nodes represent entities and weighted links represent causal effects. To find hidden shakers in such a graph, we propose two scoring functions, each of which estimates how much a target entity can affect the values of others. The idea is to artificially inject a significant change into the target entity and estimate its direct and indirect influence on the others by following an inference rule under the Markovian assumption. Both scoring functions are proven to depend only on the structure of the cascading graph and can be computed in polynomial time. We ran experiments on three datasets from the social sciences. Because no previous method is directly applicable, we adapted three graphical models as baselines. The two proposed scoring functions effectively capture high-impact entities. For example, in the experiment on discovering stock market shakers, the proposed models outperform the three baselines by as much as 50% in accuracy, with ground truth obtained from Yahoo! Finance.
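The abstract does not give the two scoring functions, so the following is only a minimal sketch of the general idea of injecting a change at a target node and accumulating its direct and indirect influence over a weighted graph. Everything in it is an assumption made for illustration: the function name `shaker_scores`, the linear propagation rule (each step depends only on the previous step, as a stand-in for the Markovian assumption), the fixed number of steps, and the toy edge weights. It should not be read as the paper's method.

```python
from collections import defaultdict


def shaker_scores(edges, steps=3):
    """Score each node by the total change it induces on other nodes.

    edges: dict mapping (source, target) -> weight, a hypothetical
           stand-in for the causal effect of source on target.
    steps: number of propagation steps (an assumed cutoff).
    """
    nodes = sorted({n for edge in edges for n in edge})
    outgoing = defaultdict(list)
    for (src, dst), weight in edges.items():
        outgoing[src].append((dst, weight))

    scores = {}
    for target in nodes:
        current = {target: 1.0}        # inject a unit change at the target
        induced = defaultdict(float)   # accumulated change on other nodes
        for _ in range(steps):
            nxt = defaultdict(float)
            # Each step uses only the changes from the previous step.
            for node, change in current.items():
                for dst, weight in outgoing[node]:
                    nxt[dst] += change * weight
            for node, change in nxt.items():
                if node != target:
                    induced[node] += change
            current = nxt
        scores[target] = sum(induced.values())
    return scores


# Toy cascading graph with made-up weights (not data from the paper).
edges = {("A", "B"): 0.5, ("A", "C"): 0.4, ("B", "C"): 0.5}
scores = shaker_scores(edges, steps=2)
top_shaker = max(scores, key=scores.get)
print(top_shaker, scores)
```

In this toy graph, node A reaches C both directly and through B, so it accumulates the largest score. Like the scoring functions described above, the sketch depends only on the graph structure and weights and runs in time polynomial in the number of nodes and edges.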
