Relating interesting quantitative time series patterns with text events and text features

In many application areas, the key to successful data analysis is the integrated analysis of heterogeneous data. One example is the financial domain, where time-dependent, high-frequency quantitative data (e.g., trading volume and price information) and textual data (e.g., economic and political news reports) need to be considered jointly. Data analysis tools need to support such an integrated analysis, which allows studying the relationships between textual news documents and quantitative properties of stock market price series. In this paper, we describe a workflow and tool that allow the flexible formation of hypotheses about text features and their combinations that reflect quantitative phenomena observed in stock data. To support such an analysis, we combine the analysis steps for high-frequency quantitative data and text-oriented data using an existing Apriori-based method. First, we use heuristics to extract interesting intervals and patterns from large time series data. The visual analysis supports the analyst in exploring parameter combinations and their results. The identified time series patterns then serve as input to the second analysis step, in which all identified intervals of interest are analyzed for frequent patterns co-occurring with financial news. An Apriori-based method supports the discovery of such sequential temporal patterns. Then, various text features, such as the degree of sentence nesting, noun phrase complexity, and vocabulary richness, are extracted from the news to obtain meta patterns. A meta pattern is defined by a specific combination of text features that differ significantly from the text features of the remaining news data. Our approach combines a portfolio of visualization and analysis techniques, including time, cluster, and sequence visualization and analysis functionality. We present two case studies showing the effectiveness of our combined quantitative and textual analysis workflow.
The workflow can also be generalized to other application domains, such as data analysis of smart grids, cyber-physical systems, or the security of critical infrastructure, where the data consist of a combination of quantitative and textual time series.
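The analysis steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the heuristics, the exact Apriori variant, or the significance test, so every concrete choice below is an assumption: an interesting interval is taken to be a run of time steps where the relative price change exceeds a threshold; the news events in each interval are treated as an unordered item set and mined with plain Apriori (a simplification, not the sequential temporal pattern mining the workflow uses); vocabulary richness is approximated by the type-token ratio; and a text feature counts as part of a meta pattern when Welch's t statistic between news inside and outside the intervals exceeds a cutoff. The function names (`interesting_intervals`, `apriori`, `type_token_ratio`, `welch_t`, `meta_pattern`), the toy data, and the cutoff of 2.0 are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance


def interesting_intervals(prices, threshold, min_len=2):
    """Hypothetical heuristic: maximal runs of time steps whose absolute
    relative price change exceeds `threshold`. Returns (start, end) index
    pairs with `end` exclusive; runs shorter than `min_len` are dropped."""
    intervals, start = [], None
    for i in range(1, len(prices)):
        jump = abs(prices[i] - prices[i - 1]) / prices[i - 1] > threshold
        if jump and start is None:
            start = i  # a run of large moves begins at step i
        elif not jump and start is not None:
            if i - start >= min_len:
                intervals.append((start, i))
            start = None
    if start is not None and len(prices) - start >= min_len:
        intervals.append((start, len(prices)))  # run reaches the series end
    return intervals


def apriori(transactions, min_support):
    """Plain Apriori over unordered item sets (e.g., news topic labels per
    interval). Returns {frozenset: support} for all frequent itemsets."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    candidates = {frozenset([item]) for t in sets for item in t}
    frequent, k = {}, 1
    while candidates:
        # Count the support of each size-k candidate; keep the frequent ones.
        level = {}
        for c in candidates:
            support = sum(1 for t in sets if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates that contain an
        # infrequent (k-1)-subset (the Apriori property).
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        candidates = {c for c in joined if all(c - {x} in level for x in c)}
    return frequent


def type_token_ratio(text):
    """Vocabulary richness proxy: distinct tokens / total tokens."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs >= 2 values, and the
    two sample variances must not both be zero)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def meta_pattern(inside, outside, cutoff=2.0):
    """Names of features (dict: name -> values) whose values for news inside
    the intervals differ from the remaining news by |t| > cutoff."""
    return {name for name in inside
            if abs(welch_t(inside[name], outside[name])) > cutoff}


if __name__ == "__main__":
    prices = [100, 101, 100.5, 110, 98, 108, 108.2, 108.1, 108.3]
    print(interesting_intervals(prices, threshold=0.05))  # [(3, 6)]
    news_per_interval = [{"rates", "bank"}, {"rates", "bank", "earnings"},
                         {"rates", "earnings"}, {"bank"}]
    print(apriori(news_per_interval, min_support=0.5))
    print(meta_pattern({"ttr": [0.9, 0.8, 0.85], "sent_len": [20, 22, 21]},
                       {"ttr": [0.6, 0.65, 0.7], "sent_len": [21, 20, 22]}))
```

In this toy run, only the type-token ratio separates the two groups of news, so the resulting meta pattern is `{'ttr'}`. A real implementation would replace the single-threshold heuristic with the analyst's interactively tuned parameters and would use a temporal pattern miner that respects event order.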