Recognizing the intended message of line graphs: methodology and applications

Information graphics (line graphs, bar charts, etc.) are common in popular media and periodicals, where they are usually included to convey a message. This dissertation addresses the processing of one kind of information graphic, the line graph. It presents a learned model for segmenting a line graph into visually distinguishable trends and a Bayesian network inference model that hypothesizes the graph's intended message from communicative signals in the graphic. It also presents a method for identifying the paragraph of a document that is most relevant to the document's information graphic. The results can be used for several purposes: to give blind individuals access to the information graphics in an article, to provide the basis for a longer summary of a graphic, to build a summary that captures both an article and the information graphics it contains, and to indicate a graphic's content when indexing it for retrieval in a digital library.
