TIARA: a visual exploratory text analytic system

In this paper, we present a novel exploratory visual analytic system called TIARA (Text Insight via Automated Responsive Analytics), which combines text analytics and interactive visualization to help users explore and analyze large collections of text. Given a collection of documents, TIARA first uses topic analysis techniques to summarize the documents into a set of topics, each represented by a set of keywords. In addition to extracting topics, TIARA derives time-sensitive keywords to depict how the content of each topic evolves over time. To help users understand these topic-based summaries, TIARA employs several interactive text visualization techniques that explain the summarization results and seamlessly link them to the original text. We have applied TIARA to several real-world applications, including email summarization and patient record analysis, and have conducted several experiments to measure its effectiveness. Our experimental results and initial user feedback suggest that TIARA is effective in aiding users in exploratory text analytic tasks.
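The summary above says that TIARA derives time-sensitive keywords for each topic but does not state a scoring rule. The following is a minimal sketch of one plausible approach. It assumes that per-document topic proportions (theta) and topic-word probabilities (phi) are already available from an LDA-style topic model, and that each document has been assigned to a discrete time slice. Within each slice, a word is scored as the sum over documents of theta[topic] * count * phi[topic][word]. The `Doc` class, the `time_sensitive_keywords` function, and this product scoring rule are hypothetical illustrations, not TIARA's published method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Doc:
    time_slice: int               # hypothetical: index of the time window the document falls in
    word_counts: dict[str, int]   # bag-of-words counts n(d, w)
    topic_weights: list[float]    # theta_d: per-document topic proportions (assumed from an LDA-style model)


def time_sensitive_keywords(docs, topic, topic_word_prob, top_n=3):
    """Return {time_slice: [top_n keywords]} for one topic.

    Assumed scoring rule (illustrative only):
        score(w, t) = sum over docs d in slice t of theta_d[topic] * n(d, w) * phi[topic][w]
    """
    scores = defaultdict(lambda: defaultdict(float))
    for d in docs:
        theta = d.topic_weights[topic]
        for word, count in d.word_counts.items():
            # Words with zero probability under the topic contribute nothing.
            scores[d.time_slice][word] += theta * count * topic_word_prob.get(word, 0.0)

    result = {}
    for t in sorted(scores):
        # Rank by descending score; break ties alphabetically for stable output.
        ranked = sorted(scores[t].items(), key=lambda kv: (-kv[1], kv[0]))
        result[t] = [w for w, s in ranked[:top_n] if s > 0]
    return result
```

For a toy corpus in which one topic shifts from budget discussion to a product launch, the keyword list changes between slices:

```python
phi = {"budget": 0.3, "meeting": 0.2, "deadline": 0.25, "launch": 0.25}
docs = [
    Doc(0, {"budget": 3, "meeting": 1, "lunch": 2}, [0.9, 0.1]),
    Doc(0, {"meeting": 2, "budget": 1}, [0.5, 0.5]),
    Doc(1, {"deadline": 2, "launch": 1}, [0.8, 0.2]),
]
time_sensitive_keywords(docs, topic=0, topic_word_prob=phi, top_n=2)
# {0: ['budget', 'meeting'], 1: ['deadline', 'launch']}
```

Weighting by phi keeps frequent but off-topic words (such as "lunch" here) from dominating a slice. Weighting by theta makes documents that are only partly about the topic count proportionally less.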
