Supporting exploratory text analysis in literature study

We present WordSeer, an exploratory analysis environment for literary text. Literature study is a cycle of reading, interpretation, exploration, and understanding. Text-processing algorithms now offer abundant support for reading and interpreting literary text in new ways, but the other parts of the cycle, exploration and understanding, have been relatively neglected. We are motivated by the literature on sensemaking, an area of computer science devoted to supporting open-ended analysis of large collections of data. Our software system integrates tools for algorithmic text processing with interaction techniques that support the interpretive, exploratory, and note-taking aspects of scholarship. At present, the system supports grammatical search, contextual similarity determination, visualization of patterns of word context, and examination and organization of the source material for comparison and hypothesis building. This article illustrates these capabilities by analyzing differences in language use between male and female characters in Shakespeare's plays. We find that when love is a major plot point, the language Shakespeare uses to refer to women becomes more physical, and the language referring to men becomes more sentimental. Future work will incorporate additional sensemaking tools to aid comparison, exploration, grouping, and pattern recognition.
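The abstract does not say how WordSeer implements grammatical search or contextual similarity. One common approach, sketched below purely as an illustration, assumes the text has already been parsed into dependency triples of the form (head, relation, dependent). Grammatical search then becomes pattern matching over those triples, and two words count as contextually similar when they occur in similar grammatical contexts, measured here by cosine similarity of context-count vectors. The function names and the toy triples are hypothetical; they are not taken from WordSeer or from the Shakespeare corpus.

```python
from collections import Counter
from math import sqrt
from typing import List, Optional, Tuple

# Hypothetical representation: a parsed corpus as (head, relation, dependent) triples.
Triple = Tuple[str, str, str]


def grammatical_search(triples: List[Triple], head: Optional[str] = None,
                       relation: Optional[str] = None,
                       dependent: Optional[str] = None) -> List[Triple]:
    """Return triples matching every given field; None acts as a wildcard."""
    return [
        t for t in triples
        if (head is None or t[0] == head)
        and (relation is None or t[1] == relation)
        and (dependent is None or t[2] == dependent)
    ]


def context_vector(word: str, triples: List[Triple]) -> Counter:
    """Count the grammatical contexts of a word.

    A context is (relation, role, other word), where role records whether
    the other word is the word's dependent or its head.
    """
    contexts: Counter = Counter()
    for h, rel, d in triples:
        if h == word:
            contexts[(rel, "dep", d)] += 1
        if d == word:
            contexts[(rel, "head", h)] += 1
    return contexts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def contextual_similarity(w1: str, w2: str, triples: List[Triple]) -> float:
    """Similarity of two words based on the grammatical contexts they share."""
    return cosine(context_vector(w1, triples), context_vector(w2, triples))


# Toy triples for illustration only (not real corpus data).
triples: List[Triple] = [
    ("love", "dobj", "lady"),
    ("kiss", "dobj", "lady"),
    ("love", "dobj", "lord"),
    ("honour", "dobj", "lord"),
    ("kiss", "dobj", "hand"),
]

if __name__ == "__main__":
    # Which verbs take "lady" as a direct object?
    print(grammatical_search(triples, relation="dobj", dependent="lady"))
    # "lady" and "lord" share one of their two verb contexts ("love").
    print(round(contextual_similarity("lady", "lord", triples), 3))
```

With these toy triples, the search returns the two triples whose object is "lady", and the similarity of "lady" and "lord" is 0.5, because each word appears in two contexts and they share one. A real system would build the triples with a dependency parser and would likely weight contexts (for example, by pointwise mutual information) instead of using raw counts.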