Different Strokes of Different Folks: Searching for Health Narratives in Weblogs

The utility of storytelling in the interaction between healthcare providers and patients is now firmly established, but the potential use of large-scale story collections for health-related inquiry has not yet been explored. In particular, the enormous volume of storytelling in personal web logs offers investigators in health-related fields new opportunities to study the behavior and beliefs of diverse patient populations outside of clinical settings. In this paper we address the technical challenges of identifying personal stories about specific health issues in corpora of millions of web log posts. We describe a novel infrastructure for collecting and indexing the stories posted each day to English-language web logs, coupled with user interfaces designed to support targeted searches of these collections. We evaluate the effectiveness of this search technology in an effort to identify hundreds of first-person and third-person accounts of strokes, for the purpose of studying gender differences in the way these health emergencies are described. Results indicate that relevance feedback significantly improves search effectiveness. We conclude with a discussion of the sample biases that are inherent in web log storytelling and heightened by our approach, and propose ways to mitigate them.
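The abstract credits relevance feedback with the improvement in search effectiveness but does not spell out the weighting scheme. For orientation only, the sketch below shows the classic Rocchio update, q' = alpha*q + beta*centroid(relevant) - gamma*centroid(non-relevant), in which a user's judgments on retrieved posts pull the query toward stories like the ones marked relevant. The whitespace tokenizer, raw term-frequency weights, the values alpha=1.0, beta=0.75, gamma=0.15, the function names, and the sample sentences are all illustrative assumptions, not the configuration reported in this paper.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Assumed representation: lowercased, whitespace-split term frequencies."""
    return Counter(text.lower().split())


def centroid(vectors: list) -> dict:
    """Average of sparse term vectors; an empty list yields an empty centroid."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts term by term
    n = len(vectors)
    return {t: c / n for t, c in total.items()} if n else {}


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15) -> dict:
    """Classic Rocchio update; alpha/beta/gamma defaults are illustrative only."""
    rel = centroid(relevant)
    non = centroid(nonrelevant)
    updated = {}
    for t in set(query) | set(rel) | set(non):
        w = alpha * query.get(t, 0) + beta * rel.get(t, 0.0) - gamma * non.get(t, 0.0)
        if w > 0:  # drop terms pushed to zero or below by negative feedback
            updated[t] = w
    return updated


def cosine(a, b) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    query = term_vector("stroke")
    judged_relevant = [term_vector("my mom had a stroke and was rushed to the hospital")]
    judged_nonrelevant = [term_vector("his golf stroke improved")]
    new_query = rocchio(query, judged_relevant, judged_nonrelevant)

    candidates = ["she had a stroke and went to the hospital", "a smooth golf stroke"]
    for text in candidates:
        doc = term_vector(text)
        print(f"{cosine(query, doc):.3f} -> {cosine(new_query, doc):.3f}  {text}")
```

With the one-word query alone, the short sports sentence scores higher than the medical one; after a single round of feedback the medical account ranks first and "golf" has been removed from the query. Clipping negative weights to zero is a common convention that keeps the expanded query interpretable as a weighted term list.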
