Assessing Unmet Information Needs of Breast Cancer Survivors: Exploratory Study of Online Health Forums Using Text Classification and Retrieval

Background: Patient education materials given to breast cancer survivors may not be a good fit for their information needs. Needs may change over time, be forgotten, or be misreported for a variety of reasons. An automated content analysis of survivors' postings to online health forums can identify expressed information needs over a span of time and can be repeated regularly at low cost. Identifying these unmet needs can guide improvements to existing education materials and the creation of new resources.

Objective: The primary goals of this project are to assess the unmet information needs of breast cancer survivors from their own perspectives and to identify gaps between those needs and current education materials.

Methods: This approach applies computational methods for content modeling and supervised text classification to data from online health forums in order to identify explicit and implicit requests for health-related information. Potential gaps between needs and education materials are identified using techniques from information retrieval.

Results: We provide a new taxonomy for the classification of sentences in online health forum data. A total of 260 postings from two online health forums were selected, yielding 4179 sentences for coding. After annotation of the data and training of alternative one-versus-others classifiers, a random forest-based approach achieved F1 scores ranging from 66% (Other, dataset2) to 90% (Medical, dataset1) on the primary information types. A total of 136 expressions of need were used to generate queries against indexed education materials. Upon examination of the best two pages retrieved for each query, 12% (17/136) of queries were found to have relevant content by all coders, and 33% (45/136) were judged to have relevant content by at least one coder.

Conclusions: Text from online health forums can be analyzed effectively using automated methods. Our analysis confirms that breast cancer survivors have many information needs that are not covered by the written documents they typically receive. Our results suggest that at most a third of breast cancer survivors' questions would be addressed by the materials currently provided to them.
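
The two kinds of figures reported in the results (per-label F1 under a one-versus-others setup, and the share of queries with relevant content) can be illustrated with a minimal Python sketch. It is not the study's code: the function names `one_vs_others_f1` and `share` and the six toy gold/predicted labels are hypothetical, chosen only to show the arithmetic. The label names Medical and Other and the counts 17, 45, and 136 come from the abstract.

```python
def one_vs_others_f1(gold, predicted, label):
    """F1 for `label` as the positive class, with all other labels pooled as negative."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def share(count, total):
    """Percentage of `total` represented by `count`."""
    return 100.0 * count / total


# Hypothetical toy annotations, NOT data from the study.
gold = ["Medical", "Medical", "Other", "Medical", "Other", "Other"]
predicted = ["Medical", "Other", "Other", "Medical", "Medical", "Other"]

f1_medical = one_vs_others_f1(gold, predicted, "Medical")
f1_other = one_vs_others_f1(gold, predicted, "Other")

# Counts taken from the abstract: queries judged relevant out of 136.
relevant_all_coders = share(17, 136)   # judged relevant by all coders
relevant_any_coder = share(45, 136)    # judged relevant by at least one coder

print(f"F1 (Medical vs. others): {f1_medical:.2f}")
print(f"F1 (Other vs. others):   {f1_other:.2f}")
print(f"Relevant by all coders:  {relevant_all_coders:.1f}%")
print(f"Relevant by any coder:   {relevant_any_coder:.1f}%")
```

Scoring each label against all others pooled together mirrors the one-versus-others framing: each information type gets its own binary F1, which is why the abstract can report a different score per label and per dataset.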
