Identifying Unproven Cancer Treatments on the Health Web: Addressing Accuracy, Generalizability and Scalability

Building machine learning models that identify unproven cancer treatments on the Health Web is a promising approach for dealing with the dissemination of false and dangerous information to vulnerable health consumers. Aside from the obvious requirement of accuracy, two issues are of practical importance in deploying these models in real-world applications. (a) Generalizability: the models must generalize to all treatments, not just those used to train them. (b) Scalability: the models must be efficiently applicable to billions of documents on the Health Web. First, we provide methods and related empirical data demonstrating strong accuracy and generalizability. Second, by combining the MapReduce distributed architecture with compression of the high-dimensional feature space via Markov Boundary feature selection, we show how to scale the application of the models to WWW-scale corpora. The present work provides evidence that (a) a very small subset of unproven cancer treatments is sufficient to build a model that identifies unproven treatments on the web; (b) unproven treatments use distinct language to market their claims, and this language is learnable; and (c) through distributed parallelization and state-of-the-art feature selection, it is possible to prepare the corpora and to build and apply models at large scale.
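As a rough illustration of how a compressed model might be applied in a map/reduce style, the following minimal sketch scores documents with a linear model restricted to a handful of selected terms and then aggregates flags per site. Everything in it is an assumption made for illustration: the terms, weights, bias, threshold, and function names are hypothetical, the in-memory `run` driver merely stands in for a real MapReduce cluster, and none of it is taken from the work described above.

```python
import math
import re
from collections import defaultdict

# Hypothetical compressed model: weights exist only for the few terms that
# survived feature selection (the values are invented for illustration).
WEIGHTS = {"miracle": 1.8, "cure": 1.2, "toxins": 0.9,
           "randomized": -1.5, "trial": -1.1}
BIAS = -0.5
THRESHOLD = 0.5


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score(text):
    """Logistic score using only the selected terms (binary presence)."""
    tokens = set(tokenize(text))
    z = BIAS + sum(w for term, w in WEIGHTS.items() if term in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def map_phase(record):
    """Map step: emit (site, flag) for one (site, text) record."""
    site, text = record
    yield site, 1 if score(text) >= THRESHOLD else 0


def reduce_phase(site, flags):
    """Reduce step: count flagged and total documents for one site."""
    flags = list(flags)
    return site, sum(flags), len(flags)


def run(records):
    """Single-process stand-in for the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            grouped[key].append(value)
    return [reduce_phase(k, v) for k, v in sorted(grouped.items())]


if __name__ == "__main__":
    docs = [
        ("a.example", "This miracle cure removes toxins"),
        ("a.example", "A randomized trial showed no benefit"),
        ("b.example", "Randomized trial results"),
    ]
    for site, flagged, total in run(docs):
        print(site, flagged, total)
```

Because the map step touches only the selected terms and keeps no state across documents, each document can be scored independently, which is what would make this kind of computation straightforward to distribute.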
