Analysis of Wikipedia pageviews to identify popular chemicals

A new approach to assessing popularity relies on analyzing the number of times a web article is viewed. Here, a strategy is described to identify chemicals of widespread interest. The strategy makes use of Wikipedia, a rapidly growing, publicly editable web encyclopedia that has become an influential knowledge base. While the total number of chemicals mentioned in Wikipedia is unknown, the Wikipedia Chemical Structure Explorer (WCSE) developed by Novartis enables identification of those that are described in an Infobox or Chembox along with a Simplified Molecular-Input Line-Entry System (SMILES) code. Using a Python script, all chemicals so listed in Wikipedia (16,243) were identified and then sorted by pageview ranking. Of the 16,243 chemicals, 846 (5.2%) appear on lists of controlled substances (United States Drug Enforcement Administration), WHO essential medicines, or the top 300 US drugs. These 846 chemicals received 220 million pageviews, which is 41.4% of the pageviews for all members of the Wikipedia chemical list. The number of chemicals described in the entire corpus of Wikipedia remains a tiny fraction of the <10^7 known chemicals. Much remains to be done to make the venerable literature and data of chemistry readily accessible. Regardless, identification of popular chemicals in this manner can be used to create selected databases, to tailor educational curricula, or to create targeted informational materials (such as safety brochures); materials shaped by public demand in this way are likely to attract correspondingly widespread interest.
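The ranking and share calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the function names, the input dictionary of pageview counts, and the example titles and numbers are all hypothetical placeholders, and the pageview counts are assumed to have been collected already.

```python
from typing import Dict, List, Set, Tuple


def rank_by_pageviews(pageviews: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (title, views) pairs sorted from most to least viewed."""
    return sorted(pageviews.items(), key=lambda item: item[1], reverse=True)


def subset_share(pageviews: Dict[str, int], subset: Set[str]) -> Tuple[float, float]:
    """Return (fraction of articles in subset, fraction of pageviews in subset).

    Titles in `subset` that are absent from `pageviews` are ignored.
    """
    total_views = sum(pageviews.values())
    if not pageviews or total_views == 0:
        return (0.0, 0.0)
    members = [title for title in pageviews if title in subset]
    count_share = len(members) / len(pageviews)
    view_share = sum(pageviews[title] for title in members) / total_views
    return (count_share, view_share)


if __name__ == "__main__":
    # Placeholder counts for illustration only; not data from the study.
    example_views = {"Caffeine": 600, "Aspirin": 300, "Hexane": 100}
    # Hypothetical stand-in for a curated list such as essential medicines.
    example_subset = {"Aspirin"}

    for rank, (title, views) in enumerate(rank_by_pageviews(example_views), start=1):
        print(rank, title, views)

    count_share, view_share = subset_share(example_views, example_subset)
    print(f"{count_share:.1%} of articles, {view_share:.1%} of pageviews")
```

Applied to the figures reported above, the same calculation corresponds to 846 of 16,243 articles (5.2%) accounting for 41.4% of pageviews.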
