Multiscale Entropy Analysis of Page Views: A Case Study of Wikipedia

This study collects Wikipedia page views from 2016 to 2018 for four selected topics (education, the economy/finance, medicine, and nature/environment) and estimates the sample entropies of the three years of page views with a short-time series multiscale entropy (sMSE) algorithm, with the aim of a comprehensive understanding of the complexity of human website searching activities. The sample entropies of the selected topics exhibit different temporal variations. Over the three years, the temporal characteristics of the sample entropies are clearly revealed; the sample entropies of the selected topics follow the same tendencies and can be quantitatively ranked. Taking the 95% confidence interval into account, the temporal variations of the sample entropies are further validated by non-parametric statistical analysis, including the Wilcoxon signed-rank test and the Mann-Whitney U-test. The results suggest that sample entropies estimated by the sMSE algorithm are feasible for analyzing the temporal variations of complexity for certain topics, whereas the regular variations of the estimated sample entropies across the different selected topics cannot simply be accepted as is. Potential explanations and directions for forthcoming studies are also discussed.
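To make the quantity discussed above concrete, the following is a minimal sketch of sample entropy combined with the classical coarse-graining step of multiscale entropy. It is not the sMSE algorithm used in the study, whose details are not given here; it only illustrates the underlying idea. The function names, the embedding dimension m = 2, and the tolerance of 0.15 times the standard deviation are assumptions chosen for illustration, and the daily page-view series is synthetic.

```python
import math
import random


def sample_entropy(series, m, r):
    """Sample entropy -ln(A/B) with absolute tolerance r (Chebyshev distance)."""
    n = len(series)

    def count_matches(length):
        # Count template pairs (i < j) whose first `length` points all differ
        # by at most r. The same n - m start indices are used for both lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(series[i + k] - series[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("nan")   # undefined when no matches are found
    return -math.log(a / b)


def coarse_grain(series, scale):
    """Average consecutive, non-overlapping windows of `scale` points."""
    n_blocks = len(series) // scale
    return [sum(series[b * scale:(b + 1) * scale]) / scale for b in range(n_blocks)]


def multiscale_entropy(series, scales, m=2, r_factor=0.15):
    """Sample entropy of the coarse-grained series at each scale.

    The tolerance is fixed from the standard deviation of the original
    series, as in the classical coarse-graining approach (an assumption here).
    """
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    r = r_factor * sd
    return [sample_entropy(coarse_grain(series, s), m, r) for s in scales]


if __name__ == "__main__":
    # Synthetic stand-in for one year of daily page views (hypothetical data).
    random.seed(1)
    views = [1000 + 100 * math.sin(2 * math.pi * d / 7) + random.gauss(0, 50)
             for d in range(365)]
    for s, e in zip([1, 2, 3], multiscale_entropy(views, [1, 2, 3])):
        print(f"scale {s}: sample entropy = {e:.3f}")
```

Because coarse-graining shortens a series of N points to roughly N divided by the scale, the estimate becomes unreliable or undefined at larger scales for short records such as one year of daily page views. That limitation is what short-series variants of multiscale entropy are designed to mitigate.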
