Classification of abrupt changes along viewing profiles of scientific articles

The expansion of electronic publishing has changed the dynamics of how scientific articles are disseminated. Many works now circulate widely before publication, in the form of preprints. Another important new element concerns the views of published articles. Because some journals, such as PLoS ONE, make the respective data available, it has become possible to investigate how scientific works are viewed over time, often before the first citations appear. This is the main theme of the present work. More specifically, our research was motivated by preliminary observations that view profiles over time tend to be piecewise linear. We therefore devised a methodology for identifying the main segments of the view profiles, from which several related measurements could be derived. In particular, we focused on the inclination (slope) and length of each successive segment. Basic statistics indicated that the inclination can vary substantially from one segment to the next, while the segment lengths were more stable. A complementary joint statistical analysis, based on pairwise correlations, provided further information about the properties of the views. To better understand the view profiles, we performed multivariate statistical analysis, including principal component analysis and hierarchical clustering. The results suggest that a portion of the polygonal view profiles are organized into clusters or groups. These groups were characterized in terms of prototypes indicating the relative increase or decrease along successive segments. Four distinct models were then developed for representing the observed segments. Models incorporating joint dependencies between the segment properties provided the most accurate results among the considered alternatives.
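
As an illustration of what a piecewise linear description of a view profile involves, the following minimal sketch splits a cumulative view curve into a fixed number of linear segments and reports the inclination and length of each one. It is not the segmentation procedure used in this work, which the abstract does not specify. It assumes a simple dynamic-programming least-squares fit, and the names `segment_profile`, `segment_features`, `n_segments` and `min_len`, as well as the synthetic data, are hypothetical.

```python
import numpy as np


def line_sse(t, y):
    """Sum of squared residuals of the least-squares line through (t, y)."""
    coeffs = np.polyfit(t, y, 1)
    return float(np.sum((np.polyval(coeffs, t) - y) ** 2))


def segment_profile(t, y, n_segments, min_len=3):
    """Split the profile into n_segments contiguous pieces of minimal total SSE.

    Returns the boundaries [0, b1, ..., n]; piece k covers points b_k .. b_{k+1}-1.
    Assumption: the number of segments is fixed in advance.
    """
    n = len(t)
    # cost[i, j]: SSE of one straight line fitted to points i .. j-1
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            cost[i, j] = line_sse(t[i:j], y[i:j])

    # best[k, j]: minimal SSE when the first j points are covered by k segments
    best = np.full((n_segments + 1, n + 1), np.inf)
    back = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(1, n + 1):
            cand = best[k - 1, :j] + cost[:j, j]
            i = int(np.argmin(cand))
            best[k, j] = cand[i]
            back[k, j] = i

    # Walk the back-pointers from the last point to recover the boundaries
    bounds = [n]
    for k in range(n_segments, 0, -1):
        bounds.append(int(back[k, bounds[-1]]))
    return bounds[::-1]


def segment_features(t, y, bounds):
    """Return (inclination, length in time units) for each segment."""
    feats = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        slope = np.polyfit(t[a:b], y[a:b], 1)[0]
        feats.append((float(slope), float(t[b - 1] - t[a])))
    return feats


# Synthetic cumulative views (hypothetical data): fast start, plateau, renewed interest
t = np.arange(30, dtype=float)
y = np.where(t <= 10, 5 * t, np.where(t <= 20, 50 + (t - 10), 60 + 3 * (t - 20)))

bounds = segment_profile(t, y, n_segments=3)
for slope, length in segment_features(t, y, bounds):
    print(f"inclination={slope:.2f}  length={length:.0f}")
```

The resulting inclination and length pairs, one per segment, are the kind of per-article features that could then be passed to correlation analysis, principal component analysis or hierarchical clustering. The exhaustive cost table makes this sketch cubic in the number of time points, so it is only suitable for short profiles.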
