An architecture for the aggregation and analysis of scholarly usage data

Although scholarly information services commonly record usage data, its exploitation for the creation of value-added services remains limited. The obstacles include concerns about user privacy and data validity, and the lack of accepted standards for the representation, sharing and aggregation of usage data. This paper presents a technical, standards-based architecture for sharing usage information, which we have designed and implemented. In this architecture, OpenURL-compliant linking servers aggregate the usage information of a specific user community as it navigates the distributed information environment to which it has access. This usage information is made harvestable through the OAI-PMH, so that usage information exposed by many linking servers can be aggregated to facilitate the creation of value-added services with a reach beyond that of a single community or a single information service. This paper also discusses issues that were encountered when implementing the proposed approach, and it presents preliminary results obtained from analyzing a usage data set containing about 3,500,000 requests aggregated by a federation of linking servers at the California State University system over a 20-month period.
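As an illustration of the harvesting step, the following is a minimal sketch of how an aggregator might page through the OAI-PMH `ListRecords` responses of one linking server. It is not the implementation described in this paper. The function names, the injected `fetch` callable, and the metadata prefix `ctxo` are assumptions made for the example. Only the `verb`, `metadataPrefix` and `resumptionToken` arguments and the XML namespace come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build a ListRecords request URL.

    Per OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix must not be sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken marks the last page of the list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token


def harvest(base_url, fetch, metadata_prefix="ctxo"):
    """Collect all record identifiers exposed by one linking server.

    `fetch` is any callable mapping a URL to the response body as a string
    (hypothetical; injected here so the sketch runs without network access).
    `ctxo` is an assumed prefix name, not one taken from the paper.
    """
    identifiers = []
    token = None
    while True:
        url = build_list_records_url(base_url, metadata_prefix, token)
        ids, token = parse_list_records(fetch(url))
        identifiers.extend(ids)
        if not token:
            return identifiers
```

Running `harvest` once per linking server and merging the results would give the cross-community aggregate that the architecture aims for. A real harvester would also need error handling and incremental harvesting by datestamp, both omitted here.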
