Ten challenges in modeling bibliographic data for bibliometric analysis

The complexity and variety of bibliographic data are growing, and efforts to define new methodologies and techniques for bibliometric analysis are intensifying. In this scenario, among the most crucial issues are the quality of the data and the capability of bibliometric analysis to cope with multiple data dimensions. Although the problem of enforcing a multidimensional approach to the analysis and management of bibliographic data is not new, a reference design pattern and a specific conceptual model for multidimensional analysis of bibliographic data are still missing. In this paper, we discuss ten of the most relevant challenges for bibliometric analysis when dealing with multidimensional data, and we propose a reference data model that, depending on the goal at hand, can help analysis designers and bibliographic experts work with large collections of bibliographic data.
