Identification of Potentially Relevant Citeable Articles using Association Rule Mining

As scientific reporting grows in volume and becomes more interdisciplinary, it is increasingly difficult to identify all the potentially relevant, citeable articles for the reference lists of publications such as scientific papers, reports, grant proposals and patent applications. Authors may miss citations or give inaccurate ones, which can hinder progress in a discipline and distort the perceived importance and impact of an individual investigator’s work. Given the emphasis on quantitative measures of productivity, including citation counts, efforts are needed to assist authors in identifying potentially relevant articles to cite. Prior work has analyzed the structure and characteristic features of citation networks and correlated them with other variables, such as country of origin, journal impact factor and open access status. These analyses have revealed problems such as the underrepresentation of third-world countries, a high incidence of self-citation, and unsystematic quotation habits in review articles. With the exception of software for detecting gross plagiarism, however, no attempt has been made to develop a practical solution for identifying potentially relevant, citeable articles that may have been missed. Here, we use statistical methods to help retrieve relevant literature from existing publications. Specifically, we exploit the fact that publications reporting specific findings are typically quoted together, as grouped co-citations, in their respective contexts. Our approach automatically constructs co-citation rules by extracting pairs and groups of articles whose co-citation is overrepresented in manuscripts. This approach should help authors and reviewers identify potentially relevant, citeable articles.
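The idea of building co-citation rules from overrepresented co-citations can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each manuscript is reduced to a list of cited article identifiers, restricts itself to pairwise rules, and uses the standard association-rule measures support, confidence and lift, treating a lift above 1 as overrepresentation. The function names, thresholds and toy corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_cocitation_rules(reference_lists, min_support=2,
                          min_confidence=0.6, min_lift=1.0):
    """Mine pairwise rules x -> y ("manuscripts citing x also cite y").

    support    = number of manuscripts citing both x and y
    confidence = support / number of manuscripts citing x
    lift       = confidence / fraction of all manuscripts citing y
    Thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(reference_lists)
    single = Counter()  # manuscripts citing each article
    pair = Counter()    # manuscripts citing each unordered pair
    for refs in reference_lists:
        items = sorted(set(refs))  # ignore duplicates within one manuscript
        single.update(items)
        pair.update(combinations(items, 2))

    rules = []
    for (a, b), support in pair.items():
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # test both rule directions
            confidence = support / single[x]
            lift = confidence / (single[y] / n)
            if confidence >= min_confidence and lift > min_lift:
                rules.append((x, y, support, confidence, lift))
    return rules


def suggest_missing(draft_refs, rules):
    """Suggest articles not yet cited in a draft, ranked by rule confidence."""
    cited = set(draft_refs)
    best = {}
    for x, y, _support, confidence, _lift in rules:
        if x in cited and y not in cited:
            best[y] = max(best.get(y, 0.0), confidence)
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus: each inner list is one manuscript's reference list.
corpus = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "B", "D"],
    ["C", "D"],
    ["A", "D"],
]
rules = mine_cocitation_rules(corpus)
print(suggest_missing(["B", "D"], rules))
```

In this toy corpus, articles A and B are cited together in three of five manuscripts, so a draft that cites B but not A receives A as a candidate. Extending the sketch to larger citation groups would call for a frequent-itemset algorithm such as Apriori rather than enumerating pairs.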
