Automated content analysis: addressing the big literature challenge in ecology and evolution

Summary

1. The exponential growth of scientific literature, which we call the “big literature” phenomenon, has created great challenges in literature comprehension and synthesis. Traditional manual synthesis processes often cannot take advantage of big literature because of human limitations in time and cognition, creating the need for new literature synthesis methods.

2. In this paper, we discuss a highly useful literature synthesis approach, Automated Content Analysis (ACA), which has not yet been widely adopted in ecology and evolutionary biology. ACA is a suite of machine-learning tools, commonly used in the social sciences and in medical research, for the qualitative and quantitative synthesis of big literature.

3. Our goal is to introduce ecologists and evolutionary biologists to ACA and illustrate its capacity to synthesize overwhelming volumes of literature. First, we provide a brief history of the ACA method and summarize the fundamental process of ACA. Next, we present two ACA studies to illustrate the utility and versatility of ACA in synthesizing ecological and evolutionary literature. Finally, we discuss how to maximize the utility and contributions of ACA, as well as potential research directions that may help to advance the use of ACA in future ecological and evolutionary research.

4. Unlike manual methods of literature synthesis, ACA can process high volumes of literature in substantially less time, while helping to mitigate human biases. The overall efficiency and versatility of this method allow for a broad range of applications in literature review and synthesis, including both exploratory reviews and systematic reviews that address more targeted research questions. By allowing for more extensive and comprehensive review of big literature, ACA has the potential to fill an important methodological gap and thereby contribute to the advancement of ecological and evolutionary research.
