A clustering approach for topic filtering within systematic literature reviews

Within a systematic literature review (SLR), researchers are confronted with vast numbers of articles from scientific databases, which have to be evaluated manually for their relevance to a given field of observation. The evaluation and filtering phase of prevalent SLR methodologies is therefore time-consuming and hard to communicate to the intended audience. The proposed method applies natural language processing (NLP) to article metadata and uses a k-means clustering algorithm to automatically convert large article corpora into a distribution of focal topics. This allows efficient filtering and makes the process more objective, because the clustering results can be presented and discussed. Beyond that, it allows scientific communities to be identified quickly and therefore adds an iterative perspective to the so far linear SLR methodology.

• NLP and k-means clustering are used to filter large article corpora during systematic literature reviews.
• Automated clustering filters more efficiently and effectively than manual selection.
• Presenting and discussing the clustering results helps to objectify the otherwise nontransparent filtering step in systematic literature reviews.
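To make the idea concrete, the following is a minimal sketch of topic filtering by clustering, not the authors' implementation. It assumes that article titles stand in for the metadata, that TF-IDF weighting stands in for the NLP step (which the abstract does not specify), and that k = 2. The stop-word list, the sample titles, the deterministic farthest-first seeding, and all function names are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real review would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(token_lists):
    """Return the vocabulary and one L2-normalised TF-IDF vector per document."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    column = {t: i for i, t in enumerate(vocab)}
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    n_docs = len(token_lists)
    vectors = []
    for tokens in token_lists:
        vec = [0.0] * len(vocab)
        for term, count in Counter(tokens).items():
            # Smoothed inverse document frequency (one common variant).
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            vec[column[term]] = count * idf
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-first seeding (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(centroid, vocab, n=3):
    """Terms with the largest centroid weights, used as a rough topic label."""
    ranked = sorted(range(len(vocab)), key=lambda i: centroid[i], reverse=True)
    return [vocab[i] for i in ranked[:n]]


# Hypothetical article titles covering two unrelated topics.
titles = [
    "Lithium battery degradation modelling",
    "Supply chain risk assessment",
    "Battery thermal management for electric vehicles",
    "Resilient supply chain network design",
    "State of charge estimation for lithium battery packs",
    "Blockchain adoption in supply chain logistics",
]

vocab, vectors = tfidf([tokenize(t) for t in titles])
labels, centroids = kmeans(vectors, k=2)

for c, centroid in enumerate(centroids):
    members = [t for t, lab in zip(titles, labels) if lab == c]
    print(f"cluster {c}: top terms {top_terms(centroid, vocab)}; {len(members)} articles")
```

In a review, the top terms of each cluster would be inspected and whole clusters kept or discarded, which is the filtering decision that can then be reported and discussed. On a real corpus, k would have to be chosen deliberately (for example by comparing several values), and a library implementation would be preferable to this hand-written loop.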
