Text Mining Methods for Social Representation Analysis in Large Corpora

With mass text digitization (digital libraries, the web, etc.), a huge amount of empirical data is now available for scientific inquiry. In the social sciences and humanities, the use of statistical text mining methods to analyze these data has become unavoidable. In the mid-1990s, Saadi Lahlou proposed a coherent framework for applying these methods to the study of social representation in large corpora. Despite this initiative, text mining methods have remained marginal in this research program, partly because their methodological and theoretical assumptions are poorly understood. Many analyses still conflate the software with the method. This paper presents an overview and a formalization of a statistical text mining method for the study of social representation, using Lahlou’s works as illustrations. The goal is to open the software black box by analyzing the steps and formal operations involved. The linguistic and methodological assumptions are made explicit, and alternative algorithmic operationalizations are highlighted.