Search for Meaning Through the Study of Co-occurrences in Texts

In this paper, we combine several tools used in text-mining in order to study both the lexicon and the semantic structure of a set of medieval texts. On the one hand, the study of occurrences (Principal Component Analysis, Topic Models, Self-Organizing Maps, Hierarchical Cluster Analysis) allows a wide scope of tools to extract and display information from big data. On the other hand, the study of co-occurrences (words belonging to a sentence, a paragraph) allows to keep track of the structure of each text, but is more tedious to handle and often leads to messy visualizations. Here we use the SOM algorithm to reduce the size of the data (clustering, removal of fickle information) while preserving the semantic structure ; then we can rely on classical but slower algorithms (HCA, graph representation) to purpose data visualization.