Creation and clustering of proximity data for text data analysis