Towards a Better Similarity Measure for Keyword Profiling via Clustering

Automatic profiling of users and postings can help law enforcement units cluster and classify them effectively, so that potentially problematic users and postings can be identified easily. A core problem in this application is constructing effective profiles and a good measure for comparing the similarity of two profiles. In this paper, we investigate an existing keyword-based user profiling scheme and identify its limitations. We then propose an improved version and demonstrate that it is more consistent than the existing approach with respect to the observed rates at which a user replies to a posting, based on the similarity of their profiles.