Exploratory novelty identification in human activity data streams

Heterogeneous human-generated data streams are measurands that provide opportunities to identify patterns, detect novelties and explore the evolution of complex social systems. Communication technologies, with their very high penetration into society, can serve as particularly rich sources of information. However, for many observable communication channels one has little or no access to the content of human-to-human communications, whereas data streams on the intensity of such events are more commonly available. This paper presents a framework of methods for exploratory analysis and visualization of such data streams. In particular, we demonstrate how atypical activity levels can be identified by fitting a non-homogeneous Markov-modulated Poisson process and spatialising the component that corresponds to unusual bursts or lulls of activity via heat maps. The approach is illustrated with a case study analysing geo-referenced data streams of instant messaging activity on the internet.