The pervasive sources of data in today's networked computing environment provide many innovative opportunities, from mining patterns of individual behavior, to enabling data-intensive approaches for scientific discovery, to supporting new kinds of personal interactions and experiences. Passively collected metadata can also be mined for a wide range of social analyses. However, because of the vast size and diversity of these data resources, they can pose serious computational challenges to researchers and analysts.
This paper highlights several of the key challenges involved in efficiently collecting, storing, and analyzing datasets consisting of millions of sparse files with spatial, temporal, and network features. We focus on the computational issues faced in analyzing Call Detail Records (CDRs), the metadata (i.e., log files) that mobile phone operators passively collect about transactions on their telecommunications networks. CDRs and related data provide a rich foundation for research in fields ranging from anthropology and sociology to electrical engineering and urban planning. After describing the data and its challenges, we present our current framework for computational analysis and discuss opportunities for future work.
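To make the spatial, temporal, and network features of CDRs more concrete, here is a minimal Python sketch. It assumes a hypothetical comma-separated record layout (caller ID, callee ID, ISO-8601 start time, duration in seconds, cell tower ID). Real operator schemas vary and are not specified here, and the names `CallDetailRecord`, `parse_line`, `edge_weights`, and `tower_activity` are illustrative only. The sketch parses records and aggregates them into a weighted contact graph (network feature) and per-tower activity counts (spatial feature).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    """One CDR under a hypothetical schema (not any operator's real format)."""
    caller: str       # anonymized ID of the originating subscriber
    callee: str       # anonymized ID of the receiving subscriber
    start: datetime   # temporal feature: when the transaction began
    duration_s: int   # call length in seconds
    tower_id: str     # spatial feature: tower that served the caller


def parse_line(line: str) -> CallDetailRecord:
    """Parse one comma-separated line in the assumed layout."""
    caller, callee, ts, dur, tower = line.strip().split(",")
    return CallDetailRecord(caller, callee, datetime.fromisoformat(ts), int(dur), tower)


def edge_weights(records) -> Counter:
    """Count calls per unordered subscriber pair, giving a weighted contact graph."""
    counts = Counter()
    for r in records:
        pair = tuple(sorted((r.caller, r.callee)))  # treat a->b and b->a as one edge
        counts[pair] += 1
    return counts


def tower_activity(records) -> Counter:
    """Count records handled by each tower, a coarse spatial activity profile."""
    return Counter(r.tower_id for r in records)


# Hypothetical sample records for illustration only.
sample = [
    "a1,b2,2010-03-01T08:15:00,120,T17",
    "b2,a1,2010-03-01T09:00:00,45,T03",
    "a1,c3,2010-03-02T18:30:00,300,T17",
]
records = [parse_line(line) for line in sample]
graph = edge_weights(records)     # {('a1', 'b2'): 2, ('a1', 'c3'): 1}
towers = tower_activity(records)  # {'T17': 2, 'T03': 1}
```

At the scale described in this paper, the same per-record parse and per-key counting would typically be distributed across many files rather than run in a single process; the sketch only illustrates the shape of the aggregation.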