Analyzing mobile phone usage using clustering in Spark MLlib and Pig

K-means is a common method for partitioning data points into a predefined number of clusters. Apache Spark is a cluster computing framework built for fast data processing. Using its machine learning library, MLlib, we cluster mobile network data obtained from Opencellid.org by latitude and longitude with the K-means algorithm. Once each data point has been assigned a cluster number, the dataset is loaded into Apache Pig to count the users in each cluster. This lets us analyze the number of users on a mobile network within a particular range of latitude and longitude.

Keywords: Spark, Pig, clustering, mobile, data, analysis
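
The two stages described above can be illustrated with a minimal sketch. It is not the Spark or Pig code used in this work: plain Python stands in for both tools, the helper names `assign` and `kmeans` are hypothetical, and the coordinates are made-up sample values rather than Opencellid.org records. The clustering step is a basic Lloyd-style K-means on (latitude, longitude) pairs, and the counting step mirrors what grouping by cluster number and counting in Pig would compute.

```python
import math
import random
from collections import Counter


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd-style K-means; returns (centroids, labels).

    Assumption: plain Euclidean distance on (lat, lon), which is only a
    rough approximation of distance on the Earth's surface.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave a centroid unchanged if its cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    # Final assignment so the labels match the final centroids.
    return centroids, assign(points, centroids)


# Hypothetical (latitude, longitude) samples forming two distant groups.
points = [
    (12.97, 77.59), (12.98, 77.60), (12.96, 77.58),
    (28.61, 77.21), (28.62, 77.20),
]

centroids, labels = kmeans(points, k=2)

# Counting step: number of data points (users) per cluster number.
counts = Counter(labels)
for cluster, n in sorted(counts.items()):
    print(f"cluster {cluster}: {n} users")
```

Because the two sample groups are far apart, one cluster ends up with three points and the other with two; which cluster number each group receives depends on the random initialization.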