Data Analytics in Text Messages: A Mobile Network Operator Case Study

This paper explores the application of different data mining and machine learning algorithms in order to propose an effective technique for filtering out spam SMSs. Because the MNO business is highly competitive, filtering spam SMSs has a great impact on protecting both business and profit, mostly because subscribers refuse to use the services of MNOs that are not vigilant about spam SMSs. Following CRISP-DM, an open standard process model for data analytics projects, data preparation methods and machine learning algorithms were applied to an unstructured MNO dataset: characters were transformed, stop words were deleted, and word stems, roots, and N-Grams were extracted before classification. Next, numerical Vector Space Models were created using all four types of word vector creation methods. After training and test models were produced with the machine learning algorithms, the accuracy, error rate, recall, precision, and area under the curve (AUC) were measured for each classification algorithm. Finally, the Bagging algorithm combined with the Binary Term Occurrence vector creation method showed the highest efficiency, which suggests it has the greatest potential for spam filtering in the industry's big data ecosystem.
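
As a rough illustration of the data preparation steps named above, the following minimal sketch lowercases messages, removes stop words, extracts unigrams and bigrams, and builds Binary Term Occurrence vectors, in which each entry is 1 if the term occurs in the message and 0 otherwise, regardless of how often it occurs. It is not the authors' implementation: the stop-word list, the sample messages, and all function names are hypothetical, stemming and root extraction are omitted, and the classifier itself (Bagging) is not shown.

```python
import re

# Hypothetical, tiny stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "the", "to", "you", "is", "at", "for", "your"}


def tokenize(message):
    """Lowercase the message, keep alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return contiguous n-grams, joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_terms(message, max_n=2):
    """Return all 1..max_n-grams of a message.

    N-grams are formed after stop-word removal, so two words separated
    only by a stop word are treated as adjacent (an assumption of this sketch).
    """
    tokens = tokenize(message)
    terms = []
    for n in range(1, max_n + 1):
        terms.extend(ngrams(tokens, n))
    return terms


def build_vocabulary(messages, max_n=2):
    """Map every distinct term in the corpus to a column index."""
    terms = sorted({t for m in messages for t in extract_terms(m, max_n)})
    return {term: index for index, term in enumerate(terms)}


def binary_term_occurrence(message, vocabulary, max_n=2):
    """Return a 0/1 vector: 1 if the term occurs at least once, else 0."""
    vector = [0] * len(vocabulary)
    for term in extract_terms(message, max_n):
        index = vocabulary.get(term)
        if index is not None:  # terms outside the vocabulary are ignored
            vector[index] = 1
    return vector


# Hypothetical sample messages, not taken from the MNO dataset.
messages = [
    "Win a free prize now",
    "Are you free for lunch",
    "Free prize! Free prize!",
]
vocabulary = build_vocabulary(messages)
vectors = [binary_term_occurrence(m, vocabulary) for m in messages]
```

In the third sample message the terms "free" and "prize" each occur twice, yet their vector entries remain 1. This is what distinguishes binary term occurrence from count-based weightings such as raw term frequency.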
