Exploring demographic information in online social networks for improving content classification

Abstract The daily interaction between users of online social networks (OSNs) offers an effective way to analyze and interpret context in real time and to capture the interests, preferences, and concerns of OSN users. These interactions are a unique source of information for applications in fields such as trendsetting, future prediction, recommendation systems, community detection, and marketing. Most existing studies on text classification in OSNs rely on a content-based approach, which captures users' interests by categorizing the unstructured textual content those users share according to its topics. However, users' public profiles on OSNs often also reveal demographic attributes such as age, gender, education, and marital status, which can play an essential role in identifying users' interests and preferences. Demographic attributes can indicate a preference for particular topics of interest: people with different demographic attributes may be interested in different topics, while people with similar demographic attributes may share the same interests. For example, young people are usually more interested in technology than older people, who in turn are usually more interested in political news. In this paper, we propose a demographic-content-based approach that uses both users' demographic attributes and the textual content to classify OSN posts with six classifiers: artificial neural network (ANN), k-nearest neighbors (k-NN), Naive Bayes, decision tree, decision rules, and support vector machine (SVM). Experiments are conducted on a large Facebook dataset to analyze the effect of these demographic attributes on the performance of categorizing the textual content shared in OSNs.
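The idea of combining demographic attributes with textual content can be illustrated with a minimal sketch. It is not the authors' implementation: the feature encoding (word counts plus one-hot demographic indicators), the cosine-similarity k-NN, the helper names (`featurize`, `cosine`, `knn_predict`), and the four toy posts are all assumptions made for illustration. k-NN is used only because it is one of the six classifiers named above.

```python
from collections import Counter
import math

def featurize(text, demographics):
    """Merge bag-of-words counts with one-hot demographic indicators (assumed encoding)."""
    features = Counter("w=" + token for token in text.lower().split())
    for name, value in demographics.items():
        features[f"d:{name}={value}"] = 1
    return features

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, query, k=3):
    """Majority vote among the k training items most similar to the query."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy, hand-made posts (hypothetical data, not taken from the Facebook dataset).
TRAIN = [
    (featurize("new phone update released", {"age": "18-25"}), "technology"),
    (featurize("laptop review and benchmark", {"age": "18-25"}), "technology"),
    (featurize("new election update released", {"age": "56+"}), "politics"),
    (featurize("parliament vote on budget", {"age": "56+"}), "politics"),
]

# The same ambiguous text, posted by users in two different age groups.
young_post = featurize("new update released", {"age": "18-25"})
older_post = featurize("new update released", {"age": "56+"})

print(knn_predict(TRAIN, young_post))
print(knn_predict(TRAIN, older_post))
```

In this toy setting the text of the two query posts is identical, so the demographic indicator is the only feature that separates them. A real system would also need to weight the demographic features against the text features, which this sketch does not attempt.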
