Clustering Facebook for Biased Context Extraction

Facebook comments and shared posts often convey human biases, which play a pivotal role in information spreading and content consumption, where short pieces of information can be quickly consumed and later ruminated on. Such bias is nevertheless at the basis of human-generated content, and being able to extract contexts that do not amplify but represent such bias can be relevant to data mining and artificial intelligence, because it is what shapes the opinion of users through social media. Starting from the observation that a separation into topic clusters, i.e. sub-contexts, spontaneously occurs when evaluated by human common sense, especially in particular domains, e.g. politics and technology, this work introduces a process for automated context extraction by means of a class of path-based semantic similarity measures which, using third-party knowledge, e.g. WordNet or Wikipedia, can create a bag of words relating to relevant concepts present in Facebook comments on topic-related posts, thus reflecting the collective knowledge of a community of users. It is thus easy to create human-readable views, e.g. word clouds, or structured information readable by machines for further learning or content explanation, e.g. by augmenting information with time stamps of posts and comments. Experimental evidence, obtained in the domain of information security and technology over a sample of 9M3k page users, where previous comments serve as a use case for forthcoming users, shows that a simple clustering on a frequency-based bag of words can identify the main context words contained in Facebook comments that are identifiable by human common sense. Group similarity measures are also of great interest for many application domains, since they can be used to evaluate the similarity of objects in terms of the similarity of their associated sets; they can then be calculated on the extracted context words to reflect the collective notion of semantic similarity, providing additional insights on which to reason, e.g. in terms of cognitive factors and behavioral patterns.
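
The pipeline outlined in the abstract (frequency-based context words, a path-based word similarity, and a group similarity over word sets) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the stop-word list, the toy `PARENT` taxonomy standing in for WordNet, the function names, and the choice of an averaged best-match score as the group measure.

```python
from collections import Counter
import re

# Hypothetical stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "is", "to", "and", "of", "my", "was", "by", "this", "for", "on", "in"}

# Toy is-a taxonomy (child -> parent) standing in for WordNet; purely illustrative.
PARENT = {
    "password": "credential",
    "token": "credential",
    "credential": "security",
    "malware": "threat",
    "phishing": "threat",
    "threat": "security",
}

def context_words(comments, top_k=3):
    """Return the top_k most frequent non-stop-word tokens across all comments."""
    counts = Counter()
    for comment in comments:
        tokens = re.findall(r"[a-z]+", comment.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

def ancestors(word):
    """Chain from a word up to its taxonomy root, starting with the word itself."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def path_similarity(a, b):
    """1 / (1 + number of edges on the path between a and b); 0.0 if unconnected."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            return 1.0 / (1 + steps_a + chain_b.index(node))
    return 0.0

def group_similarity(set_a, set_b):
    """Average, over both sets, of each word's best match in the other set."""
    if not set_a or not set_b:
        return 0.0
    best_a = sum(max(path_similarity(a, b) for b in set_b) for a in set_a)
    best_b = sum(max(path_similarity(a, b) for a in set_a) for b in set_b)
    return (best_a + best_b) / (len(set_a) + len(set_b))
```

Under these assumptions, `context_words` would be applied to the comments of each post to obtain its bag of context words, and `group_similarity` would then compare the bags of two posts. With real data, the toy taxonomy would be replaced by a lookup in a third-party resource such as WordNet or Wikipedia.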
