Paper information - Word Embeddings for User Profiling in Online Social Networks

Word Embeddings for User Profiling in Online Social Networks

User profiling in social networks can be significantly augmented by using available full-text items such as posts or statuses and ratings (in the form of likes) that users give them. In this work, we apply modern natural language processing techniques based on word embeddings to several problems related to user profiling in social networks. First, we present an approach to create user profiles that measure a user’s interest in various topics mined from the full texts of the items. As a result, we get a user profile that can be used, e.g., for cold start recommendations for items, targeted advertisement, and other purposes; our experiments show that the interests mining method performs on a level comparable with collaborative algorithms while at the same time being a cold start approach, i.e., it does not use the likes of an item being recommended. Second, we study the problem of predicting a user’s demographic attributes such as age and gender based on his or her full-text items. We evaluate the efficiency of various age prediction algorithms based on word2vec word embeddings and conduct an extensive experimental evaluation, comparing these algorithms with each other and with classical baseline approaches.
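
The cold start idea in the abstract, scoring an item from its text alone without using the likes it has received, can be illustrated with a small sketch. This is not the authors' method: the toy vectors in `EMBEDDINGS`, the averaging scheme, and the helper names (`embed_text`, `build_profile`, `rank_items`) are all assumptions made for illustration. A real system would load vectors from a trained word2vec model.

```python
import math

# Hypothetical 3-dimensional "embeddings" for illustration only;
# real vectors would come from a trained word2vec model.
EMBEDDINGS = {
    "football": [0.90, 0.10, 0.00],
    "match":    [0.80, 0.20, 0.10],
    "goal":     [0.85, 0.15, 0.05],
    "recipe":   [0.10, 0.90, 0.00],
    "pasta":    [0.05, 0.95, 0.10],
    "sauce":    [0.10, 0.85, 0.05],
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def embed_text(text):
    """Average the embeddings of known words; None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    return mean_vector(vectors) if vectors else None


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_profile(liked_texts):
    """User profile = mean of the text vectors of the items the user liked."""
    vectors = [v for v in map(embed_text, liked_texts) if v is not None]
    return mean_vector(vectors) if vectors else None


def rank_items(profile, candidate_texts):
    """Order candidate items by similarity to the profile, using text only."""
    scored = []
    for text in candidate_texts:
        vec = embed_text(text)
        if vec is not None:
            scored.append((cosine(profile, vec), text))
    return [text for _, text in sorted(scored, reverse=True)]


profile = build_profile(["football match", "goal"])
ranking = rank_items(profile, ["pasta sauce recipe", "football goal"])
print(ranking)
```

Because the candidates are scored only from their words, a brand-new item with no likes can still be ranked, which is the sense in which such an approach handles cold start.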

Sergey I. Nikolenko | Anton M. Alekseev
