Learning multi-faceted representations of individuals from heterogeneous evidence using neural networks

Inferring latent attributes of people online is an important social computing task, but it requires integrating the many heterogeneous sources of information available on the web. We propose learning individual representations of people using neural networks to integrate rich linguistic and network evidence gathered from social media. The algorithm can combine diverse cues, such as the text a person writes, their attributes (e.g., gender, employer, education, location), and their social relations to other people. We show that by integrating both textual and network evidence, these representations improve performance on four important social media inference tasks on Twitter: predicting users' (1) gender, (2) occupation, (3) location, and (4) friendships. Our approach scales to large datasets, and the learned representations can serve as general features for, and potentially benefit, a large number of downstream tasks, including link prediction, community detection, and probabilistic reasoning over social networks.
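To make the idea of "one learned vector per person that integrates text and network evidence" more concrete, here is a minimal sketch under loudly stated assumptions. It is not the paper's architecture. It swaps in a skip-gram-style objective with negative sampling: each user vector `U[u]` is trained to score high against words the user wrote and against the user's friends, and low against sampled words and users the user is not linked to. The function `train_user_embeddings`, the `W`/`C` context tables, the hyperparameters, and the toy users are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_user_embeddings(user_words, friendships, dim=8, epochs=300,
                          lr=0.05, negatives=2, seed=0):
    """Hypothetical joint objective (not the paper's model): one shared user
    vector is fit to both text evidence (words) and network evidence
    (friends) with logistic loss and negative sampling."""
    rng = random.Random(seed)

    def init():
        return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

    users = sorted(user_words)
    vocab = sorted({w for ws in user_words.values() for w in ws})
    U = {u: init() for u in users}  # user representations (the output)
    W = {w: init() for w in vocab}  # word context vectors (text evidence)
    C = {u: init() for u in users}  # user context vectors (network evidence)

    words_of = {u: set(ws) for u, ws in user_words.items()}
    friends_of = {u: set() for u in users}
    for a, b in friendships:  # assumption: friendship is symmetric
        friends_of[a].add(b)
        friends_of[b].add(a)

    # Assumption: negatives are drawn only from items the user is NOT linked to.
    neg_words = {u: [w for w in vocab if w not in words_of[u]] for u in users}
    neg_users = {u: [v for v in users if v != u and v not in friends_of[u]]
                 for u in users}

    # Positive (user, target, context table, negative pool) tuples.
    pairs = [(u, w, W, neg_words[u]) for u in users for w in sorted(words_of[u])]
    pairs += [(u, v, C, neg_users[u]) for u in users for v in sorted(friends_of[u])]

    for _ in range(epochs):
        rng.shuffle(pairs)
        for u, target, table, pool in pairs:
            samples = [(target, 1.0)]
            if pool:
                samples += [(rng.choice(pool), 0.0) for _ in range(negatives)]
            for t, label in samples:
                # SGD step on the logistic log-likelihood of (u, t, label);
                # both the user vector and the context vector are updated.
                u_vec, t_vec = U[u], table[t]
                g = lr * (label - sigmoid(dot(u_vec, t_vec)))
                for k in range(dim):
                    uk, tk = u_vec[k], t_vec[k]
                    u_vec[k] += g * tk
                    t_vec[k] += g * uk
    return U, C


# Toy usage with invented data: two groups that differ in both text and friends.
user_words = {
    "a": ["football", "goal"], "b": ["football", "goal"],
    "c": ["code", "python"], "d": ["code", "python"],
}
friendships = [("a", "b"), ("c", "d")]
U, C = train_user_embeddings(user_words, friendships)


def friend_score(x, y):
    # Link-prediction score for a candidate friendship x -> y.
    return sigmoid(dot(U[x], C[y]))
```

Because the word terms and the friendship terms both update the same `U[u]`, users with similar vocabularies and overlapping friends drift toward similar vectors. `friend_score` then acts as a simple link-prediction feature, and the `U` vectors could be passed to a separate classifier for attributes such as gender or location; that downstream step is left out of this sketch.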
