Paper information - A Feasibility Study on Extracting Twitter Users' Interests Using NLP Tools for Serendipitous Connections

This paper presents our research on the feasibility of extracting Twitter users' interests, using natural language processing (NLP) technology, in order to suggest serendipitous connections. Defined by Andel [1] as the art of making an unsought finding, serendipity plays a positive role in scientific research and in people's daily lives. Applications that facilitate serendipity would bring various benefits. In this work, we focus on mining users' interests from Twitter messages (tweets hereafter) to support the detection of serendipitous connections. To address this challenge, we explore a set of NLP tools to develop a real-time system that automatically extracts users' interests in the form of named entities and core terms. We also examine the contributions of three different information sources with regard to the user's interests. Furthermore, we examine how to determine an additional attribute of the terms and entities of interest, their surprisingness/unexpectedness, which we deem critical for detecting serendipitous connections. Our prototype system was tested on approximately 2,300 tweets from a group of Twitter users. Our algorithm achieved varying degrees of success across the users, demonstrating the feasibility of identifying serendipitous interest terms and entities. For example, 27.5% of the terms extracted for one of the users were judged to be serendipitous.

Jon Whittle | Scott Piao

[1] Birgitta König-Ries, et al. A Hybrid Approach to Identifying User Interests in Web Portals, 2009, IICS.

[2] Nitin Agarwal, et al. Twitter Quo Vadis: Is Twitter Bitter or Are Tweets Sweet?, 2010, 2010 Seventh International Conference on Information Technology: New Generations.

[3] A.R.M. Teutle, et al. Twitter: Network properties analysis, 2010, 2010 20th International Conference on Electronics Communications and Computers (CONIELECOMP).

[4] George Cybenko, et al. Discovering Influence in Communication Networks Using Dynamic Graph Analysis, 2010, 2010 IEEE Second International Conference on Social Computing.

[5] Christophe G. Giraud-Carrier, et al. Bonding vs. Bridging Social Capital: A Case Study in Twitter, 2010, 2010 IEEE Second International Conference on Social Computing.

[6] António Dias de Figueiredo, et al. Programming for Serendipity, 2002.

[7] Alice Oh, et al. Analysis of Twitter Lists as a Potential Source for Discovering Latent Characteristics of Users, 2010.

[8] P. Andel. Anatomy of the Unsought Finding. Serendipity: Origin, History, Domains, Traditions, Appearances, Patterns and Programmability, 1994, The British Journal for the Philosophy of Science.

[9] Hideki Mima, et al. Automatic recognition of multi-word terms: the C-value/NC-value method, 2000, International Journal on Digital Libraries.

[10] Ziqi Zhang, et al. A Comparative Evaluation of Term Recognition Algorithms, 2008, LREC.

[11] T. Kuhn, et al. The Structure of Scientific Revolutions, 1964.

[12] Tomohiro Takagi, et al. Recommendations in Twitter using conceptual fuzzy sets, 2010, 2010 Annual Meeting of the North American Fuzzy Information Processing Society.

[13] Ioannis Korkontzelos, et al. Unsupervised learning of multiword expressions, 2010.