Keyterm extraction from microblog messages using a Wikipedia-based keyphraseness measure

The paper describes a method for extracting keyterms from microblog messages. The approach uses information obtained by analyzing the structure and content of Wikipedia. The algorithm computes a "keyphraseness" measure for each term, i.e. an estimate of the probability that the term will be selected as a key term in a text. An experimental study of the proposed technique showed satisfactory results that significantly outperform analogous methods. To demonstrate a possible application of the algorithm, a prototype context-sensitive advertising system has been implemented. This system retrieves descriptions of goods relevant to the extracted keyterms from the Amazon online store. Several suggestions are also made on how information obtained by analyzing Twitter messages could be used in various auxiliary services.
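The abstract does not give the exact formula, so the sketch below assumes the keyphraseness definition popularized by Mihalcea and Csomai's Wikify!: the number of Wikipedia articles in which a term appears as link anchor text, divided by the number of articles in which the term appears at all. The n-gram candidate generation, the `threshold` value, the `TermStats` and `extract_keyterms` names, and every count used below are hypothetical illustrations, not the paper's actual parameters or data.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TermStats:
    # Hypothetical per-term statistics; real values would come from a Wikipedia dump.
    linked_in: int  # articles where the term is the anchor text of a link
    occurs_in: int  # articles where the term occurs at all (linked or not)


def keyphraseness(stats: TermStats) -> float:
    """Estimate P(term is chosen as a key term | term occurs in a text)."""
    if stats.occurs_in == 0:
        return 0.0
    return stats.linked_in / stats.occurs_in


def extract_keyterms(
    message: str,
    stats_by_term: Dict[str, TermStats],
    threshold: float = 0.1,  # assumed cut-off, not taken from the paper
    max_ngram: int = 3,
) -> List[Tuple[str, float]]:
    """Score every known 1..max_ngram-gram of the message; keep those scoring at least threshold."""
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    scores: Dict[str, float] = {}
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            term = " ".join(tokens[i:i + n])
            stats = stats_by_term.get(term)
            if stats is None:
                continue  # term never seen in the (hypothetical) Wikipedia statistics
            score = keyphraseness(stats)
            if score >= threshold:
                scores[term] = score
    # Highest keyphraseness first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With made-up counts, a common word such as "the" is linked in a tiny fraction of the articles that contain it and is filtered out, while "iphone" and "battery life" survive:

```python
stats = {
    "the": TermStats(linked_in=10, occurs_in=5_000_000),
    "new": TermStats(linked_in=500, occurs_in=1_000_000),
    "iphone": TermStats(linked_in=4000, occurs_in=10000),
    "battery life": TermStats(linked_in=300, occurs_in=2000),
}
print(extract_keyterms("The new iPhone has great battery life", stats))
# [('iphone', 0.4), ('battery life', 0.15)]
```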
