A Keyphrase-Based Tag Cloud Generation Framework to Conceptualize Textual Data

Tag clouds have become an effective tool for quickly perceiving the most prominent terms embedded in textual data. They help readers grasp the main theme of a corpus without working through the whole pile of documents. However, the effectiveness of a tag cloud in conceptualizing a text corpus depends directly on the quality of its tags. In this paper, the authors propose a keyphrase-based tag cloud generation framework. In contrast to existing tag cloud generation systems, which use single words as tags and their frequency counts to determine the font size of the tags, the proposed framework identifies feasible keyphrases and uses them as tags. The font size of a keyphrase is determined as a function of its relevance weight. Instead of using partial or full parsing, which is inefficient for lengthy sentences and inaccurate for sentences that do not follow proper grammatical structure, the proposed method applies n-gram techniques followed by various heuristics-based refinements to identify candidate phrases from text documents. A rich set of lexical and semantic features is identified to characterize the candidate phrases and determine their keyphraseness and relevance weights. The authors also propose a font-size determination function, which uses the relevance weights of the keyphrases to determine their relative font sizes for tag cloud visualization. The efficacy of the proposed framework is established through experimentation and comparison with existing state-of-the-art tag cloud generation methods.
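As a rough illustration of the two steps named in the abstract, candidate phrase generation from n-grams and font sizing from relevance weights, the following minimal sketch may help. It is not the authors' method: the stopword list, the rule that a candidate may not start or end with a stopword, the function names, and the linear font-size mapping with its point-size range are all assumptions made for this example, since the abstract does not specify the actual heuristics or the font-size function.

```python
import re

# Hypothetical stopword list; the paper's actual refinement heuristics
# are not described in the abstract.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or",
             "is", "are", "for", "on", "with"}


def candidate_phrases(text, max_n=3):
    """Return word n-grams (1..max_n) that neither start nor end with a stopword."""
    candidates = []
    # Split on sentence-level punctuation so n-grams never cross a boundary.
    for sentence in re.split(r"[.!?;:]+", text.lower()):
        tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Assumed refinement: drop n-grams with a stopword at either edge.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                candidates.append(" ".join(gram))
    return candidates


def font_size(weight, w_min, w_max, min_pt=10, max_pt=36):
    """Linearly map a relevance weight in [w_min, w_max] to a font size in points."""
    if w_max == w_min:
        # All weights equal: fall back to the middle of the size range.
        return (min_pt + max_pt) / 2
    scale = (weight - w_min) / (w_max - w_min)
    return min_pt + scale * (max_pt - min_pt)
```

In a full pipeline, the relevance weight passed to `font_size` would come from a model over lexical and semantic features of each candidate; here it is simply an input number.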
