Improving Accuracy of Short Text Categorization Using Contextual Information

Categorization plays a major role in information retrieval. The abstracts of research documents contain too few terms for existing categorization algorithms to produce accurate results, and this limitation leads to unsatisfactory categorization. This paper proposes a three-stage categorization scheme to improve the accuracy of categorizing the abstracts of research documents. The scheme rests on the observation that, in most cases, the context of an abstract can be extended with information from its surroundings. In the first stage, the context is extracted from the environment in which the abstract appears; the proposed system performs this context gathering as a continuous process. In the second stage, the short text is processed with general NLP techniques, and the system divides the terms of the abstract into hierarchical levels of context. Only the terms that contribute to the higher levels of context are carried forward to the later stages of categorization. Finally, the system applies a weighted-terms method to categorize the abstract. When the limited number of terms leaves the result uncertain, the context obtained in the first stage is used to resolve the uncertainty. Relating the context to the content of the short text in this way improves accuracy and enables more effective content filtering in information retrieval. Experiments on short-text categorization showed that the proposed method achieved better accuracy than traditional feature-based categorization.
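The interplay between the weighted-terms stage and the context fallback can be illustrated with a small sketch. It is not the authors' implementation: the category vocabularies and their weights (`CATEGORY_WEIGHTS`), the stopword list, the regex tokenizer, and the `margin` threshold that decides when a result counts as uncertain are all hypothetical. The hierarchical context-level step is omitted, and the context gathered in the first stage is represented simply as a string of surrounding text.

```python
import re
from collections import Counter

# Hypothetical category vocabularies (term -> weight). In practice these
# would be learned from labelled abstracts; the values here are made up.
CATEGORY_WEIGHTS = {
    "information_retrieval": {"retrieval": 2.0, "query": 1.5, "ranking": 1.5, "search": 1.0},
    "machine_learning": {"learning": 2.0, "training": 1.5, "classifier": 1.5, "neural": 1.0},
}

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "is", "we", "with", "using"}


def preprocess(text):
    """Simplified NLP stage: lowercase, tokenize on letters, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def score(terms, weights=CATEGORY_WEIGHTS):
    """Weighted-terms stage: sum the weights of matched terms per category."""
    counts = Counter(terms)
    return {
        cat: sum(w * counts[t] for t, w in vocab.items())
        for cat, vocab in weights.items()
    }


def categorize(abstract, context, margin=1.0):
    """Return the best category for `abstract`.

    If the top two scores from the abstract alone differ by less than the
    hypothetical `margin`, the result is treated as uncertain and scores
    computed from the surrounding `context` are added before deciding.
    """
    scores = score(preprocess(abstract))
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        # Uncertain: bring in evidence from the gathered context.
        context_scores = score(preprocess(context))
        scores = {c: scores[c] + context_scores[c] for c in scores}
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[0][0]


if __name__ == "__main__":
    abstract = "We study ranking and learning for search."
    context = "Proceedings of a conference on neural network training"
    print(categorize(abstract, context))
```

In this example the abstract alone scores 2.5 for `information_retrieval` and 2.0 for `machine_learning`, which falls inside the margin, so the context terms ("neural", "training") decide the outcome and `machine_learning` is printed. An abstract with a clear lead, such as "Query ranking for retrieval", is categorized without consulting the context at all.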
