Topic identification method for textual document

Abstract— Topic identification is a crucial task for discovering knowledge from textual documents. Existing topic identification methods suffer from a word-counting problem: they rely on the most frequent terms in the text to produce topic keywords, yet not all frequent terms are relevant. This paper proposes a topic identification method that filters the important terms from the preprocessed text and applies a term weighting scheme to address the synonym problem. A rule generation algorithm then determines the appropriate topics based on the weighted terms. The text used in the experiment is an English translation of the Quran. The topics identified by the proposed method were compared with topics identified using Rough Set and by domain experts. The proposed method consistently identified topics that were, for the most part, close to those given by Rough Set and the experts. This comparison shows that the proposed method can be used to capture topics in textual documents.
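
The pipeline summarized in the abstract (filter terms, weight them with synonyms merged, apply rules) can be illustrated with a minimal Python sketch. The abstract does not specify the actual weighting scheme or the rule generation algorithm, so everything here is an assumption made for illustration: the stop-word list, the synonym map, the relative-frequency weighting, the hand-written rules, and the 0.2 threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical stop-word list and synonym map, chosen only for illustration.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for"}
SYNONYMS = {"charity": "alms", "almsgiving": "alms", "pray": "prayer", "prayers": "prayer"}

# Hypothetical hand-written rules: a topic is assigned when the combined
# weight of its cue terms reaches the threshold. The paper generates its
# rules algorithmically; that algorithm is not reproduced here.
RULES = {
    "prayer": ({"prayer", "worship"}, 0.2),
    "charity": ({"alms", "poor"}, 0.2),
}


def preprocess(text):
    """Lowercase, strip surrounding punctuation, and drop stop words."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def weight_terms(tokens):
    """Map synonyms to one canonical term, then use relative frequency as the weight."""
    canonical = [SYNONYMS.get(t, t) for t in tokens]
    counts = Counter(canonical)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def identify_topics(text):
    """Return (topic, score) pairs whose cue-term weight meets the rule threshold."""
    weights = weight_terms(preprocess(text))
    topics = []
    for topic, (cues, threshold) in RULES.items():
        score = sum(weights.get(cue, 0.0) for cue in cues)
        if score >= threshold:
            topics.append((topic, score))
    return sorted(topics, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = "Establish prayer and give charity to the poor; pray and give alms."
    for topic, score in identify_topics(sample):
        print(f"{topic}: {score:.3f}")
```

Merging synonyms before weighting means that "charity" and "alms" contribute to a single term weight instead of splitting it, which is one plausible reading of how a weighting scheme could address the synonym problem the abstract mentions.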
