Lexicon-based context-sensitive reference comments crawler

This paper proposes a system that aids the writing of research papers by gathering and analysing other researchers’ comments on a given reference paper, so that an author can see the features, advantages and disadvantages attributed to the referenced research. The lexicon-based reference comments crawler (LRCC) classifies each comment about a reference paper, together with its surrounding sentences, into one of four categories (normal, advantage, disadvantage and complex) using part-of-speech lexicons and a dynamic text window. Comments and their surrounding sentences are extracted from research papers using the reference identifier and a few simple extraction rules. Because the reference identifier is the key to sentence extraction in the LRCC system, we consider the various types of reference identifiers. Several experiments on published research papers were performed to evaluate the LRCC’s precision and recall. The results show that the LRCC extracts and classifies comments with high precision and recall, and presents them to the user effectively and efficiently.
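The extraction and classification steps described above can be illustrated with a minimal sketch. It is not the LRCC implementation: it assumes numeric bracketed identifiers such as [12], uses a fixed window of neighbouring sentences where the paper uses a dynamic one, and replaces the part-of-speech lexicons with two small hypothetical word lists. The names `extract_comments`, `classify`, `POSITIVE` and `NEGATIVE` are invented for illustration.

```python
import re

# Hypothetical toy lexicons; the paper's part-of-speech lexicons are not reproduced here.
POSITIVE = {"effective", "efficient", "robust", "accurate", "improves", "outperforms"}
NEGATIVE = {"fails", "limited", "slow", "costly", "inaccurate", "suffers"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_comments(text, ref_id, window=1):
    """Return each sentence citing ref_id, joined with `window` neighbours per side.

    Assumes numeric bracketed identifiers such as [12] or [3, 12, 15].
    """
    marker = re.compile(r"\[(?:\d+\s*,\s*)*" + str(ref_id) + r"(?:\s*,\s*\d+)*\]")
    sentences = split_sentences(text)
    comments = []
    for i, sentence in enumerate(sentences):
        if marker.search(sentence):
            start = max(0, i - window)
            comments.append(" ".join(sentences[start:i + window + 1]))
    return comments


def classify(comment):
    """Map a comment to one of the four categories named in the abstract."""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    has_pos = bool(words & POSITIVE)
    has_neg = bool(words & NEGATIVE)
    if has_pos and has_neg:
        return "complex"
    if has_pos:
        return "advantage"
    if has_neg:
        return "disadvantage"
    return "normal"


if __name__ == "__main__":
    sample = (
        "Focused crawling is widely studied. "
        "The method in [12] is accurate on small corpora. "
        "However, it is slow on large ones. "
        "We use a different design. "
        "The parser in [7] reads PDF files."
    )
    for comment in extract_comments(sample, 12, window=1):
        print(classify(comment), "|", comment)
```

In this sketch the window size changes the label: with `window=0` only the citing sentence is examined, while `window=1` also pulls in the following sentence, whose negative wording turns the result into a mixed ("complex") comment. Other identifier styles, such as author-year citations, would need their own patterns.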
