A Novel Approach for Crawling the Opinions from World Wide Web

Owing to the rapid growth of web technologies, a huge quantity of user-generated content is now available online. People's experiences and opinions play an important role in the decision-making process. Although factual information on a topic is easy to search for, retrieving opinions remains a challenging task. Opinion mining must therefore be carried out efficiently in order to extract useful opinionated information from these reviews. The present work focuses on the design and implementation of an Opinion Crawler, which downloads opinions from various sites while ignoring the rest of the web. It also detects web pages that are updated frequently and calculates a timestamp for revisiting each of them, so that relevant opinions can be extracted. The performance of the Opinion Crawler is evaluated on real data sets, on which it proves more accurate in terms of precision and recall.
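As an illustration of the two quality attributes named above, the following minimal sketch computes precision and recall for a set of crawled pages. It uses the standard information-retrieval definitions; the function name, the URLs, and the notion of a hand-labelled "relevant" set are assumptions made for this example and are not taken from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a crawl.

    retrieved: pages the crawler downloaded as opinion pages (hypothetical input)
    relevant:  pages that truly contain opinions (hypothetical ground truth)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # correctly downloaded opinion pages
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: 4 pages downloaded, 3 truly opinionated, 2 in common.
crawled = ["site/a", "site/b", "site/c", "site/d"]
opinionated = ["site/a", "site/b", "site/e"]
p, r = precision_recall(crawled, opinionated)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here precision is the fraction of downloaded pages that are opinion pages, and recall is the fraction of all opinion pages that the crawler actually downloaded.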
