Classification and Visualization of Travel Blog Entries Based on Types of Tourism

We propose a method for classifying travel blog entries into one or more of six predetermined tourism types by using the textual and image information in each entry. We also use Wikipedia entries, which are automatically linked from each travel blog entry by entity-linking technology, because they often contain information that is beneficial for classifying blog entries. We combine these sources of information with a deep-learning-based method. We conducted an experiment with a neural network using three types of input data. Using the Sparse Composite Document Vector (SCDV) technique, we obtained precision, recall, and F-measure scores of 0.743, 0.217, and 0.336, respectively. We also conducted ensemble learning with SCDV and a support vector machine (SVM), and obtained precision, recall, and F-measure scores of 0.807, 0.179, and 0.293, respectively. Finally, we constructed a system that enables travelers to search for travel blog entries on a map by tourism type.
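
The relationship between the reported scores can be checked with a short calculation. The sketch below assumes that the F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall. The abstract does not state this explicitly, but the reported values are consistent with it. The function name `f_measure` is hypothetical and chosen only for illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is F1; this is not stated
    in the source, only consistent with the reported numbers.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Scores reported in the abstract as (precision, recall).
scdv = (0.743, 0.217)           # SCDV alone
scdv_svm = (0.807, 0.179)       # ensemble of SCDV and SVM

print(round(f_measure(*scdv), 3))      # 0.336
print(round(f_measure(*scdv_svm), 3))  # 0.293
```

Under this assumption, the ensemble raises precision but lowers recall, and because the harmonic mean is dominated by the smaller of the two values, its F-measure ends up lower than that of SCDV alone.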
