Geographic Information Retrieval and Text Mining on Chinese Tourism Web Pages

The World Wide Web (WWW) offers an enormous wealth of information and data, but much of it is unstructured or semi-structured. To use these unexploited or underexploited resources more efficiently, managing information and gathering data have become essential tasks for research and development. In this paper, the author examines the task of searching for a hostel or homestay, using the Google search web service as the base search engine. Location and semantic data are mined, retrieved, and sorted from the search results by combining the Chinese Word Segmentation System with text mining technology to extract geographic information from web pages. The results of this search method brought users closer to the answers they sought, with greater accuracy, because they included both graphical and textual geographic information. In the future, the method may be applicable to other types of queries and analyses, to geographic data collection, and to managing spatial knowledge related to different keywords within a document.
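The extraction step described above can be illustrated with a minimal sketch. It is not the author's implementation: the Chinese Word Segmentation System is replaced here by a simple forward maximum-matching segmenter, and the gazetteer, vocabulary, approximate coordinates, and function names (`segment`, `extract_places`) are all hypothetical, chosen only to show how segmented text from a search result could be matched against known place names.

```python
from collections import Counter

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
# A real system would load a full gazetteer; these entries are illustrative.
GAZETTEER = {
    "花蓮": (23.99, 121.60),
    "台東": (22.76, 121.14),
    "墾丁": (21.95, 120.80),
}

# Hypothetical vocabulary for the stand-in segmenter (place names plus a few
# tourism terms). This is an assumption, not the CKIP lexicon.
VOCAB = set(GAZETTEER) | {"民宿", "推薦", "海景", "附近"}
MAX_WORD_LEN = max(len(word) for word in VOCAB)


def segment(text):
    """Split text by forward maximum matching against VOCAB.

    At each position the longest vocabulary word is taken; if none matches,
    a single character is emitted so the scan always advances.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in VOCAB:
                tokens.append(candidate)
                i += size
                break
    return tokens


def extract_places(text):
    """Return (place, mention count, coordinates), most frequent first."""
    counts = Counter(tok for tok in segment(text) if tok in GAZETTEER)
    return [(name, n, GAZETTEER[name]) for name, n in counts.most_common()]


if __name__ == "__main__":
    # A made-up search-result snippet about homestays.
    snippet = "花蓮民宿推薦,花蓮海景民宿,台東附近"
    for name, n, (lat, lon) in extract_places(snippet):
        print(name, n, lat, lon)
```

Ranking places by mention count is only one possible way to sort the extracted locations; the coordinates attached to each place are what would allow results to be shown on a map alongside the text.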
