Search Engine-Based Web Information Extraction

In this chapter we discuss approaches to find, extract, and structure information from natural language texts on the Web. Such structured information can be expressed and shared using the standard Semantic Web languages and can hence be interpreted by machines. We focus on two tasks in Web information extraction. The first part addresses mining facts from the Web, while in the second part we present an approach to collect community-based metadata. A search engine is used to retrieve potentially relevant texts. From these texts, instances and relations are extracted. The proposed approaches are illustrated with various case studies, showing that we can reliably extract information from the Web using simple techniques.

Introduction

Suppose we are interested in 'the countries where Burger King can be found', 'the Dutch cities with a university of technology', or perhaps 'the genre of the music of Miles Davis'. For such diverse factual information needs, the World Wide Web in general, and a search engine in particular, can provide a solution. Experienced users of search engines are able to construct queries that are likely to retrieve documents containing the desired information. However, current search engines retrieve Web pages, not the information itself. We still have to search within the search results to acquire the information. Moreover, we implicitly use our own knowledge (e.g., of the language and the domain) to interpret the Web pages.

DOI: 10.4018/978-1-60566-112-4.ch009
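The idea of retrieving texts with a search engine and then extracting instances from them can be illustrated with a minimal sketch. It is not the chapter's actual method: the snippets are hard-coded stand-ins for search results, and the pattern "countries such as ..." and the names `SNIPPETS`, `PATTERN`, and `extract_instances` are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical snippets standing in for search-engine results (assumption:
# a real system would obtain these by querying a search engine).
SNIPPETS = [
    "Burger King has restaurants in countries such as Spain, Germany and Brazil.",
    "The chain expanded to countries such as Germany and Japan.",
    "It is popular in countries such as Spain.",
]

# Illustrative pattern (assumption): a cue phrase followed by a list of
# capitalized single-word names separated by ", " or " and ".
PATTERN = re.compile(r"countries such as ((?:[A-Z][a-z]+(?:, | and )?)+)")


def extract_instances(snippets):
    """Count candidate instances that follow the cue phrase in each snippet."""
    counts = Counter()
    for text in snippets:
        for match in PATTERN.finditer(text):
            names = re.split(r", | and ", match.group(1))
            counts.update(name for name in names if name)
    return counts


if __name__ == "__main__":
    for name, freq in extract_instances(SNIPPETS).most_common():
        print(name, freq)
```

Counting how often a candidate appears across snippets is one simple way to favor instances that the Web confirms repeatedly over those found only once. A pattern this naive will miss multi-word names and will also pick up noise.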
