Web Information Extraction Using Eupeptic Data in Web Tables

By leveraging redundant information on the Web, we are building a Web information extraction system that concentrates on eupeptic data in Web tables. We use the term eupeptic to describe representations of information that allow easy interpretation of the subject-predicate-object nature of individual data items. The system mimics a human approach to information gathering: it explicitly uses visual cues on rendered Web pages to locate tabular data; it uses keywords to identify relevant chunks of data, which are then processed at a deeper level; and it expands its initial search to include more pages when it spots eupeptic data.
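
The keyword filtering and search expansion described above can be illustrated with a minimal sketch. It is not the system's implementation. It assumes that tables have already been located and reduced to two-column attribute/value rows, so the visual-cue step on rendered pages is omitted, and it stands in for the Web with an in-memory dictionary. All names here (`table_to_triples`, `is_relevant`, `extract`, and the `subject`, `rows`, and `links` fields) are hypothetical.

```python
from collections import deque


def table_to_triples(subject, rows):
    """Read two-cell attribute/value rows as (subject, predicate, object) triples."""
    return [
        (subject, row[0].strip(), row[1].strip())
        for row in rows
        if len(row) == 2  # skip rows that do not fit the attribute/value shape
    ]


def is_relevant(triple, keywords):
    """Keyword test: keep a triple if any keyword occurs in it (case-insensitive)."""
    text = " ".join(triple).lower()
    return any(keyword.lower() in text for keyword in keywords)


def extract(pages, start, keywords):
    """Collect relevant triples, following a page's links only if it yielded hits.

    `pages` is a toy stand-in for the Web (an assumption of this sketch):
    url -> {"subject": str, "rows": list of cell tuples, "links": list of urls}.
    """
    queue = deque([start])
    seen = {start}
    found = []
    while queue:
        page = pages[queue.popleft()]
        triples = table_to_triples(page["subject"], page["rows"])
        hits = [t for t in triples if is_relevant(t, keywords)]
        found.extend(hits)
        if hits:  # expand the search only from pages with usable data
            for link in page["links"]:
                if link in pages and link not in seen:
                    seen.add(link)
                    queue.append(link)
    return found


# Toy data, invented purely for illustration.
pages = {
    "a": {"subject": "Camera X",
          "rows": [("Weight", "300 g"), ("Color", "black")],
          "links": ["b", "c"]},
    "b": {"subject": "Camera Y", "rows": [("Weight", "450 g")], "links": []},
    "c": {"subject": "About", "rows": [("Contact", "n/a")], "links": ["d"]},
    "d": {"subject": "Camera Z", "rows": [("Weight", "500 g")], "links": []},
}
result = extract(pages, "a", ["weight"])
```

In this toy run, page `d` is never visited, because the only page linking to it (`c`) yields no relevant triples. This mirrors the idea of expanding the search only where eupeptic data is found.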