Paper Information - Generating Actionable Knowledge from Big Data

Generating Actionable Knowledge from Big Data

The last few years have seen a rapid increase in the sheer amount of data produced and communicated over the Internet and the Web. While it is widely believed that the availability of such "Big Data" holds the potential to revolutionize many aspects of modern society (e.g., intelligent transportation, environmental monitoring, and energy saving), many challenges need to be addressed before this potential can be realized. This PhD project focuses on one critical challenge, namely extracting actionable knowledge from Big Data. Tremendous effort has been devoted to mining large-scale data on the Web and constructing comprehensive knowledge bases (KBs). However, existing knowledge extraction systems retrieve data from only a limited range of Web source types. In addition, data fusion approaches take little account of the noise introduced by those knowledge extraction systems. Consequently, the constructed KBs are far from comprehensive and accurate. In this paper, we present our initial design of a framework for extracting machine-readable data with high precision and recall from four types of data sources, namely Web texts, Document Object Model (DOM) trees, existing KBs, and query streams. Confidence scores are attached to the resulting knowledge and can be used to further improve the knowledge fusion results.

Xiu Susie Fang | X. Fang

[1] Gerhard Weikum,et al. Discovering and Exploring Relations on the Web , 2012, Proc. VLDB Endow..

[2] Bo Zhao,et al. A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration , 2012, Proc. VLDB Endow..

[3] Erhard Rahm,et al. Schema Matching and Mapping , 2013, Schema Matching and Mapping.

[4] Divesh Srivastava,et al. Global detection of complex copying relationships between sources , 2010, Proc. VLDB Endow..

[5] Wei-Ying Ma,et al. Simultaneous record detection and attribute labeling in web data extraction , 2006, KDD '06.

[6] Daniel S. Weld,et al. Automatically refining the wikipedia infobox ontology , 2008, WWW.

[7] Alicia Ageno,et al. Adaptive information extraction , 2006, CSUR.

[8] Gerhard Weikum,et al. YAGO: A Core of Semantic Knowledge , 2007, WWW.

[9] Oren Etzioni,et al. Open Information Extraction: The Second Generation , 2011, IJCAI.

[10] Haixun Wang,et al. Probase: a probabilistic taxonomy for text understanding , 2012, SIGMOD Conference.

[11] Dan Roth,et al. Making Better Informed Trust Decisions with Generalized Fact-Finding , 2011, IJCAI.