Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction
End-to-end knowledge base construction systems that use statistical inference are enabling more people to automatically extract high-quality, domain-specific information from unstructured data. Deploying the DeepDive framework across several domains revealed new challenges in debugging and improving such end-to-end systems to construct high-quality knowledge bases. DeepDive has an iterative development cycle in which users improve the data. To help our users, we needed to develop principles for analyzing the system's errors, as well as tooling for inspecting and labeling the system's various data products. We created guidelines for error analysis modeled after our colleagues' best practices, in which data labeling plays a critical role at every step of the analysis. To make data labeling more productive and systematic, we created Mindtagger, a versatile tool that can be configured to support a wide range of labeling tasks. In this demonstration, we show in detail which data labeling tasks are modeled in our error analysis guidelines and how each of them is performed using Mindtagger.