Debugging Missing Answers for Spark Queries over Nested Data with Breadcrumb

We present Breadcrumb, a system that helps developers debug queries by providing query-based explanations for missing answers. Given a query and an expected but missing query result, Breadcrumb identifies the operators in the query that are responsible for the failure to derive the missing answer. These operators form explanations that guide developers, who can then focus their debugging efforts on fixing these parts of the query. Breadcrumb is implemented on top of Apache Spark. Our approach is the first that scales to big data and that finds explanations for common errors in queries over nested and de-normalized data, e.g., errors that stem from misinterpreting schema semantics.

PVLDB Reference Format:
Ralf Diestelkämper, Seokki Lee, Boris Glavic, and Melanie Herschel. Debugging Missing Answers for Spark Queries over Nested Data with Breadcrumb. PVLDB, 14(12): 2731-2734, 2021.
doi:10.14778/3476311.3476331

PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://github.com/UniStuttgart-DataEngineering/breadcrumb.
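To make the notion of a query-based explanation for a missing answer more concrete, the following Python sketch traces which operator in a small pipeline over nested records eliminates every row that could have contributed to the missing answer. It is not Breadcrumb's algorithm or API. It assumes plain Python lists of dicts instead of Spark DataFrames, a hypothetical `_src` lineage field, and a simplified "first operator that drops all compatible rows" criterion in the spirit of lineage-based why-not techniques. The functions `explode`, `where`, and `find_picky_operator`, as well as the publication data, are illustrative only.

```python
# Illustrative sketch only, not Breadcrumb's implementation: rows are plain
# dicts, and a hypothetical "_src" field carries each row's input lineage.
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]
Operator = Tuple[str, Transform]


def explode(field: str, alias: str) -> Transform:
    """Flatten a nested list attribute; each output row keeps its parent's _src."""
    def run(rows: List[Row]) -> List[Row]:
        return [{**r, alias: item} for r in rows for item in r[field]]
    return run


def where(pred: Callable[[Row], bool]) -> Transform:
    """Keep only the rows that satisfy pred."""
    def run(rows: List[Row]) -> List[Row]:
        return [r for r in rows if pred(r)]
    return run


def find_picky_operator(
    rows: List[Row], pipeline: List[Operator], compatible: Callable[[Row], bool]
) -> Optional[str]:
    """Return the name of the first operator after which no row derived from a
    compatible input row survives, or None if such a row reaches the output."""
    current = [{**r, "_src": i} for i, r in enumerate(rows)]
    # Lineage ids of input rows that could contribute to the missing answer.
    targets = {r["_src"] for r in current if compatible(r)}
    for name, op in pipeline:
        current = op(current)
        if not any(r["_src"] in targets for r in current):
            return name
    return None


publications = [
    {"title": "Paper A", "authors": ["Alice", "Bob"], "editors": ["Carol"]},
    {"title": "Paper B", "authors": ["Carol"], "editors": ["Alice"]},
]

# Intended: publications authored by Alice. Bug: explodes editors, not authors.
buggy = [
    ("explode(editors)", explode("editors", "name")),
    ("filter(name == 'Alice')", where(lambda r: r["name"] == "Alice")),
]

# Missing answer: ("Paper A", "Alice"); compatible inputs have title "Paper A".
print(find_picky_operator(publications, buggy, lambda r: r["title"] == "Paper A"))
# prints: filter(name == 'Alice')
```

Here the sketch blames the filter, although the actual mistake is exploding `editors` instead of `authors`; with `explode("authors", "name")` as the first operator it returns `None`. Pointing to the misinterpreted attribute, rather than to the operator where compatible rows happen to disappear, is the kind of schema-related error the abstract says Breadcrumb is designed to explain.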