Typing Massive JSON Datasets

Cloud-specific languages are usually untyped, and no guarantees about the correctness of complex jobs can be obtained statically. Datasets, too, are usually untyped, and no schema information is needed to manipulate them. In this paper we sketch a typing algorithm for JSON datasets. Our approach can be used to infer a succinct type from scratch for a collection of JSON objects. It can also validate a dataset against a human-designed type and, if necessary, adapt and improve that type.
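
The abstract describes inference, validation and adaptation only at a high level. The Python sketch below illustrates one plausible reading, and it rests on several assumptions that do not come from the paper. The type language is an assumption: unions of null/bool/num/str, arrays, and closed records with optional fields. The merge rule `fuse` is also an assumption; it collapses all array types and all record types within a union into a single one, so types stay succinct at some cost in precision. The helper names `infer`, `fuse`, `validate`, `infer_dataset` and `adapt` are hypothetical. Because `fuse` is commutative and associative, per-object types can be combined in any order, for example in the reduce phase of a MapReduce-style job.

```python
import json
from functools import reduce

# Hypothetical type language (an assumption of this sketch, not the paper's syntax):
# a type is a frozenset of atoms read as a union; an atom is "null", "bool", "num",
# "str", ("arr", T) for arrays with element type T, or
# ("rec", ((field, T, optional), ...)) for objects, with fields sorted by name.
EMPTY = frozenset()  # type with no values; identity element for fuse


def infer(value):
    """Infer the most precise type of a single JSON value."""
    if value is None:
        return frozenset({"null"})
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return frozenset({"bool"})
    if isinstance(value, (int, float)):
        return frozenset({"num"})
    if isinstance(value, str):
        return frozenset({"str"})
    if isinstance(value, list):
        return frozenset({("arr", reduce(fuse, map(infer, value), EMPTY))})
    if isinstance(value, dict):
        fields = sorted(((k, infer(v), False) for k, v in value.items()), key=lambda f: f[0])
        return frozenset({("rec", tuple(fields))})
    raise TypeError(f"not a JSON value: {value!r}")


def _fuse_records(r1, r2):
    """Merge two record bodies field by field; a field absent on one side becomes optional."""
    f1 = {name: (t, opt) for name, t, opt in r1}
    f2 = {name: (t, opt) for name, t, opt in r2}
    out = []
    for name in sorted(f1.keys() | f2.keys()):
        if name in f1 and name in f2:
            (t1, o1), (t2, o2) = f1[name], f2[name]
            out.append((name, fuse(t1, t2), o1 or o2))
        else:
            t = f1[name][0] if name in f1 else f2[name][0]
            out.append((name, t, True))
    return tuple(out)


def fuse(t1, t2):
    """Return one type covering both inputs (assumed merge rule: one array atom, one record atom)."""
    atoms = t1 | t2
    out = {a for a in atoms if isinstance(a, str)}
    arrs = [a[1] for a in atoms if isinstance(a, tuple) and a[0] == "arr"]
    recs = [a[1] for a in atoms if isinstance(a, tuple) and a[0] == "rec"]
    if arrs:
        out.add(("arr", reduce(fuse, arrs)))
    if recs:
        out.add(("rec", reduce(_fuse_records, recs)))
    return frozenset(out)


def validate(value, t):
    """True if the JSON value belongs to type t (records are treated as closed)."""
    return any(_matches(value, atom) for atom in t)


def _matches(value, atom):
    if atom == "null":
        return value is None
    if atom == "bool":
        return isinstance(value, bool)
    if atom == "num":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if atom == "str":
        return isinstance(value, str)
    kind, body = atom
    if kind == "arr":
        return isinstance(value, list) and all(validate(v, body) for v in value)
    if not isinstance(value, dict):  # remaining kind is "rec"
        return False
    declared = {name: (ft, opt) for name, ft, opt in body}
    if any(k not in declared for k in value):  # undeclared field
        return False
    return all(validate(value[n], ft) if n in value else opt
               for n, (ft, opt) in declared.items())


def infer_dataset(lines):
    """'Map' each JSON line to its type, then 'reduce' the types with fuse."""
    return reduce(fuse, (infer(json.loads(line)) for line in lines), EMPTY)


def adapt(t, value):
    """Keep t if value conforms; otherwise widen t just enough to accept it."""
    return t if validate(value, t) else fuse(t, infer(value))
```

For example, the three lines `{"id": 1, "name": "a"}`, `{"id": 2, "tags": ["x"]}` and `{"id": 3, "name": null}` yield a single record type. In it, `id` is a mandatory number, `name` is an optional string-or-null, and `tags` is an optional array of strings. A hand-written type can be passed to `validate` and `adapt` in the same representation.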

Dario Colazzo | Giorgio Ghelli | Carlo Sartiani