Building Effective and Efficient Procedure for Preprocessing Marketplace Data

The rapid development of digitalization has pushed National Statistics Offices to use big data as a new source for producing official statistics. One such source is marketplace data, which is growing rapidly. Transforming these massive datasets into statistics for public policy poses many challenges. This paper explains the challenges of analyzing marketplace data and of building an effective and efficient preprocessing procedure for big data that can be used for public policy. An optimal pipeline for preprocessing, including validating, cleaning, and aggregating marketplace data, has been developed.
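
As a rough illustration of what a validate, clean, and aggregate pipeline might look like, the following minimal Python sketch processes a handful of listing records. It is not the procedure developed in the paper: the field names (`id`, `category`, `price`, `sold`), the validity rules, and the revenue-per-category aggregate are all assumptions chosen for the example. Parsing is done before validation here only so that the validity check can work on numeric values.

```python
from collections import defaultdict

# Hypothetical listing records; field names and values are assumptions,
# not data from the paper.
RAW_LISTINGS = [
    {"id": "a1", "category": "Phones", "price": "1500000", "sold": "3"},
    {"id": "a2", "category": " phones ", "price": "2,000,000", "sold": "1"},
    {"id": "a3", "category": "Books", "price": "-5", "sold": "2"},        # invalid price
    {"id": "a4", "category": "Books", "price": "50000", "sold": ""},      # missing count
    {"id": "a1", "category": "Phones", "price": "1500000", "sold": "3"},  # duplicate id
]


def clean(record):
    """Normalize text fields and parse numbers; return None if parsing fails."""
    try:
        return {
            "id": record["id"].strip(),
            "category": record["category"].strip().lower(),
            "price": float(record["price"].replace(",", "")),
            "sold": int(record["sold"]),
        }
    except (KeyError, ValueError):
        return None


def is_valid(record):
    """Assumed validity rules: non-empty keys, positive price, non-negative count."""
    return (
        bool(record["id"])
        and bool(record["category"])
        and record["price"] > 0
        and record["sold"] >= 0
    )


def preprocess(raw):
    """Clean, validate, de-duplicate by id, then sum revenue per category."""
    seen_ids = set()
    revenue = defaultdict(float)
    for rec in raw:
        cleaned = clean(rec)
        if cleaned is None or not is_valid(cleaned) or cleaned["id"] in seen_ids:
            continue
        seen_ids.add(cleaned["id"])
        revenue[cleaned["category"]] += cleaned["price"] * cleaned["sold"]
    return dict(revenue)


RESULT = preprocess(RAW_LISTINGS)
```

Processing records one at a time keeps memory use flat, which matters for large marketplace extracts; a real pipeline would likely read from files or a database rather than an in-memory list.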
