A First Look at Information Entropy-Based Data Pricing

The distribution of intangible information goods has grown tremendously in recent years, which has spurred research on the economics of information goods. As big data develops, more and more information goods markets are emerging for data trading. Current data pricing policies measure the value of data goods with metrics such as the data generation date, data volume, and data integrity. However, identifying the amount of information in the data and its distribution is very challenging, and pricing based on it has rarely been discussed. In this paper, we propose a new data pricing metric, data information entropy, which helps to set a reasonable price in data trading. We first present a method for measuring data information based on information entropy, and then propose a pricing function based on the result of that measurement. To give a comprehensive understanding of the new metric and facilitate its application in data trading, we verify the rationality of the measurement method and give three concrete pricing functions. To our knowledge, this is the first look at information entropy-based data pricing; it can inspire research on pricing mechanisms for data goods and further promote the development of the data products business.
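As a rough illustration of the idea, the sketch below computes the empirical Shannon entropy of a discrete data column and maps it to a price. It is a minimal sketch under stated assumptions, not the paper's method: the abstract does not give the measurement procedure or the three pricing functions, so the function names `shannon_entropy` and `linear_price`, the parameters `base_price` and `rate`, and the linear form of the price are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Empirical Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    # H = sum over distinct values of p * log2(1 / p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def linear_price(entropy_bits, base_price=1.0, rate=0.5):
    """Hypothetical pricing function: price rises linearly with entropy.

    base_price and rate are illustrative parameters, not values from the paper.
    """
    return base_price + rate * entropy_bits


if __name__ == "__main__":
    uniform_column = ["a", "b", "a", "b"]   # two equally likely values
    constant_column = ["a", "a", "a", "a"]  # a single repeated value
    for name, column in [("uniform", uniform_column), ("constant", constant_column)]:
        h = shannon_entropy(column)
        print(name, "entropy:", h, "price:", linear_price(h))
```

Under these assumptions, a column whose values are spread evenly carries more entropy, and so a higher price, than a column of the same volume that repeats one value. That is the distinction volume-based metrics cannot make.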
