Automatic Label Data Abstraction Based on Information Entropy (Application Paper)

There is currently strong demand for automating big data analysis. In data analysis, data abstraction (or summarization) plays an important role in extracting generalized information from large-scale data. We developed an artificial intelligence computer system with the aim of automating big data analysis, and devised a method that can abstract numerical data (age, height, time, etc.). However, that method could not abstract or summarize label data (customer ID, product code, name, etc.). In the present work, we have developed a label abstraction method based on information entropy. Experiments on open real-world data showed that the proposed method achieved an extraction accuracy of 80%, as evaluated by F-measure. We intend to apply the proposed method to our artificial intelligence system and perform further evaluations.
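The abstract does not spell out how information entropy is used, so the following is only a minimal sketch of the underlying quantity, not the authors' method. It computes the Shannon entropy of a label column and shows how grouping labels by a shared prefix lowers that entropy. The function name `shannon_entropy`, the sample product codes, and the prefix-splitting rule are all illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical product codes: a category prefix followed by a serial number.
codes = ["A-001", "A-002", "B-001", "B-002"]

# One possible abstraction (an assumption): keep only the category prefix.
prefixes = [code.split("-")[0] for code in codes]

print(shannon_entropy(codes))     # entropy of the raw labels
print(shannon_entropy(prefixes))  # entropy after grouping by prefix
```

In this toy case the raw codes are all distinct, so their entropy is maximal for four items, while the prefix grouping halves the number of distinct values. An entropy-based method could plausibly use such a difference to judge which grouping of labels is informative, though the criterion actually used in the paper is not stated here.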
