
Exploring Data Analytics of Data Variety

The Internet gives managers in organizations access to large amounts of data, and these data come in different formats, a property known as data variety: structured, semi-structured and unstructured. On the Internet, this data variety derives partly from social networks, but not only from them: machines are also capable of sharing information among themselves, and with people. The objective of this paper is to understand how to retrieve information by analyzing data that exhibit this variety. An experiment was carried out on a dataset with two distinct data types, images and comments on cars. Data analysis techniques were applied, namely Natural Language Processing to identify patterns, and Sentiment and Emotion Analysis. An image recognition technique was used to associate each car model with a category. Next, OLAP cubes were created and visualized through dashboards. The paper concludes that it is possible to extract a set of relevant information, such as identifying which cars people like more or less.
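As a rough illustration of how comment sentiment might be rolled up along cube-like dimensions, the following is a minimal sketch and not the authors' implementation. The word lists, the sample records, and the names `sentiment_score` and `roll_up` are all hypothetical; the `category` field merely stands in for the output of an image recognition step, and a real study would use a trained sentiment model and an OLAP engine rather than plain dictionaries.

```python
from collections import defaultdict

# Hypothetical toy lexicon (assumption): not taken from the paper.
POSITIVE = {"love", "great", "beautiful", "reliable"}
NEGATIVE = {"hate", "ugly", "noisy", "unreliable"}


def sentiment_score(comment: str) -> int:
    """Return the count of positive words minus the count of negative words."""
    words = [w.strip(".,!?").lower() for w in comment.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def roll_up(records, dimensions):
    """Aggregate comment counts and sentiment scores over the given dimensions."""
    totals = defaultdict(lambda: {"comments": 0, "score": 0})
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        totals[key]["comments"] += 1
        totals[key]["score"] += sentiment_score(rec["comment"])
    return dict(totals)


# Made-up records (assumption): "category" stands in for an image-recognition label.
RECORDS = [
    {"model": "Model A", "category": "SUV", "comment": "I love it, great and reliable!"},
    {"model": "Model A", "category": "SUV", "comment": "Too noisy."},
    {"model": "Model B", "category": "Sports", "comment": "Ugly and unreliable, I hate it."},
]

# Fine-grained view (category, model) and a coarser roll-up by category only.
by_model = roll_up(RECORDS, ["category", "model"])
by_category = roll_up(RECORDS, ["category"])

for key, cell in sorted(by_model.items()):
    print(key, cell)
```

Passing a shorter list of dimensions to `roll_up` yields the coarser view, which is loosely analogous to a roll-up operation on a cube; a dashboard would then plot these aggregated cells.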

José Luís Pereira | Jorge Oliveira e Sá | Tiago Cruz
