Exploratory Analysis for Big Social Data Using Deep Network

Exploratory analysis is an important way to understand data and to discover previously unknown relationships across varied data sources, especially in the era of big data. Traditional social science data analysis follows the steps of feature selection, modeling, and prediction. In this paper, we propose a new paradigm that does not require feature selection, so that the data can speak for itself without features being picked out manually. We also propose using a deep network to explore previously unknown relationships and to capture the complexity and non-linearity between target variables and a large number of input features in big social data. The paradigm is intended as a relatively generic approach that can be applied in many different scenarios. To validate its feasibility, we use the forecasting of country-level indicators as a case study. The process comprises (1) data collection and preparation and (2) modeling and experiment. The data collection and preparation stage builds a data warehouse and runs an extract-transform-load process to eliminate inconsistencies in data format. The modeling and experiment stage covers model setup and changes to the model structure, with the aim of achieving relatively high prediction accuracy at both the model level and the case level. We observe some patterns in how modifying network capacity and varying the time interval affect the test results; both deserve further research.
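As a rough illustration of the idea of feeding every available feature to a deep network instead of selecting features first, the following is a minimal sketch and not the model used in the paper. The data are synthetic stand-ins for country-level indicators, and the function names, layer sizes, learning rate, and epoch count are all assumptions chosen for illustration only.

```python
import numpy as np


def make_synthetic_indicators(n_samples=400, n_features=20, seed=0):
    """Synthetic stand-in for indicator data (an assumption, not real data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    # The target depends non-linearly on a few columns; the model is never
    # told which ones, mirroring the "no feature selection" setting.
    y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=n_samples)
    return X, y.reshape(-1, 1)


def predict(params, X):
    """Forward pass of a two-hidden-layer ReLU network."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.maximum(X @ W1 + b1, 0.0)
    h2 = np.maximum(h1 @ W2 + b2, 0.0)
    return h2 @ W3 + b3


def train_mlp(X, y, hidden=32, lr=0.01, epochs=300, seed=0):
    """Full-batch gradient descent on mean squared error over ALL features."""
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    W1 = rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=np.sqrt(2.0 / hidden), size=(hidden, hidden))
    b2 = np.zeros(hidden)
    W3 = rng.normal(scale=np.sqrt(1.0 / hidden), size=(hidden, 1))
    b3 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass, keeping pre-activations for the backward pass.
        z1 = X @ W1 + b1
        h1 = np.maximum(z1, 0.0)
        z2 = h1 @ W2 + b2
        h2 = np.maximum(z2, 0.0)
        err = (h2 @ W3 + b3) - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass (manual backpropagation).
        g_out = 2.0 * err / n
        gW3, gb3 = h2.T @ g_out, g_out.sum(axis=0)
        g_z2 = (g_out @ W3.T) * (z2 > 0)
        gW2, gb2 = h1.T @ g_z2, g_z2.sum(axis=0)
        g_z1 = (g_z2 @ W2.T) * (z1 > 0)
        gW1, gb1 = X.T @ g_z1, g_z1.sum(axis=0)
        # Gradient descent update.
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
        W3 -= lr * gW3
        b3 -= lr * gb3
    return (W1, b1, W2, b2, W3, b3), losses


if __name__ == "__main__":
    X, y = make_synthetic_indicators()
    # Hold out the last quarter of the rows as a simple test split.
    split = int(0.75 * len(X))
    params, losses = train_mlp(X[:split], y[:split])
    test_mse = float(np.mean((predict(params, X[split:]) - y[split:]) ** 2))
    print("train MSE, first epoch:", losses[0])
    print("train MSE, last epoch:", losses[-1])
    print("test MSE:", test_mse)
```

Changing `hidden` or adding layers is one simple way to experiment with network capacity in this sketch; a real study would more likely use a deep learning framework and real indicator data.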
