MMCNet: deep learning–based multimodal classification model using dynamic knowledge

The growth of businesses that distribute movies, software, music, and other content has led to the accumulation of a very large amount of content. Accordingly, recommendation systems that stimulate user demand for content have become more important. Distribution businesses need accurate content recommendations to acquire and retain users, and a highly accurate recommendation system requires that the recommended content be classified accurately. Techniques such as naive Bayes, SGD (stochastic gradient descent), and SVM (support vector machine) are the main classification methods in use. Applying all of the information about the recommended items in the classification process can be expected to yield high accuracy, but it incurs heavy computation, long service times, and low scalability. Given this inefficiency, effective classification that uses content metadata is required. Metadata are expressed in terms of domain concepts, relations, types, and attributes so that the complicated relations among multimodal data (text, images, and video) can be processed efficiently. Most classification systems use single-modal data to express one piece of knowledge about an item in a domain. Single-modal data limit improvements in classification accuracy because they do not include the useful information provided by other knowledge types. Therefore, in this paper, we propose MMCNet, a deep learning–based multimodal classification model that uses dynamic knowledge. The proposed method consists of a classification model that applies a CNN (convolutional neural network), based on the human learning principle, to multimodal data that combine text and image knowledge. Multimodal data are collected by a Web robot agent from the TMDb (The Movie Database) data set, which includes a variety of single-modal data.
In preprocessing, knowledge integration, knowledge conversion, and knowledge reduction are performed to create a quantified knowledge base. Text data are refined through morphological analysis and converted to numerical vectors by word embedding. Image data are converted to numerical vectors with a vector conversion library. The converted feature vectors are used to create multimodal learning data, on which the classification model is trained. To address constraints on memory and computational resources, vector model-based meta-knowledge is expanded through expression, conversion, alignment, inference, and deep learning. To evaluate its performance, the proposed model was compared with conventional classification methods in terms of accuracy, recall, and F1-score. According to this evaluation, the proposed classification model achieves higher accuracy, recall, and F1-score than the conventional methods. In addition, the proposed model was implemented as a deep learning–based multimodal classification system with a graphical user interface that allows users to provide feedback on the classification results by adjusting classification parameters. Through the convergence of knowledge bases from various domains with multimodal deep learning, the dynamic knowledge that influences user preference is inferred.
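
The three evaluation measures named above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function name `classification_scores`, the genre labels, and the choice of macro-averaging over classes are assumptions made for illustration only; the abstract does not state how recall and F1-score are averaged across classes.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return accuracy, macro-averaged recall, and macro-averaged F1-score.

    Macro-averaging is an assumption of this sketch, not a detail
    taken from the paper.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a false positive
            fn[truth] += 1   # true class received a false negative

    recalls, f1_scores = [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1_scores.append(f1)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1_scores) / len(labels),
    }


# Hypothetical genre labels, used only to exercise the function.
y_true = ["action", "action", "drama", "drama", "comedy", "comedy"]
y_pred = ["action", "drama", "drama", "drama", "comedy", "action"]
scores = classification_scores(y_true, y_pred)
print(scores)
```

With macro-averaging, every class contributes equally to recall and F1-score regardless of how many items it contains, which is one reasonable choice when class sizes in a content catalog are unbalanced.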
