Multimodal Fusion: A Review, Taxonomy, Open Challenges, Research Roadmap and Future Directions

This work surveys previous research on multimodal fusion, a field that, despite extensive study, still does not handle imperfections adequately. These imperfections can arise at any stage, from the data and its sources to the fusion strategies themselves. The work also explores applications of Neutrosophy to handling imperfections and describes previous work in this area, including applications that address imperfection and uncertainty in multimodal data collected for fusion. In this way, it brings neutrosophic logic and its applications to bear on computer vision, including multimodal data fusion and information systems. The underlying assumption is that incorporating the notion of uncertainty into multimodal research will encourage the development of newer algorithms for handling imperfections in multimodal systems and thereby give impetus to existing research in this field.
