A Novel Multi-Modal One-Shot Learning Method for Texture Recognition

Most machine learning algorithms require a large set of training samples to achieve satisfactory performance, but this requirement can be difficult to satisfy in practice. In one-shot learning (OSL) for texture recognition, for example, only one training sample per class is available, and machine learning algorithms struggle to achieve satisfactory results. To address this problem, a novel multi-modal one-shot learning method for texture recognition is presented. First, to improve recognition robustness and resistance to noise, we address the nontrivial challenge of learning information about object categories from only one training sample by fusing data from multiple modalities, namely image, sound, and acceleration, which together provide rich information about textures. Second, a novel dictionary learning model is designed that incorporates information from all modalities and simultaneously learns a latent sparse code shared across the different modalities. Third, a new regularization term is developed to increase the discriminability between classes. Furthermore, the common features of the three modalities are evaluated in the one-shot learning setting and used as the basis for feature selection. Finally, experiments on a publicly available dataset validate the effectiveness of the presented method.
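The idea of a dictionary model that learns one latent sparse code shared by all modalities can be illustrated with a small NumPy sketch. This is not the authors' formulation. It assumes a plain objective sum_m ||X_m - D_m A||_F^2 + lam * ||A||_1, solved by alternating ISTA for the shared code A and a MOD-style least-squares update for the per-modality dictionaries D_m. The paper's class-discriminative regularization term is not reproduced. The nearest-neighbour rule in code space, the function names, and the parameters (`lam`, `n_atoms`, `n_outer`) are hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sparse_code(X, D, lam, n_iter=200):
    """ISTA for min_A 0.5*||X - D A||_F^2 + lam*||A||_1 (one code column per sample)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term's gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A


def learn_shared_code(modalities, n_atoms, lam=0.1, n_outer=20, seed=0):
    """Hypothetical sketch: one dictionary per modality, one sparse code A shared by all.

    modalities: list of (d_m, n) arrays whose columns describe the same n samples.
    Stacking X_m and D_m vertically turns sum_m ||X_m - D_m A||^2 into a single
    reconstruction term, so every modality must be explained by the same A.
    """
    X = np.vstack(modalities)
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_outer):
        A = sparse_code(X, D, lam)
        # MOD-style update: ridge-regularized least squares for D, via a linear solve.
        M = A @ A.T + 1e-6 * np.eye(n_atoms)
        D = np.linalg.solve(M, A @ X.T).T
        dead = np.linalg.norm(D, axis=0) < 1e-10  # atoms no sample uses
        D[:, dead] = rng.standard_normal((X.shape[0], int(dead.sum())))
        D /= np.linalg.norm(D, axis=0, keepdims=True)  # keep atoms at unit norm
    A = sparse_code(X, D, lam)
    splits = np.cumsum([m.shape[0] for m in modalities])[:-1]
    return np.split(D, splits, axis=0), A


def one_shot_classify(support, support_labels, query, D_list, lam=0.1):
    """Label each query with the nearest single support sample in shared-code space."""
    D = np.vstack(D_list)
    A_s = sparse_code(np.vstack(support), D, lam)
    A_q = sparse_code(np.vstack(query), D, lam)
    # Pairwise Euclidean distances between codes, shape (n_query, n_support).
    dists = np.linalg.norm(A_q[:, :, None] - A_s[:, None, :], axis=0)
    return [support_labels[j] for j in dists.argmin(axis=1)]
```

In use, `modalities` would hold aligned image, sound, and acceleration feature matrices (one column per texture sample), and `support` would hold exactly one sample per class. Stacking the modalities is what forces a common code; a per-modality weighting or the paper's discriminative term would be added to the objective on top of this baseline.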