Mapping client messages to a unified data model with mixture feature embedding convolutional neural network

Mapping data between the different data standards used by health institutes is often necessary when those institutes exchange data. However, neither rule-based approaches nor traditional machine learning methods have yet achieved satisfactory results. In this work, we propose a deep learning method, the mixture feature embedding convolutional neural network (MfeCNN), which casts data mapping as a multi-class classification problem. Multi-modal features were extracted from different semantic spaces with a medical NLP package, and MfeCNN generated powerful feature embeddings from them. As many as ten classes were classified simultaneously by a fully connected softmax layer based on the multi-view embedding. Experimental results show that the proposed MfeCNN achieved better results than traditional state-of-the-art machine learning models, and much better results than a convolutional neural network that uses only bag-of-words inputs.
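
The final classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-view embeddings are fused by simple concatenation, and the view names, vector sizes, and random untrained weights are all hypothetical placeholders. It only shows how a fully connected softmax layer turns a fused multi-view embedding into a probability distribution over ten classes.

```python
import math
import random

NUM_CLASSES = 10  # the abstract mentions up to ten target classes


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_embeddings):
    """Concatenate per-view embedding vectors (assumed fusion strategy)."""
    fused = []
    for vec in view_embeddings:
        fused.extend(vec)
    return fused


def classify(fused, weights, biases):
    """Fully connected layer followed by softmax; returns class probabilities."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


rng = random.Random(0)
# Hypothetical embeddings from two feature views (e.g. words and concept codes).
word_view = [rng.uniform(-1, 1) for _ in range(4)]
concept_view = [rng.uniform(-1, 1) for _ in range(3)]
fused = fuse_views([word_view, concept_view])

# Untrained, randomly initialised parameters, for illustration only.
weights = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

probs = classify(fused, weights, biases)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

In a trained model the weights would be learned jointly with the convolutional embedding layers; here `predicted_class` is simply the index of the highest probability among the ten outputs.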
