Embedding-based Representation of Categorical Data by Hierarchical Value Coupling Learning

Learning a representation of categorical data that has hierarchical value coupling relationships is challenging but critical for the effective analysis and learning of such data. This paper proposes a novel coupled unsupervised categorical data representation (CURE) framework and its instantiation, a coupled data embedding (CDE) method, which represents categorical data by learning hierarchical value-to-value cluster couplings. Existing embedding- and similarity-based representation methods capture only part, or none, of these complex couplings; CDE instead incorporates the hierarchical couplings explicitly into its embedding representation. CDE first learns two complementary feature value couplings, which are then used to cluster values at different granularities. It then models the couplings between value clusters, both within the same granularity and across granularities, to embed feature values into a new numerical space with independent dimensions. Extensive experiments show that CDE significantly outperforms three popular unsupervised embedding methods and three state-of-the-art similarity-based representation methods.
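
The pipeline described above (value couplings, multi-granularity value clustering, then an embedding with independent dimensions) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single co-occurrence-based coupling, P(v_j | v_i), as a stand-in for the two complementary couplings, a plain k-means for the value clustering, and PCA (via SVD) to obtain uncorrelated dimensions. All function names, the toy data, and the parameters `granularities` and `n_dims` are hypothetical.

```python
import numpy as np

def one_hot(data):
    """Encode each (feature index, value) pair as one indicator column."""
    data = np.asarray(data, dtype=object)
    values = [(f, v) for f in range(data.shape[1]) for v in sorted(set(data[:, f]))]
    index = {fv: i for i, fv in enumerate(values)}
    onehot = np.zeros((data.shape[0], len(values)))
    for r, row in enumerate(data):
        for f, v in enumerate(row):
            onehot[r, index[(f, v)]] = 1.0
    return onehot, values

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; an empty cluster keeps its previous center."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels

def embed_values(data, granularities=(2, 3), n_dims=2):
    """Assumed pipeline: coupling matrix -> clusters at several k -> PCA."""
    onehot, values = one_hot(data)
    co = onehot.T @ onehot                  # co-occurrence counts between values
    coupling = co / np.diag(co)[:, None]    # row i holds P(v_j | v_i)
    # Cluster the values at several granularities; keep membership indicators.
    membership = np.hstack([np.eye(k)[kmeans(coupling, k)] for k in granularities])
    # PCA via SVD on the centered membership matrix gives uncorrelated dimensions.
    centered = membership - membership.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return values, u[:, :n_dims] * s[:n_dims]

# Toy data set (hypothetical): 6 objects, 2 categorical features.
data = [["red", "small"], ["red", "small"], ["blue", "large"],
        ["blue", "large"], ["green", "large"], ["green", "small"]]
values, emb = embed_values(data)

# An object can then be represented by concatenating its values' vectors.
lookup = {fv: vec for fv, vec in zip(values, emb)}
first_object = np.concatenate([lookup[(f, v)] for f, v in enumerate(data[0])])
```

Here each distinct feature value receives one row of `emb`, and an object's representation is the concatenation of the rows for its values. Whether this matches the way CDE builds object representations is not stated in the abstract, so that step is also an assumption.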
