Learning Complex Rare Categories with Dual Heterogeneity

In the era of big data, self-similar rare categories in a large data set are often of great importance, such as malicious insiders in big organizations and defective IC devices in semiconductor manufacturing. Furthermore, such rare categories often exhibit multiple types of heterogeneity, such as task heterogeneity, which originates from data collected in multiple domains, and view heterogeneity, which originates from multiple information sources. Existing methods for learning rare categories mainly focus on homogeneous settings, i.e., a single task and a single view. In this paper, for the first time, we study complex rare categories with both task and view heterogeneity, and propose a novel optimization framework named MLID. It introduces a boundary characterization metric to capture the sharp changes in density near the boundary of the rare categories in the feature space, and constructs a graph-based model to leverage both task and view heterogeneity. Furthermore, MLID integrates the two components in a mutually beneficial way. We also present an effective algorithm to solve this framework, analyze its performance from various aspects, and demonstrate its effectiveness on both synthetic and real datasets.
