An Inductive Framework for Semi-supervised Learning (Discussion Paper)

Distance-based machine learning methods have limited applicability to categorical data, since they do not capture the complexity of the relationships among different values of a categorical attribute. Nonetheless, categorical attributes are common in many application scenarios, including clinical and health records, census and survey data. Although distance learning algorithms exist for categorical data, they may disclose private information about individual records if applied to a secret dataset. To address this problem, we introduce a differentially private algorithm for learning distances between any pair of values of a categorical attribute according to the way they are co-distributed with the values of other categorical attributes forming the so-called context. We show empirically that our approach consumes little privacy budget while providing accurate distances.
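As an illustration of the general idea only, the following is a minimal sketch of how distances between the values of one categorical attribute might be derived from differentially private co-occurrence counts with a single context attribute. It is not the algorithm proposed in the paper. The Laplace mechanism on the contingency table, the Euclidean distance between conditional-probability profiles, the function names, and the assumption that both attribute domains are public are all choices made here for the example.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_value_distances(records, target, context,
                            target_domain, context_domain,
                            epsilon, seed=None):
    """Hypothetical helper: noisy distances between values of `target`.

    Assumptions (not taken from the paper): domains are public, and
    neighbouring datasets differ by adding or removing one record, so the
    contingency table has L1 sensitivity 1.
    """
    rng = random.Random(seed)
    counts = Counter((r[target], r[context]) for r in records)

    # Laplace mechanism: perturb every cell with scale 1/epsilon, then clamp
    # at zero (clamping is post-processing and costs no extra budget).
    noisy = {
        (t, c): max(0.0, counts[(t, c)] + laplace_noise(1.0 / epsilon, rng))
        for t in target_domain for c in context_domain
    }

    # Profile of each target value: estimated P(target = t | context = c).
    profiles = {t: [] for t in target_domain}
    for c in context_domain:
        column_total = sum(noisy[(t, c)] for t in target_domain)
        for t in target_domain:
            p = noisy[(t, c)] / column_total if column_total > 0 else 0.0
            profiles[t].append(p)

    # Distance: root mean squared difference between the two profiles.
    n = len(context_domain)
    return {
        (a, b): math.sqrt(
            sum((pa - pb) ** 2 for pa, pb in zip(profiles[a], profiles[b])) / n
        )
        for a in target_domain for b in target_domain
    }


# Toy data: values "a" and "b" share the same context distribution, "c" does not.
demo_records = (
    [{"v": "a", "ctx": "x"}] * 10 + [{"v": "a", "ctx": "y"}] * 10
    + [{"v": "b", "ctx": "x"}] * 10 + [{"v": "b", "ctx": "y"}] * 10
    + [{"v": "c", "ctx": "y"}] * 20
)
demo = private_value_distances(
    demo_records, "v", "ctx", ["a", "b", "c"], ["x", "y"], epsilon=1.0, seed=0
)
for pair, dist in sorted(demo.items()):
    print(pair, round(dist, 3))
```

In this sketch the whole privacy budget `epsilon` is spent once on the contingency table, and everything computed afterwards is post-processing. With several context attributes, the budget would have to be divided among their tables, which is one reason a method that consumes little budget matters.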
