Ordinal Hyperplane Loss

The problem of ordinal classification occurs in a large and growing number of areas. Some of the most common sources and applications of ordinal data include rating scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity, and facial age estimation. The problem of predicting ordinal classes is typically addressed either by training n-1 binary classifiers for n ordinal classes or by treating the ordinal classes as continuous values for regression. However, the first strategy does not fully utilize the ordering information of the classes, and the second strategy imposes a strong continuity assumption on the ordinal classes. In this paper, we propose a novel loss function called Ordinal Hyperplane Loss (OHPL) that is specifically designed for data with ordinal classes. OHPL is a significant advancement in predicting ordinal class data, since it enables deep learning techniques to be applied to the ordinal classification problem on both structured and unstructured data. By minimizing OHPL, a deep neural network learns to map data to an optimal space where the distances between points and their class centroids are minimized while a nontrivial ordinal relationship among the classes is maintained. Experimental results show that a deep neural network trained with OHPL not only outperforms the state-of-the-art alternatives on classification accuracy but also scales well to large ordinal classification problems.
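To make the idea of minimizing point-to-centroid distance while keeping classes ordered more concrete, the following is a minimal sketch and not the paper's actual formulation of OHPL. It assumes a one-dimensional network output `z`, integer labels `y` in `0..n_classes-1`, every class present in the batch, and a hinge penalty on adjacent centroids. The function name `ordinal_centroid_loss` and the parameters `margin` and `alpha` are hypothetical choices made for illustration only.

```python
import numpy as np

def ordinal_centroid_loss(z, y, n_classes, margin=1.0, alpha=1.0):
    """Illustrative loss (an assumption, not the published OHPL definition).

    z: 1-D array of model outputs, one scalar per sample.
    y: 1-D array of integer ordinal labels in 0..n_classes-1.
    Returns (loss, centroids).
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=int)

    # Centroid of each class in the output space.
    # Assumes every class occurs at least once in the batch.
    centroids = np.array([z[y == k].mean() for k in range(n_classes)])

    # Compactness term: mean absolute distance of each point to its own centroid.
    point_term = np.mean(np.abs(z - centroids[y]))

    # Ordering term: hinge penalty whenever the centroid of class k+1
    # is not at least `margin` above the centroid of class k.
    gaps = np.diff(centroids)
    order_term = np.sum(np.maximum(0.0, margin - gaps))

    return order_term + alpha * point_term, centroids


# Correctly ordered, tight clusters incur no penalty under this sketch.
loss_ok, _ = ordinal_centroid_loss([0, 0, 2, 2, 4, 4], [0, 0, 1, 1, 2, 2], 3)
# Reversing the class order triggers the ordering penalty.
loss_bad, _ = ordinal_centroid_loss([4, 4, 2, 2, 0, 0], [0, 0, 1, 1, 2, 2], 3)
```

In this sketch the ordering term depends only on class centroids, so its cost grows with the number of classes rather than with the number of sample pairs. Whether the published loss shares this structure is an assumption here.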
