Paper Information - Ordinal Hyperplane Loss

Ordinal Hyperplane Loss

The problem of ordinal classification occurs in a large and growing number of areas. Some of the most common sources and applications of ordinal data include rating scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity, and facial age estimation. The problem of predicting ordinal classes is typically addressed either by performing n-1 binary classifications for n ordinal classes or by treating ordinal classes as continuous values for regression. However, the first strategy does not fully utilize the ordering information of the classes, and the second strategy imposes a strong continuity assumption on ordinal classes. In this paper, we propose a novel loss function called Ordinal Hyperplane Loss (OHPL) that is specifically designed for data with ordinal classes. OHPL is a significant advancement in predicting ordinal class data because it enables deep learning techniques to be applied to the ordinal classification problem on both structured and unstructured data. By minimizing OHPL, a deep neural network learns to map data to an optimal space where the distances between points and their class centroids are minimized while a nontrivial ordinal relationship among classes is maintained. Experimental results show that a deep neural network with OHPL not only outperforms the state-of-the-art alternatives on classification accuracy but also scales well to large ordinal classification problems.
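The abstract describes the objective only in words: pull each point toward its class centroid while keeping the class centroids in a nontrivial order. The sketch below illustrates that idea in plain Python and is not the paper's exact definition of OHPL. It assumes a one-dimensional embedding (one score per sample), a squared-distance point term, and a hinge penalty on consecutive centroids. The function names and the `margin` and `alpha` parameters are hypothetical, chosen here for illustration.

```python
from collections import defaultdict


def class_centroids(scores, labels):
    """Mean 1-D score per ordinal class, keyed by class label."""
    grouped = defaultdict(list)
    for s, y in zip(scores, labels):
        grouped[y].append(s)
    return {y: sum(v) / len(v) for y, v in grouped.items()}


def ordinal_centroid_loss(scores, labels, margin=1.0, alpha=1.0):
    """Illustrative loss: point-to-centroid term plus centroid-ordering term.

    Hypothetical formulation for illustration, not the paper's definition.
    """
    centroids = class_centroids(scores, labels)

    # Point term: mean squared distance of each score to its class centroid.
    point_term = sum(
        (s - centroids[y]) ** 2 for s, y in zip(scores, labels)
    ) / len(scores)

    # Ordering term: hinge penalty whenever the centroid of the next-higher
    # class is not at least `margin` above the centroid of the class below.
    ordered = sorted(centroids)
    order_term = sum(
        max(0.0, margin - (centroids[hi] - centroids[lo]))
        for lo, hi in zip(ordered, ordered[1:])
    )
    return point_term + alpha * order_term


labels = [0, 0, 1, 1, 2, 2]
well_ordered = ordinal_centroid_loss([0.0, 0.0, 2.0, 2.0, 4.0, 4.0], labels)
collapsed = ordinal_centroid_loss([1.0] * 6, labels)
reversed_order = ordinal_centroid_loss([4.0, 4.0, 2.0, 2.0, 0.0, 0.0], labels)
print(well_ordered, collapsed, reversed_order)  # 0.0 2.0 6.0
```

The collapsed case shows why the ordering term matters in this sketch: mapping every sample to the same score makes the point term zero, so without a penalty on centroid spacing the trivial solution would look optimal.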

Ying Xie | Bob Vanderheyden
