A statistical approach to set classification by feature selection with applications to classification of histopathology images

Set classification problems arise when the unit to be classified is a set of observations rather than an individual observation. In set classification, a classification rule is trained on N sets of observations, each labeled with a class, and a class label is likewise predicted for a new set of observations rather than for a single observation. Data of this kind appear, for example, in disease diagnostics based on multiple cell nucleus images taken from a single tissue. We introduce statistical models relevant to set classification, which motivate a set classification framework based on context-free feature extraction. Treating each set of observations as an empirical distribution, we employ a data-driven method to choose the features that carry information on location and major variation. In particular, principal component analysis is used to extract the features of major variation, and multidimensional scaling is used to represent the features as vector-valued points to which conventional classifiers can be applied. The proposed set classification approaches achieve better classification results than competing methods in a number of simulated data examples. The benefits of our method are demonstrated in an analysis of histopathology images of cell nuclei related to liver cancer.
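The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`set_features`, `set_distance`, `classical_mds`), the weight `w`, the particular distance between sets, and the toy data are all assumptions made for illustration. Each set is summarized by its mean (location) and its leading principal directions (major variation), pairwise distances between these summaries are computed, and classical multidimensional scaling turns the distances into one vector per set.

```python
import numpy as np

def set_features(X, k=1):
    """Summarize one set (rows = observations) by its mean and top-k PC directions."""
    mu = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T  # shapes (d,) and (d, k)

def set_distance(f1, f2, w=1.0):
    """Assumed distance: mean difference combined with a subspace difference."""
    mu1, V1 = f1
    mu2, V2 = f2
    loc = np.linalg.norm(mu1 - mu2)
    # Compare projection matrices so the sign of each direction does not matter.
    var = np.linalg.norm(V1 @ V1.T - V2 @ V2.T) / np.sqrt(2)
    return np.sqrt(loc**2 + w * var**2)

def classical_mds(D, dim=2):
    """Classical (metric) MDS: embed n points from an n x n distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D**2) @ J                # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]       # largest eigenvalues first
    vals = np.clip(vals[idx], 0, None)       # guard against small negative values
    return vecs[:, idx] * np.sqrt(vals)

# Hypothetical toy data: two classes of sets that differ in mean and in the
# main axis of variation.
rng = np.random.default_rng(0)
sets, labels = [], []
for label in (0, 1):
    scale = np.array([3.0, 0.5]) if label == 0 else np.array([0.5, 3.0])
    for _ in range(10):
        sets.append(rng.normal(size=(50, 2)) * scale + label)
        labels.append(label)

feats = [set_features(X, k=1) for X in sets]
n = len(sets)
D = np.array([[set_distance(feats[i], feats[j]) for j in range(n)] for i in range(n)])
Z = classical_mds(D, dim=2)  # one row per set
```

Each row of `Z` is a vector representation of one set, so `Z` and `labels` can be passed to any conventional classifier. Embedding a new, unlabeled set would additionally require an out-of-sample extension of the scaling step, which this sketch does not cover.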
