A Novel Hybrid Linear/Nonlinear Classifier for Two-Class Classification: Theory, Algorithm, and Applications

Classifier design for a given classification task must take into account both the complexity of the classifier and the size of the dataset available for training it. With limited training data, as is often the case in computer-aided diagnosis of medical images, a classifier with a simple structure (e.g., a linear classifier) is more robust and therefore preferred. We propose a novel two-class classifier, which we call a hybrid linear/nonlinear classifier (HLNLC), that involves two stages: in the first stage, the input features are linearly combined to form a scalar variable; in the second stage, the likelihood ratio of that scalar variable is used as the decision variable for classification. We first develop the theory of the HLNLC by assuming that the feature data follow normal distributions. We show that the commonly used Fisher's linear discriminant function is generally not the optimal linear function for the first stage of the HLNLC. We formulate an optimization problem to solve for the optimal linear function, i.e., the linear function that maximizes the area under the receiver operating characteristic (ROC) curve of the HLNLC. For practical applications, we propose a robust implementation of the HLNLC based on the looser assumption that the two-class feature data arise from a pair of latent (rather than explicit) multivariate normal distributions. The hybrid classifier fills a gap between linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) in the sense that both its theoretical performance and its complexity lie between those of LDA and those of QDA. Simulation studies show that the HLNLC performs better than LDA without a corresponding increase in classifier complexity. With a finite number of training samples, the HLNLC can perform better than the ideal observer owing to its simplicity. Finally, we demonstrate the application of the HLNLC in computer-aided diagnosis of breast lesions in ultrasound images.
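The two-stage structure described above can be illustrated with a minimal Python sketch, which is not the authors' implementation. It makes several assumptions beyond the abstract: the feature data are explicitly multivariate normal, Fisher's direction is used as a stand-in for the first-stage linear function (the abstract notes that this choice is generally not optimal, and the AUC-maximizing optimization is not reproduced here), and the simulated means, covariances, and sample sizes are arbitrary. All function names are hypothetical.

```python
import numpy as np


def fisher_direction(x0, x1):
    """Fisher's linear discriminant direction (stand-in for the stage-one weights)."""
    pooled = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)
    return np.linalg.solve(pooled, x1.mean(axis=0) - x0.mean(axis=0))


def fit_hlnlc(x0, x1, w):
    """Stage two: estimate the normal parameters of y = w^T x for each class."""
    y0, y1 = x0 @ w, x1 @ w
    return {"w": w,
            "m0": y0.mean(), "v0": y0.var(ddof=1),
            "m1": y1.mean(), "v1": y1.var(ddof=1)}


def log_likelihood_ratio(model, x):
    """Decision variable: log of the likelihood ratio of the scalar y = w^T x."""
    y = x @ model["w"]
    ll1 = -0.5 * np.log(2 * np.pi * model["v1"]) - 0.5 * (y - model["m1"]) ** 2 / model["v1"]
    ll0 = -0.5 * np.log(2 * np.pi * model["v0"]) - 0.5 * (y - model["m0"]) ** 2 / model["v0"]
    return ll1 - ll0


def empirical_auc(s0, s1):
    """Mann-Whitney estimate of the area under the ROC curve."""
    diff = s1[:, None] - s0[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)


# Simulated two-class data with unequal covariances (arbitrary, illustrative values).
rng = np.random.default_rng(0)
n, d = 2000, 3
mu0, mu1 = np.zeros(d), np.array([0.5, 0.5, 0.0])
cov0, cov1 = np.eye(d), 4.0 * np.eye(d)

train0 = rng.multivariate_normal(mu0, cov0, size=n)
train1 = rng.multivariate_normal(mu1, cov1, size=n)
test0 = rng.multivariate_normal(mu0, cov0, size=n)
test1 = rng.multivariate_normal(mu1, cov1, size=n)

w = fisher_direction(train0, train1)
model = fit_hlnlc(train0, train1, w)

# LDA-style decision variable: the linear projection itself.
auc_lda = empirical_auc(test0 @ w, test1 @ w)
# HLNLC decision variable: likelihood ratio of the same projection.
auc_hlnlc = empirical_auc(log_likelihood_ratio(model, test0),
                          log_likelihood_ratio(model, test1))

print(f"AUC, linear projection only: {auc_lda:.3f}")
print(f"AUC, likelihood ratio of projection: {auc_hlnlc:.3f}")
```

Because the second stage adds only the two means and two variances of a scalar variable, the sketch keeps nearly the same number of parameters as a linear classifier. When the class covariances differ, the likelihood ratio of the projection is a nonlinear function of the projection, which is where the hybrid classifier can gain over the linear decision variable.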
