Statistical Pattern Recognition: A Review

The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
