Interpretability in Multidimensional Classification

Generating rule-based models from data is an efficient way to extract information from large datasets. In high-dimensional spaces, however, the complexity of the model itself can undermine the interpretability of that information. This chapter introduces metrics that quantify the information flow between inputs, feature dimensions, and output classes. These metrics are used to estimate the contribution of individual input features to a fuzzy classification task without making explicit use of the data underlying the model. Applying these techniques to a speech classification problem shows that a significant reduction in model dimensionality can be achieved with minimal loss of accuracy.
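
To make the idea of a model-only relevance estimate concrete, the following is a minimal sketch and not the chapter's actual metric, which the abstract does not define. It assumes a hypothetical rule format (each rule has a class label, a weight, and one interval per feature on a [0, 1] axis) and uses a plain mutual-information score between position along one feature axis and the class suggested by the rules covering that position. The names `feature_information`, `region`, `cls`, and `weight` are illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def feature_information(rules, dim, classes, grid=101):
    """Estimate I(X_dim; C) from the rule base alone (no training data).

    Assumptions (not from the source): features are scaled to [0, 1],
    each rule covers a crisp interval per feature, and positions covered
    by at least one rule are treated as equally likely.
    """
    xs = [i / (grid - 1) for i in range(grid)]
    cond_dists = []
    for x in xs:
        # Sum the weights of all rules whose interval on this axis covers x.
        act = {c: 0.0 for c in classes}
        for r in rules:
            lo, hi = r["region"][dim]
            if lo <= x <= hi:
                act[r["cls"]] += r["weight"]
        total = sum(act.values())
        if total > 0:
            cond_dists.append([act[c] / total for c in classes])
    if not cond_dists:
        return 0.0
    n = len(cond_dists)
    marginal = [sum(d[k] for d in cond_dists) / n for k in range(len(classes))]
    # Mutual information: H(C) - H(C | X_dim).
    return entropy(marginal) - sum(entropy(d) for d in cond_dists) / n


# Hypothetical two-rule model: feature 0 separates the classes, feature 1 does not.
rules = [
    {"cls": "A", "weight": 1.0, "region": [(0.0, 0.4), (0.0, 1.0)]},
    {"cls": "B", "weight": 1.0, "region": [(0.6, 1.0), (0.0, 1.0)]},
]
classes = ["A", "B"]
scores = [feature_information(rules, d, classes) for d in range(2)]
print(scores)
```

In this toy model the first feature scores high and the second scores zero, so the second would be a candidate for removal. A real fuzzy model would use graded membership functions instead of crisp intervals, and the chapter's own metrics may differ substantially from this stand-in.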
