Training Deep Nets with Imbalanced and Unlabeled Data

Deep belief networks (DBNs) are normally trained on large data sets. Our goal is to predict traces of the tongue surface in ultrasound images of human speech. Hand-tracing these images is labor-intensive, and the dataset is highly imbalanced because many images are extremely similar. We propose a bootstrapping method that handles this imbalance by iteratively selecting a small subset of images to be hand-traced and then (re)training the DBN, with an entropy-based diversity measure guiding the initial selection. This approach reduces the human time required for tracing by more than a factor of two while achieving human-level accuracy.
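To make the idea of an entropy-based diversity measure concrete, the following is a minimal sketch and not the method used in the paper. It assumes each image has already been reduced to a coarse, hashable "signature" (for example, a quantized thumbnail), and it greedily picks the images whose addition most increases the Shannon entropy of the selected set. The names `shannon_entropy`, `select_diverse`, and the signature representation are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_diverse(signatures, k):
    """Greedily choose up to k indices that maximize the entropy of the chosen set.

    `signatures` is an assumed per-image summary (any hashable value);
    near-identical images are assumed to share a signature.
    Ties are broken in favor of the lowest index.
    """
    selected = []
    remaining = list(range(len(signatures)))
    while remaining and len(selected) < k:
        chosen = [signatures[j] for j in selected]
        best = max(remaining, key=lambda i: shannon_entropy(chosen + [signatures[i]]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy imbalanced set: eight near-duplicates ("a") and two rare images ("b", "c").
signatures = ["a"] * 8 + ["b", "c"]
print(select_diverse(signatures, 3))  # prints [0, 8, 9]
```

On this toy input the selection covers all three distinct signatures rather than drawing three near-duplicates, which is the behavior a diversity measure is meant to encourage on imbalanced data. The iterative hand-tracing and DBN retraining steps are not shown here.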
