Autonomous Learning of Representations

Besides the core learning algorithm itself, one major question in machine learning is how best to encode given training data so that the learning algorithm can learn from it efficiently and generalize to novel data. While classical approaches often rely on a hand-coded data representation, autonomous representation learning, also called feature learning, plays a major role in modern learning architectures. The goal of this contribution is to give an overview of different principles of autonomous feature learning and to illustrate two of these principles with two recent examples: autonomous metric learning for sequences, and autonomous learning of a deep representation for spoken language.
