Efficient and Robust Methods for Audio and Video Signal Analysis

This thesis presents my research on audio and video signal processing and machine learning. The topics covered are computationally efficient classifier compounds, automatic speech recognition (ASR), music dereverberation, video cut point detection and video classification.
The thesis considers the computational efficiency of information retrieval based on multiple measurement modalities. Specifically, a cascade processing framework, including a training algorithm to set its parameters, has been developed for combining multiple detectors or binary classifiers in a computationally efficient way. The framework has been applied to two video information retrieval tasks: video cut point detection and video classification. Compared with results reported in the literature, the video classification results indicate that the framework is capable of classification that is both accurate and computationally efficient.
The idea of cascade processing has also been adapted to ASR: a procedure has been developed for combining multiple speech state likelihood estimation methods within an ASR framework in a cascaded manner. The results show that the cascaded likelihood estimation process reduces the computational load of ASR without impairing transcription accuracy.
Additionally, this thesis presents my work on the noise robustness of ASR using a nonnegative matrix factorization (NMF)-based approach. Specifically, methods for transforming sparse NMF features into speech state likelihoods have been explored. The results reveal that learned transformations from NMF activations to speech state likelihoods yield better transcription accuracy than transformations based on dictionary labels. Compared with other systems in a noisy speech recognition challenge, the results show that NMF-based processing is an effective strategy for noise robustness in ASR.
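The cascade idea described above, in which a sample passes through a sequence of detectors and is discarded as soon as one of them rejects it, can be sketched as follows. This is a generic early-rejection cascade under assumed interfaces: the `Stage` class, its `score`/`threshold`/`cost` fields, the two example stages and their thresholds are all hypothetical, and the thresholds are simply hand-picked here rather than learned by the training algorithm developed in the thesis.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Stage:
    """One detector in a cascade (hypothetical interface)."""
    score: Callable[[Sequence[float]], float]  # higher score = more likely positive
    threshold: float                           # reject the sample if score < threshold
    cost: float                                # relative cost of evaluating this stage


def cascade_classify(stages: List[Stage], x: Sequence[float]) -> Tuple[bool, float]:
    """Evaluate stages in order and stop at the first rejection.

    Returns (decision, cost spent on this sample).
    """
    spent = 0.0
    for stage in stages:
        spent += stage.cost
        if stage.score(x) < stage.threshold:
            return False, spent  # early rejection: later stages never run
    return True, spent  # every stage accepted the sample


# Two hypothetical stages: a cheap one looking at a single feature and an
# expensive one looking at all features. Thresholds are hand-picked here.
stages = [
    Stage(score=lambda x: x[0], threshold=0.5, cost=1.0),
    Stage(score=lambda x: sum(x) / len(x), threshold=0.6, cost=10.0),
]
samples = [[0.1, 0.9], [0.7, 0.8], [0.9, 0.2], [0.2, 0.3]]
results = [cascade_classify(stages, x) for x in samples]
decisions = [d for d, _ in results]
average_cost = sum(c for _, c in results) / len(results)
```

On these four samples the cheap first stage rejects two of them outright, so the average cost is 6.0 units instead of the 11.0 units needed to run both stages on every sample. The cascade computes the logical AND of the stage decisions; ordering the stages from cheapest to most expensive is what lets early rejection save computation.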
The thesis also presents my work on audio signal enhancement, specifically on removing the detrimental effect of reverberation from music audio. In this work, a linear prediction-based dereverberation algorithm, originally developed for speech enhancement, was applied to music. The results show that the algorithm performs well on music signals and indicate that dynamic range compression of music does not impair dereverberation performance.
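The linear prediction-based approach mentioned above relies on late reverberation being predictable from samples that lie further in the past than the direct sound. A minimal single-channel, time-domain sketch of delayed linear prediction follows, assuming a least-squares fit over the whole signal. The function name `delayed_lp_dereverb`, its `order` and `delay` parameters and the synthetic comb-filter "room" are illustrative assumptions, and the algorithm used in the thesis (for example a variance-normalized or STFT-domain variant) may differ.

```python
import numpy as np


def delayed_lp_dereverb(x: np.ndarray, order: int, delay: int) -> np.ndarray:
    """Remove the part of x that is predictable from its delayed past.

    x[n] is predicted from x[n - delay], ..., x[n - delay - order + 1].
    Because the nearest `delay` samples are excluded, short-term structure
    of the source is not used for prediction; only longer-term
    (reverberation-like) correlation is subtracted.
    """
    x = np.asarray(x, dtype=float)
    n_start = delay + order - 1  # first sample with a complete delayed history
    # Column k holds x[n - delay - k] for n = n_start, ..., len(x) - 1.
    past = np.stack(
        [x[n_start - delay - k : len(x) - delay - k] for k in range(order)],
        axis=1,
    )
    target = x[n_start:]
    # Least-squares prediction coefficients over the whole signal.
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    y = x.copy()
    y[n_start:] = target - past @ coeffs  # subtract the predicted late part
    return y


# Synthetic example (assumption): a white "dry" signal passed through a
# recursive comb filter x[n] = s[n] + 0.6 * x[n - 20], a crude stand-in for a room.
rng = np.random.default_rng(0)
s = rng.standard_normal(20000)
x = s.copy()
for n in range(20, len(x)):
    x[n] += 0.6 * x[n - 20]

y = delayed_lp_dereverb(x, order=20, delay=10)
mse_before = np.mean((x - s) ** 2)
mse_after = np.mean((y - s) ** 2)
```

Here the predictor covers lags 10 to 29, which includes the 20-sample feedback of the synthetic room, so the residual approaches the dry signal and `mse_after` ends up far below `mse_before`. With real speech or music, the delay is what keeps the predictor from also removing the source's own short-term correlation.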
