Voice modeling methods for automatic speaker recognition
[1] HongJiang Zhang. Multimedia content analysis and search: new perspectives and approaches , 2009, ACM Multimedia.
[2] T. Bayes. An essay towards solving a problem in the doctrine of chances , 2003 .
[3] D.P. Skinner,et al. The cepstrum: A guide to processing , 1977, Proceedings of the IEEE.
[4] Fernando Pereira,et al. MPEG-7 the generic multimedia content description standard, part 1 - Multimedia, IEEE , 2001 .
[5] Bernd Freisleben,et al. The Web Service Browser: Automatic Client Generation and Efficient Data Transfer for Web Services , 2009, 2009 IEEE International Conference on Web Services.
[6] C A Pickover,et al. Examining Usability, Acceptability, and Adoption of a Self-Directed, Technology-Based Intervention for Upper Limb Rehabilitation After Stroke: Cohort Study , 1986, The Journal of the Acoustical Society of America.
[7] Thomas G. Dietterich. Adaptive computation and machine learning , 1998 .
[8] Andreas Stolcke,et al. The SRI NIST 2008 speaker recognition evaluation system , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[9] Roger K. Moore,et al. Hidden Markov model decomposition of speech and noise , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[10] David A. van Leeuwen,et al. NIST and NFI-TNO evaluations of automatic speaker recognition , 2006, Comput. Speech Lang..
[11] Marcel Worring,et al. The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.
[12] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..
[13] Aladdin M. Ariyaeeinia,et al. Discrimination Effectiveness of Speech Cepstral Features , 2008, BIOID.
[14] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.
[15] Hsin-Min Wang,et al. On the extraction of vocal-related information to facilitate the management of popular music collections , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).
[16] M. Köppen,et al. The Curse of Dimensionality , 2010 .
[17] Chin-Hui Lee,et al. Minimax classification with parametric neighborhoods for noisy speech recognition , 2001, INTERSPEECH.
[18] Bernd Freisleben,et al. Semantic video analysis for psychological research on violence in computer games , 2007, CIVR '07.
[19] Sacha Krstulovic,et al. Mptk: Matching Pursuit Made Tractable , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[20] D. Goldstein. Second Edition, Revised and Expanded , 2003 .
[21] J.P. Campbell Jr.,et al. Speaker recognition: a tutorial , 1997, Proc. IEEE.
[22] Marijn Huijbregts,et al. Segmentation, diarization and speech transcription : surprise data unraveled , 2008 .
[23] Charu C. Aggarwal. A framework for classification and segmentation of massive audio data streams , 2007, KDD '07.
[24] S. R. Mahadeva Prasanna,et al. Multiple frame size and rate analysis for speaker recognition under limited data condition , 2009 .
[25] Daben Liu,et al. Online speaker clustering , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[26] M. Al-Akaidi. Fractal Speech Processing , 2004 .
[27] Clifford A. Pickover. Computers, Pattern, Chaos, and Beauty: Graphics from an Unseen World , 2001 .
[28] Hagai Aronowitz,et al. A distance measure between GMMs based on the unscented transform and its application to speaker recognition , 2005, INTERSPEECH.
[29] R. J. Niederjohn. Understanding speech corrupted by noise , 1996, Proceedings of the IEEE International Conference on Industrial Technology (ICIT'96).
[30] Younghun Kwon,et al. Similar speaker recognition using nonlinear analysis , 2004 .
[31] Manuel Duarte Ortigueira,et al. On the HHT, its problems, and some solutions , 2008 .
[32] Jean-François Bonastre,et al. Non directly acoustic process for costless speaker recognition and indexation , 1999 .
[33] D. O'Shaughnessy,et al. Pre-emphasis and speech recognition , 1995, Proceedings 1995 Canadian Conference on Electrical and Computer Engineering.
[34] Bernd Freisleben,et al. University of Marburg at TRECVID 2006: Shot Boundary Detection and Rushes Task Results , 2006, TRECVID.
[35] Yoseph Bar-Cohen,et al. Biomimetics : Biologically Inspired Technologies , 2011 .
[36] Robert M. Gray,et al. An Algorithm for Vector Quantizer Design , 1980, IEEE Trans. Commun..
[37] E. Ambikairajah,et al. Group delay features for speaker recognition , 2007, 2007 6th International Conference on Information, Communications & Signal Processing.
[38] Andreas Stolcke,et al. Modeling duration patterns for speaker recognition , 2003, INTERSPEECH.
[39] Constantine Kotropoulos,et al. Speaker segmentation and clustering , 2008, Signal Process..
[40] Mitch Weintraub,et al. Filterbank-energy estimation using mixture and Markov models for recognition of noisy speech , 1993, IEEE Trans. Speech Audio Process..
[41] John H. L. Hansen,et al. A comparative study of traditional and newly proposed features for recognition of speech under stress , 2000, IEEE Trans. Speech Audio Process..
[42] N. Huang,et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis , 1998, Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.
[43] Peter Ladefoged,et al. Vowels and Consonants , 2000, Manchu Grammar.
[44] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[45] Hsin-Min Wang,et al. Automatic Speaker Clustering Using a Voice Characteristic Reference Space and Maximum Purity Estimation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[46] Phil Rose,et al. Technical forensic speaker recognition: Evaluation, types and testing of evidence , 2006, Comput. Speech Lang..
[47] Sridha Sridharan,et al. Making Confident Speaker Verification Decisions With Minimal Speech , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[48] Constantine Kotropoulos,et al. Systematic comparison of BIC-based speaker segmentation systems , 2007, 2007 IEEE 9th Workshop on Multimedia Signal Processing.
[49] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[50] Huan Liu,et al. Searching for Interacting Features , 2007, IJCAI.
[51] Ananth N. Iyer,et al. Speaker distinguishing distances: a comparative study , 2007, Int. J. Speech Technol..
[52] Bernd Freisleben,et al. Videana: A Software Toolkit for Scientific Film Studies , 2009, Digital Tools in Media Studies.
[53] Scott L. Bain. Emergent Design: The Evolutionary Nature of Professional Software Development (paperback) , 2008 .
[54] Xiaodong Wang,et al. Monte Carlo methods for signal processing: a review in the statistical signal processing context , 2005, IEEE Signal Processing Magazine.
[55] Shrikanth S. Narayanan,et al. Strategies to Improve the Robustness of Agglomerative Hierarchical Clustering Under Data Source Variation for Speaker Diarization , 2008, IEEE Transactions on Audio, Speech, and Language Processing.
[56] Toshinori Munakata,et al. Fundamentals of the New Artificial Intelligence - Neural, Evolutionary, Fuzzy and More, Second Edition , 2007, Texts in Computer Science.
[57] John R. Kender,et al. Accommodating sample size effect on similarity measures in speaker clustering , 2008, 2008 IEEE International Conference on Multimedia and Expo.
[58] Jacob Benesty,et al. Springer handbook of speech processing , 2007, Springer Handbooks.
[59] Clifford A. Pickover,et al. Fractal characterization of speech waveform graphs , 1986, Comput. Graph..
[60] F. Yates. Contingency Tables Involving Small Numbers and the χ2 Test , 1934 .
[61] Yi Hu,et al. Subjective Comparison of Speech Enhancement Algorithms , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[62] Sridha Sridharan,et al. Factor analysis modelling for speaker verification with short utterances , 2008, Odyssey.
[63] Doh-Suk Kim. On the perceptually irrelevant phase information in sinusoidal representation of speech , 2001, IEEE Trans. Speech Audio Process..
[64] Mark J. F. Gales,et al. An improved approach to the hidden Markov model decomposition of speech and noise , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[65] Douglas A. Reynolds,et al. An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[66] Douglas A. Reynolds,et al. Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..
[67] Peter J. Bickel,et al. The Earth Mover's distance is the Mallows distance: some insights from statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.
[68] Gabriel Rilling,et al. On empirical mode decomposition and its algorithms , 2003 .
[69] Bernd Freisleben,et al. WebVoice: A Toolkit for Perceptual Insights into Speech Processing , 2009, 2009 2nd International Congress on Image and Signal Processing.
[70] John W. Sammon,et al. A Nonlinear Mapping for Data Structure Analysis , 1969, IEEE Transactions on Computers.
[71] Jialong He,et al. On the use of orthogonal GMM in speaker recognition , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).
[72] A. Hussain,et al. Nonlinear speech processing: Overview and applications , 2002 .
[73] T. Subba Rao,et al. Classification, Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB , 2004 .
[74] E. Candes,et al. l1-magic: Recovery of sparse signals via convex programming , 2005 .
[75] David Talkin,et al. A Robust Algorithm for Pitch Tracking (RAPT) , 2005 .
[76] Stéphane H. Maes,et al. A distance measure between collections of distributions and its application to speaker recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[77] Chih-Jen Lin,et al. LIBSVM: A library for support vector machines , 2011, TIST.
[78] Daben Liu,et al. Speech and language technologies for audio indexing and retrieval , 2000, Proceedings of the IEEE.
[79] John Tooby,et al. Are humans good intuitive statisticians after all , 1996 .
[80] Robert C. Holte,et al. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets , 1993, Machine Learning.
[81] Thomas Friese,et al. Grid Workflow Modelling Using Grid-Specific BPEL Extensions , 2007 .
[82] Douglas A. Reynolds,et al. The SuperSID project: exploiting high-level information for high-accuracy speaker recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[83] Sadaoki Furui,et al. 40 Years of Progress in Automatic Speaker Recognition , 2009, ICB.
[84] François Pachet,et al. The bag-of-frames approach to audio pattern recognition: a sufficient model for urban soundscapes but not for polyphonic music. , 2007, The Journal of the Acoustical Society of America.
[85] Paul Over,et al. High-level feature detection from video in TRECVid: a 5-year retrospective of achievements , 2009 .
[86] Satoshi Nakamura,et al. Efficient representation of short-time phase based on group delay , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[87] Christian A. Müller,et al. Prosodic and other Long-Term Features for Speaker Diarization , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[88] François Pachet,et al. Improving Timbre Similarity : How high’s the sky ? , 2004 .
[89] Stephan Baumann. Artificial Listening Systems - Modellierung und approximation der individuellen Perzeption von Musikähnlichkeit , 2005 .
[90] Bernd Freisleben,et al. Self-Supervised Learning of Face Appearances in TV Casts and Movies , 2007, Int. J. Semantic Comput..
[91] Douglas A. Reynolds,et al. Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..
[92] Leonidas J. Guibas,et al. A metric for distributions with applications to image databases , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).
[93] Haizhou Li,et al. An overview of text-independent speaker recognition: From features to supervectors , 2010, Speech Commun..
[94] M. Faundez-Zanuy,et al. State-of-the-art in speaker recognition , 2005, IEEE Aerospace and Electronic Systems Magazine.
[95] S. Guruprasad,et al. AANN models for speaker recognition based on difference cepstrals , 2003, Proceedings of the International Joint Conference on Neural Networks, 2003..
[96] Eamonn J. Keogh,et al. Towards parameter-free data mining , 2004, KDD.
[97] C. Sekhar,et al. Speaker Change Detection using Support Vector Machines , 2005 .
[98] Sharon Gannot,et al. Speech enhancement using a mixture-maximum model , 1999, IEEE Trans. Speech Audio Process..
[99] Douglas E. Sturim,et al. SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[100] N. Otsu. A threshold selection method from gray level histograms , 1979 .
[101] Mark J. F. Gales,et al. Progress in the CU-HTK broadcast news transcription system , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[102] Sadaoki Furui,et al. Fifty years of progress in speech and speaker recognition , 2004 .
[103] Alexander J. Smola,et al. Learning with kernels , 1998 .
[104] Thomas Hofmann,et al. Probabilistic Latent Semantic Analysis , 1999, UAI.
[105] Patrick Kenny,et al. Combining Gaussianized/Non-Gaussianized Features to Improve Speaker Diarization of Telephone Conversations , 2007, IEEE Signal Processing Letters.
[106] Thomas Friese,et al. Flex-SwA: Flexible Exchange of Binary Data Based on SOAP Messages with Attachments , 2006, 2006 IEEE International Conference on Web Services (ICWS'06).
[107] Christian Wellekens,et al. DISTBIC: A speaker-based segmentation for audio data indexing , 2000, Speech Commun..
[108] Michael Fink,et al. Social- and Interactive-Television Applications Based on Real-Time Ambient-Audio Identification , 2006 .
[109] Douglas E. Sturim,et al. The MIT Lincoln Laboratory 2008 speaker recognition system , 2009, INTERSPEECH.
[110] Francesco Camastra,et al. Machine Learning for Audio, Image and Video Analysis - Theory and Applications , 2007, Advanced Information and Knowledge Processing.
[111] Ying Li,et al. Content-based movie analysis and indexing based on audiovisual cues , 2004, IEEE Transactions on Circuits and Systems for Video Technology.
[112] Tony Andrews. Business Process Execution Language for Web Services Version 1.1 , 2003 .
[113] Rubo Zhang,et al. Speech Enhancement Based on Hilbert-Huang Transform Theory , 2006, First International Multi-Symposiums on Computer and Computational Sciences (IMSCCS'06).
[114] Alvin F. Martin,et al. NIST speaker recognition evaluation chronicles , 2004, Odyssey.
[115] S. R. Mahadeva Prasanna,et al. Extraction of speaker-specific excitation information from linear prediction residual of speech , 2006, Speech Commun..
[116] Bernd Freisleben,et al. University of Marburg at TRECVID 2008: High-Level Feature Extraction , 2008, TRECVID.
[117] Douglas E. Sturim,et al. Speaker indexing in large audio databases using anchor models , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[118] Kah-Chye Tan,et al. Postprocessing method for suppressing musical noise generated by spectral subtraction , 1998, IEEE Trans. Speech Audio Process..
[119] Douglas A. Reynolds,et al. Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..
[120] Xu Shao,et al. Clean speech reconstruction from MFCC vectors and fundamental frequency using an integrated front-end , 2006, Speech Commun..
[121] Hsin-Min Wang,et al. Automatic singer recognition of popular music recordings via estimation and modeling of solo vocal signals , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[122] Shrikanth S. Narayanan,et al. Signature cluster model selection for incremental Gaussian mixture cluster modeling in agglomerative hierarchical speaker clustering , 2009, INTERSPEECH.
[123] Zhiwu Lu,et al. Semantic concept annotation based on audio PLSA model , 2009, MM '09.
[124] Hsin-Min Wang,et al. Improving GMM-UBM speaker verification using discriminative feedback adaptation , 2009, Comput. Speech Lang..
[125] Ponani S. Gopalakrishnan,et al. Clustering via the Bayesian information criterion with applications in speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[126] M. F.,et al. Bibliography , 1985, Experimental Gerontology.
[127] 张国亮,et al. Comparison of Different Implementations of MFCC , 2001 .
[128] Gunnar Fant,et al. Acoustic Theory Of Speech Production , 1960 .
[129] S. Chen,et al. Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion , 1998 .
[130] Douglas A. Reynolds,et al. Blind clustering of speech utterances based on speaker and language characteristics , 1998, ICSLP.
[131] Mark J. F. Gales,et al. Model-based techniques for noise robust speech recognition , 1995 .
[132] Sadaoki Furui,et al. Digital Speech Processing, Synthesis, and Recognition , 1989 .
[133] Iasonas Kokkinos,et al. Nonlinear speech analysis using models for chaotic systems , 2005, IEEE Transactions on Speech and Audio Processing.
[134] Jonathan Foote,et al. Automatic audio segmentation using a measure of audio novelty , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).
[135] Kishore Prahallad,et al. AANN: an alternative to GMM for pattern recognition , 2002, Neural Networks.
[136] Michael Picheny,et al. Speech recognition using noise-adaptive prototypes , 1989, IEEE Trans. Acoust. Speech Signal Process..
[137] Shrikanth S. Narayanan,et al. Language-adaptive Persian speech recognition , 2003, INTERSPEECH.
[138] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[139] Dalei Wu,et al. Discriminative preprocessing of speech: towards improving biometric authentication , 2006 .
[140] André Adami,et al. Modeling prosodic differences for speaker recognition , 2007, Speech Commun..
[141] Jonathan G. Fiscus,et al. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM , 1993, NIST.
[142] Abeer Alwan,et al. Speech Coding: Fundamentals and Applications , 2003 .
[143] S. Kizhner,et al. On the Hilbert-Huang transform data processing system development , 2004, 2004 IEEE Aerospace Conference Proceedings (IEEE Cat. No.04TH8720).
[144] John S. D. Mason,et al. Short utterance-based video aided speaker recognition , 2008, 2008 IEEE 10th Workshop on Multimedia Signal Processing.
[145] Andreas Spanias,et al. Cepstrum-based pitch detection using a new statistical V/UV classification algorithm , 1999, IEEE Trans. Speech Audio Process..
[146] Powen Ru,et al. Multiresolution spectrotemporal analysis of complex sounds. , 2005, The Journal of the Acoustical Society of America.
[147] Massimo Tistarelli,et al. Nineteen Urgent Research Topics in Biometrics and Identity Management , 2008, BIOID.
[148] Hema A. Murthy,et al. The modified group delay function and its application to phoneme recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[149] J. A. Stewart,et al. Nonlinear Time Series Analysis , 2015 .
[150] S. Mallat. A wavelet tour of signal processing , 1998 .
[151] Gaël Richard,et al. Temporal Integration for Audio Classification With Application to Musical Instrument Classification , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[152] Sridha Sridharan,et al. Minimising Speaker Verification Utterance Length through Confidence Based Early Verification Decisions , 2009, ICB.
[153] J. Tukey,et al. An algorithm for the machine calculation of complex Fourier series , 1965 .
[154] Bernd Freisleben,et al. DAVO: A Domain-Adaptable, Visual BPEL4WS Orchestrator , 2009, 2009 International Conference on Advanced Information Networking and Applications.
[155] Alfred Ultsch,et al. U*-Matrix: a Tool to visualize Clusters in high dimensional Data , 2004 .
[156] William M. Campbell,et al. Phonetic Speaker Recognition with Support Vector Machines , 2003, NIPS.
[157] Thorsten Joachims,et al. Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.
[158] Emine Yilmaz,et al. Estimating average precision with incomplete and imperfect judgments , 2006, CIKM '06.
[159] Hsin-Min Wang,et al. Blind Clustering of Popular Music Recordings Based on Singer Voice Characteristics , 2004, Computer Music Journal.
[160] Ian Vince McLoughlin,et al. Line spectral pairs , 2008, Signal Process..
[161] Ryan M. Rifkin,et al. In Defense of One-Vs-All Classification , 2004, J. Mach. Learn. Res..
[162] Bernd Freisleben,et al. Video Cut Detection without Thresholds , 2004 .
[163] Roland Auckenthaler,et al. Score Normalization for Text-Independent Speaker Verification Systems , 2000, Digit. Signal Process..
[164] Dirk Van Compernolle,et al. Synthesizing speech from speech recognition parameters , 2004, INTERSPEECH.
[165] M. Pardo,et al. Learning from data: a tutorial with emphasis on modern pattern recognition methods , 2002 .
[166] Bo Zhang,et al. A Formal Study of Shot Boundary Detection , 2007, IEEE Transactions on Circuits and Systems for Video Technology.
[167] Arne Ramsperger. Strukturanalyse der Riboflavin Synthase aus Methanococcus jannaschii , 2005 .
[168] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[169] Dimitrios Gunopulos,et al. Discovering similar multidimensional trajectories , 2002, Proceedings 18th International Conference on Data Engineering.
[170] Nikos Fakotakis,et al. Comparative Evaluation of Various MFCC Implementations on the Speaker Verification Task , 2007 .
[171] Douglas A. Reynolds,et al. Person authentication by voice: a need for caution , 2003, INTERSPEECH.
[172] Data Mining Methoden : Einordnung und Überblick , 2001 .
[173] Anthony J. Robinson,et al. Enhancement and recognition of noisy speech within an autoregressive hidden Markov model framework using noise estimates from the noisy signal , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[174] Li Deng,et al. Speech trajectory discrimination using the minimum classification error learning , 1998, IEEE Trans. Speech Audio Process..
[175] Hsin-Min Wang,et al. A query-by-example framework to retrieve music documents by singer , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).
[176] Werner Verhelst,et al. An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[177] E.J. Candes. Compressive Sampling , 2022 .
[178] Yuan-Fu Liao,et al. Prosody modeling and eigen-prosody analysis for robust speaker recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..
[179] Lie Lu,et al. Unsupervised speaker segmentation and tracking in real-time audio content analysis , 2005, Multimedia Systems.
[180] Ji Li,et al. alpha-Gaussian mixture modelling for speaker recognition , 2009, Pattern Recognit. Lett..
[181] Douglas A. Reynolds,et al. Approaches and applications of audio diarization , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..
[182] Bernd Freisleben,et al. A Web Service Communication Policy for Describing Non-standard Application Requirements , 2008, 2008 International Symposium on Applications and the Internet.
[183] Matjaz B. Juric,et al. Business process execution language for web services , 2004 .
[184] F. Kubala,et al. Automatic Speaker Clustering , 1997 .
[185] Paul Deléglise,et al. The LIUM speech transcription system: a CMU Sphinx III-based system for French broadcast news , 2005, INTERSPEECH.
[186] Bernd Freisleben,et al. LCDL: an extensible framework for wrapping legacy code , 2009, iiWAS.
[187] D A Reynolds,et al. The MIT Lincoln Laboratory RT-04F Diarization Systems: Applications to Broadcast Audio and Telephone Conversations , 2004 .
[188] Mark Hasegawa-Johnson,et al. A factorial HMM approach to simultaneous recognition of isolated digits spoken by multiple talkers on one audio channel , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[189] Ting Su,et al. In search of deterministic methods for initializing K-means and Gaussian mixture clustering , 2007, Intell. Data Anal..
[190] Ananth N. Iyer,et al. Robust voiced/unvoiced classification using novel features and Gaussian mixture model , 2003 .
[191] Ralph Ewerth,et al. Robust video content analysis via transductive learning methods , 2009 .
[192] Belkacem Fergani,et al. Unsupervised speaker indexing using one-class Support Vector Machines , 2006, 2006 14th European Signal Processing Conference.
[193] L. Cosmides,et al. Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty , 1996, Cognition.
[194] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.
[195] Björn Lindblom,et al. Do 'Dominant Frequencies' explain the listener's response to formant and spectrum shape variations? , 2009, Speech Commun..
[196] Robert Sedgewick,et al. Algorithms in C , 1990 .
[197] Kuldip K. Paliwal,et al. Speech Coding and Synthesis , 1995 .
[198] Lie Lu,et al. Real-time unsupervised speaker change detection , 2002, Object recognition supported by user interaction for service robots.
[199] Kuldip K. Paliwal,et al. Short-time phase spectrum in speech processing: A review and some experimental results , 2007, Digit. Signal Process..
[200] C. Tomasi. The Earth Mover's Distance, Multi-Dimensional Scaling, and Color-Based Image Retrieval , 1997 .
[201] Kang Jingqiu,et al. Improved Algorithm of Correlation Dimension Estimation and its Application in Fault Diagnosis for Industrial Fan , 2006, 2006 Chinese Control Conference.
[202] Bernd Freisleben,et al. University of Marburg at TRECVID 2005: Shot Boundary Detection and Camera Motion Estimation Results , 2005, TRECVID.
[203] Lawrence K. Saul,et al. Markov Processes on Curves for Automatic Speech Recognition , 1998, NIPS.
[204] Lie Lu,et al. DOI 10.1007/s00530-002-0065-0 , 2003, Multimedia Systems.
[205] Bernd Freisleben,et al. MIRO: a mashup editor leveraging web, Grid and Cloud services , 2009, iiWAS.
[206] Bernd Freisleben,et al. A scalable service-oriented architecture for multimedia analysis, synthesis and consumption , 2009, Int. J. Web Grid Serv..
[207] D. Rubin,et al. Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .
[208] Shih-Fu Chang,et al. Short-term audio-visual atoms for generic video concept classification , 2009, ACM Multimedia.
[209] Paul Over,et al. Evaluation campaigns and TRECVid , 2006, MIR '06.
[210] William H. Press,et al. Numerical recipes in C , 2002 .
[211] Sadaoki Furui,et al. 50 Years of Progress in Speech and Speaker Recognition Research , 1970 .
[212] E. Jafer,et al. Wavelet-based voiced/unvoiced classification algorithm , 2003, Proceedings EC-VIP-MC 2003. 4th EURASIP Conference focused on Video/Image Processing and Multimedia Communications (IEEE Cat. No.03EX667).
[213] M. Vetterli,et al. From Lagrange to Shannon... and back: another look at sampling [DSP Education] , 2009, IEEE Signal Processing Magazine.
[214] Shuang Zhang,et al. Speaker Clustering Aided by Visual Dialogue Analysis , 2008, PCM.
[215] Bernd Freisleben,et al. Omnivore: Integration of Grid Meta-Scheduling and Peer-to-Peer Technologies , 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).
[216] Jean-François Bonastre,et al. Step-by-step and integrated approaches in broadcast news speaker diarization , 2006, Comput. Speech Lang..
[217] Roy D. Patterson,et al. Auditory images: How complex sounds are represented in the auditory system , 2000 .
[218] Daben Liu,et al. Online speaker clustering , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[219] Bayya Yegnanarayana,et al. Extraction and representation of prosodic features for language and speaker recognition , 2008, Speech Commun..
[220] Patricia A. Keating,et al. Linguistic Voice Quality , 2006 .
[221] Dennis DeCoste,et al. Visualizing data mining models , 2001 .
[222] Shrikanth S. Narayanan,et al. A robust stopping criterion for agglomerative hierarchical clustering in a speaker diarization system , 2007, INTERSPEECH.
[223] Joseph Picone,et al. Signal modeling techniques in speech recognition , 1993, Proc. IEEE.
[224] José Manuel Pardo,et al. Robust Speaker Diarization for meetings , 2006 .
[225] Sotiris B. Kotsiantis,et al. Machine learning: a review of classification and combining techniques , 2006, Artificial Intelligence Review.
[226] Man-Wai Mak,et al. Speaker Verification via High-Level Feature Based Phonetic-Class Pronunciation Modeling , 2007, IEEE Transactions on Computers.
[227] Herbert Gish,et al. Segregation of speakers for speech recognition and speaker identification , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.
[228] Jitendra Ajmera,et al. A robust speaker clustering algorithm , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).
[229] John C. Platt,et al. Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .
[230] Larry P. Heck,et al. A lognormal tied mixture model of pitch for prosody based speaker recognition , 1997, EUROSPEECH.
[231] Manuel Davy,et al. An online kernel change detection algorithm , 2005, IEEE Transactions on Signal Processing.
[232] Andrew C. Morris,et al. GMM based clustering and speaker separability in the TIMIT speech database , 2005 .
[233] M. Palaniswami,et al. Classification of multidimensional trajectories for acoustic modeling using support vector machines , 2004, International Conference on Intelligent Sensing and Information Processing, 2004. Proceedings of.
[234] Biing-Hwang Juang,et al. Auditory perception and cognition , 2008, IEEE Signal Processing Magazine.
[235] Guoli Wang,et al. LS-NMF: A modified non-negative matrix factorization algorithm utilizing uncertainty estimates , 2006, BMC Bioinformatics.
[236] Rajesh M. Hegde,et al. Application of the modified group delay function to speaker identification and discrimination , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[237] M. Demirekler,et al. Comparison of parametric and non-parametric representations of speech for recognition , 1994, Proceedings of MELECON '94. Mediterranean Electrotechnical Conference.
[238] Paul A. Viola,et al. Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.
[239] Constantine Kotropoulos,et al. Computationally Efficient and Robust BIC-Based Speaker Segmentation , 2008, IEEE Transactions on Audio, Speech, and Language Processing.
[240] François Pachet,et al. Exploring Billions of Audio Features , 2007, 2007 International Workshop on Content-Based Multimedia Indexing.
[241] Luc Van Gool,et al. SURF: Speeded Up Robust Features , 2006, ECCV.
[242] Jürgen Schmidhuber,et al. Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes , 2008, ABiALS.
[243] Remco C. Veltkamp,et al. Using transportation distances for measuring melodic similarity , 2003, ISMIR.
[244] David R. Hill,et al. Speaker Classification Concepts: Past, Present and Future , 2007, Speaker Classification.
[245] Kishore Prahallad,et al. Source and system features for speaker recognition using AANN models , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[246] Steven B. Smith,et al. Digital Signal Processing: A Practical Guide for Engineers and Scientists , 2002 .
[247] Hirotaka Nakasone,et al. Forensic automatic speaker recognition , 2001, Odyssey.
[248] Leonidas J. Guibas,et al. The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.
[249] Xavier Anguera Miró,et al. Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System , 2005, MLMI.
[250] Herbert Gish,et al. Clustering speakers by their voices , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[251] H. Nyquist,et al. Certain Topics in Telegraph Transmission Theory , 1928, Transactions of the American Institute of Electrical Engineers.
[252] Mark Huckvale,et al. How Is Individuality Expressed in Voice? An Introduction to Speech Production and Description for Speaker Classification , 2007, Speaker Classification.
[253] Xu Shao,et al. Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model , 2002, INTERSPEECH.
[254] Bernd Freisleben,et al. Unfolding speaker clustering potential: a biomimetic approach , 2009, ACM Multimedia.
[255] Beth Logan,et al. A music similarity function based on signal analysis , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.
[256] Yoram Singer,et al. Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.
[257] Martin Ester,et al. Knowledge Discovery in Databases - Techniken und Anwendungen , 2000 .
[258] Holger Kantz,et al. Practical implementation of nonlinear time series methods: The TISEAN package , 1998, Chaos.
[259] K. Mathiak,et al. Does Playing Violent Video Games Induce Aggression? Empirical Evidence of a Functional Magnetic Resonance Imaging Study , 2006 .
[260] M.G. Bellanger,et al. Digital processing of speech signals , 1980, Proceedings of the IEEE.
[261] Bernd Freisleben,et al. Eine service-orientierte Grid-Infrastruktur zur Unterstützung medienwissenschaftlicher Filmanalyse , 2009, GeNeMe.
[262] Hai Huang,et al. Speech pitch determination based on Hilbert-Huang transform , 2006, Signal Process..
[263] Alfred Ultsch,et al. Pareto Density Estimation: A Density Estimation for Knowledge Discovery , 2005 .
[264] Allen Y. Yang,et al. Feature Selection in Face Recognition: A Sparse Representation Perspective , 2007 .
[265] Douglas E. Sturim,et al. The 2004 MIT Lincoln Laboratory speaker recognition system , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
[266] Bernd Freisleben,et al. University of Marburg at TRECVID 2007: Shot Boundary Detection and High Level Feature Extraction , 2007, TRECVID.
[267] Daniel A. Keim,et al. Information Visualization and Visual Data Mining , 2002, IEEE Trans. Vis. Comput. Graph..
[268] Rubo Zhang,et al. Speech Detection Based on Hilbert-Huang Transform , 2006, First International Multi-Symposiums on Computer and Computational Sciences (IMSCCS'06).
[269] S. W. Beet,et al. Visual representations of speech signals , 1993 .
[270] Jakub Dabkowski,et al. On Some Method of Analysing Time Series , 1998 .
[271] Benoit B. Mandelbrot,et al. Fractal Geometry of Nature , 1984 .
[272] Ian Foster,et al. The Grid 2 - Blueprint for a New Computing Infrastructure, Second Edition , 1998, The Grid 2, 2nd Edition.
[273] Douglas A. Reynolds,et al. Integrated models of signal and background with application to speaker identification in noise , 1994, IEEE Trans. Speech Audio Process..
[274] Sancho Salcedo-Sanz,et al. Offline speaker segmentation using genetic algorithms and mutual information , 2006, IEEE Transactions on Evolutionary Computation.
[275] Shingo Kuroiwa,et al. Nonparametric Speaker Recognition Method Using Earth Mover's Distance , 2006, IEICE Trans. Inf. Syst..
[276] Belkacem Fergani,et al. Speaker diarization using one-class support vector machines , 2008, Speech Commun..
[277] R.W. Schafer,et al. From frequency to quefrency: a history of the cepstrum , 2004, IEEE Signal Processing Magazine.
[278] Werner Verhelst. Overlap-add methods for time-scaling of speech , 2000, Speech Commun..
[279] Bayya Yegnanarayana,et al. Speaker change detection in casual conversations using excitation source features , 2008, Speech Commun..
[280] Mauro Cettolo,et al. Evaluation of BIC-based algorithms for audio segmentation , 2005, Comput. Speech Lang..
[281] Treebank Penn,et al. Linguistic Data Consortium , 1999 .
[282] Horst Stöcker,et al. Taschenbuch mathematischer Formeln und moderner Verfahren (3. Aufl.) , 1995 .
[283] Jonathan Foote,et al. Visualizing music and audio using self-similarity , 1999, MULTIMEDIA '99.
[284] N. L. Johnson,et al. Multivariate Analysis , 1958, Nature.
[285] William J. Fitzgerald,et al. A Class of Kernels For Sets of Vectors , 2005, ESANN.
[286] Nuria Oliver,et al. Understanding near-duplicate videos: a user-centric approach , 2009, ACM Multimedia.
[287] Bernd Freisleben,et al. Fast and Robust Speaker Clustering Using the Earth Mover's Distance and Mixmax Models , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[288] David Burshtein,et al. Noise adaptation of HMM speech recognition systems using tied-mixtures in the spectral domain , 1997, IEEE Trans. Speech Audio Process..
[289] Yang Wang,et al. Cost-sensitive boosting for classification of imbalanced data , 2007, Pattern Recognit..
[290] Y. Ephraim,et al. A Brief Survey of Speech Enhancement , 2003 .
[291] Alfred Ultsch. Proof of Pareto’s 80/20 Law and Precise Limits for ABC-Analysis , 2002 .
[292] Kirk L. Kroeker,et al. Face recognition breakthrough , 2009, Commun. ACM.
[293] Steven Skiena,et al. The Algorithm Design Manual , 2020, Texts in Computer Science.
[294] Trevor Darrell,et al. Fast contour matching using approximate earth mover's distance , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.
[295] Bernd Freisleben,et al. Dimension-Decoupled Gaussian Mixture Model for Short Utterance Speaker Recognition , 2010, 2010 20th International Conference on Pattern Recognition.
[296] José Manuel Benítez,et al. Consistency measures for feature selection , 2008, Journal of Intelligent Information Systems.
[297] Donald E. Knuth,et al. The art of computer programming. Vol.2: Seminumerical algorithms , 1981 .
[298] Masafumi Nishida,et al. Speaker indexing for news articles, debates and drama in broadcasted TV programs , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.