MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval
[1] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..
[2] Constantin Papaodysseus,et al. A New Approach to the Automatic Recognition of Musical Recordings , 2001 .
[3] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.
[4] John C. Platt,et al. Extracting noise-robust features from audio data , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[5] Ray Meddis,et al. Virtual pitch and phase sensitivity of a computer model of the auditory periphery , 1991 .
[6] P. Smaragdis,et al. Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).
[7] David Anthony James,et al. The Application of Classical Information Retrieval Techniques to Spoken Documents , 1995 .
[8] Herbert Gish,et al. Segregation of speakers for speech recognition and speaker identification , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.
[9] Peng Yu,et al. An improved model-based speaker segmentation system , 2003, INTERSPEECH.
[10] E. Batlle,et al. Automatic Song Identification in Noisy Broadcast Audio , 2002 .
[11] David Malah,et al. Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).
[12] Anssi Klapuri,et al. Signal Processing Methods for the Automatic Transcription of Music , 2004 .
[13] Larry P. Heck,et al. Speaker tracking and detection with multiple speakers , 1999, EUROSPEECH.
[14] Stanley Boykin,et al. Audio Hot Spotting and Retrieval using Multiple Features , 2004, HLT-NAACL 2004.
[15] Martha Larson,et al. Using syllable-based indexing features and language models to improve German spoken document retrieval , 2003, INTERSPEECH.
[16] Ian H. Witten,et al. Signal processing for melody transcription , 1995 .
[17] Thomas Sikora,et al. Speech enhancement based on smoothing of spectral noise floor , 2004, INTERSPEECH.
[18] Thomas Sikora,et al. Evaluation of distance measures for MPEG-7 melody contours , 2004, IEEE 6th Workshop on Multimedia Signal Processing, 2004..
[19] Thomas Sikora,et al. Automatic segmentation of speakers in broadcast audio material , 2003, IS&T/SPIE Electronic Imaging.
[20] Lie Lu,et al. A robust audio classification and segmentation method , 2001, MULTIMEDIA '01.
[21] Kunio Kashino,et al. Very quick audio searching: introducing global pruning to the Time-Series Active Search , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[22] Philip N. Garner,et al. Representation and linking mechanisms for audio in MPEG-7 , 2000, Signal Process. Image Commun..
[23] Peter Schäuble,et al. A system for retrieving speech documents , 1992, SIGIR '92.
[24] Thomas Sikora,et al. Comparison of MPEG-7 audio spectrum projection features and MFCC applied to speaker recognition, sound classification and audio segmentation , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[25] Pedro Cano,et al. Audio Watermarking and Fingerprinting: For Which Applications? , 2003 .
[26] Chin-Hui Lee,et al. Automatic recognition of keywords in unconstrained speech using hidden Markov models , 1990, IEEE Trans. Acoust. Speech Signal Process..
[27] Karen Sparck Jones,et al. Spoken Document Retrieval for TREC-8 at Cambridge University , 1998, TREC.
[28] Frank Kurth,et al. Identification of Highly Distorted Audio Material for Querying Large Scale Data Bases , 2002 .
[29] Seungjin Choi,et al. Non-negative component parts of sound for classification , 2003, Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology (IEEE Cat. No.03EX795).
[30] L. Varga,et al. Short-term sound stream characterization for reliable, real-time occurrence monitoring of given sound-prints , 2000, 2000 10th Mediterranean Electrotechnical Conference. Information Technology and Electrotechnology for the Mediterranean Countries. Proceedings. MeleCon 2000 (Cat. No.00CH37099).
[31] Alexander G. Hauptmann,et al. Speech Recognition and Information Retrieval: Experiments in Retrieving Spoken Documents , 1997 .
[32] Pedro Cano,et al. A review of algorithms for audio fingerprinting , 2002, 2002 IEEE Workshop on Multimedia Signal Processing..
[33] Tsuhan Chen,et al. Audio Feature Extraction and Analysis for Scene Segmentation and Classification , 1998, J. VLSI Signal Process..
[34] Ian H. Witten,et al. Towards the digital music library: tune retrieval from acoustic input , 1996, DL '96.
[35] Marc Leman,et al. An Auditory Model Based Transcriber of Singing Sequences , 2002, ISMIR.
[36] Frederick Jelinek,et al. Statistical methods for speech recognition , 1997 .
[37] Martin Kaltenbrunner,et al. Statistical Significance in Song-Spotting in Audio , 2001 .
[38] Stephen W. Hainsworth,et al. Techniques for the Automated Analysis of Musical Audio , 2004 .
[39] Vijay Balasubramanian,et al. Speech-Based Retrieval Using Semantic Co-Occurrence Filtering , 1994, HLT.
[40] H. Sebastian Seung,et al. Algorithms for Non-negative Matrix Factorization , 2000, NIPS.
[41] Thomas Niesler,et al. Experiments in broadcast news transcription , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[42] Les E. Atlas,et al. Modulation frequency features for audio fingerprinting , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[43] Alexander H. Waibel,et al. Strategies for automatic segmentation of audio data , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[44] Masataka Goto. A predominant-F0 estimation method for CD recordings: MAP estimation using EM algorithm for adaptive tone models , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[45] Frédéric Bimbot,et al. Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[46] Andrew K. Halberstadt. Heterogeneous acoustic measurements and multiple classifiers for speech recognition , 1999 .
[47] Francine Chen,et al. Segmentation of speech using speaker identification , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[48] Martin F. Porter,et al. An algorithm for suffix stripping , 1997, Program.
[49] J. G. Lourens. Detection and Logging Advertisements using its Sound , 1990, IEEE South African Symposium on Communications and Signal Processing.
[50] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[51] Min Chen,et al. Detection of Soccer Goal Shots Using Joint Multimedia Features and Classification Rules , 2003 .
[52] Thomas Sikora,et al. Audio classification based on MPEG-7 spectral basis representations , 2004, IEEE Transactions on Circuits and Systems for Video Technology.
[53] Anssi Klapuri,et al. Means of Integrating Audio Content Analysis Algorithms , 2001 .
[54] Ton Kalker,et al. A Highly Robust Audio Fingerprinting System , 2002, ISMIR.
[55] Thomas Sikora,et al. BeatBank - An MPEG-7 Compliant Query by Tapping System , 2004 .
[56] Ricardo A. Baeza-Yates,et al. Fast and Practical Approximate String Matching , 1992, Inf. Process. Lett..
[57] Helmut Neuschmied,et al. Robust Sound Modeling for Song Detection in Broadcast Audio , 2002 .
[58] Thomas Sikora,et al. A Query by Humming System using MPEG-7 Descriptors , 2004 .
[59] David F. Percy. Cluster Analysis (3rd Edition) , 1994 .
[60] Thomas Sikora,et al. Phonetic confusion based document expansion for spoken document retrieval , 2004, INTERSPEECH.
[61] Erwin M. Bakker,et al. Semantic Video Retrieval Using Audio Analysis , 2002, CIVR.
[62] Peter Kabal,et al. The computation of line spectral frequencies using Chebyshev polynomials , 1986, IEEE Trans. Acoust. Speech Signal Process..
[63] Slava M. Katz,et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..
[64] Lutz Prechelt,et al. An interface for melody input , 2001, TCHI.
[65] Mark A. Clements,et al. Scoring Algorithms for Wordspotting Systems , 2004, HLT-NAACL 2004.
[66] B. S. Manjunath,et al. Introduction to MPEG-7 , 2002 .
[67] Ross Wilkinson,et al. Experiments in spoken document retrieval using phoneme n-grams , 2000, Speech Commun..
[68] Masataka Goto,et al. A robust predominant-F0 estimation method for real-time detection of melody and bass lines in CD recordings , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[69] Peng Yu,et al. A hybrid word / phoneme-based approach for improved vocabulary-independent search in spontaneous speech , 2004, INTERSPEECH.
[70] Pedro Cano,et al. Mixed Watermarking-Fingerprinting Approach for Integrity Verification of Audio Recordings , 2002 .
[71] Jean-Luc Gauvain,et al. The LIMSI SDR System for TREC-8 , 1999, TREC.
[72] H. Sebastian Seung,et al. Learning the parts of objects by non-negative matrix factorization , 1999, Nature.
[73] S. Robertson. The probability ranking principle in IR , 1997 .
[74] H. Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.
[75] Dragutin Petkovic,et al. Phonetic confusion matrix based spoken document retrieval , 2000, SIGIR '00.
[76] R. C. Rose,et al. Keyword detection in conversational speech utterances using hidden Markov model based continuous speech recognition , 1995, Comput. Speech Lang..
[77] Douglas A. Reynolds,et al. Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..
[78] Takehito Utsuro,et al. Keyword recognition and extraction by multiple-LVCSRs with 60,000 words in speech-driven WEB retrieval task , 2004, INTERSPEECH.
[79] Justin Zobel,et al. Music Ranking Techniques Evaluated , 2000, ISMIR.
[80] H. Gish,et al. Text-independent speaker identification , 1994, IEEE Signal Processing Magazine.
[81] Jean-Luc Gauvain,et al. Partitioning and transcription of broadcast news data , 1998, ICSLP.
[82] J.P. Campbell Jr.,et al. Speaker recognition: a tutorial , 1997, Proc. IEEE.
[83] S. R. Subramanya,et al. Transform-based indexing of audio data for multimedia databases , 1997, Proceedings of IEEE International Conference on Multimedia Computing and Systems.
[84] Kenney Ng,et al. Subword-based approaches for spoken document retrieval , 2000, Speech Commun..
[85] Eric D. Scheirer,et al. Tempo and beat analysis of acoustic musical signals. , 1998, The Journal of the Acoustical Society of America.
[86] Chng Eng Siong,et al. Sports highlight detection from keyword sequences using HMM , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).
[87] Ramarathnam Venkatesan,et al. A Perceptual Audio Hashing Algorithm: A Tool for Robust Audio Identification and Information Hiding , 2001, Information Hiding.
[88] Fabio Crestani,et al. Using semantic and phonetic term similarity for spoken document retrieval and spoken query processing , 2002 .
[89] Thomas Sikora,et al. Combination of phone N-grams for a MPEG-7-based spoken document retrieval system , 2004, 2004 12th European Signal Processing Conference.
[90] Dragutin Petkovic,et al. Towards robust features for classifying audio in the CueVideo system , 1999, MULTIMEDIA '99.
[91] Martin Wechsler,et al. Spoken document retrieval based on phoneme recognition , 1998 .
[92] Karen Spärck Jones,et al. Retrieving spoken documents by combining multiple index sources , 1996, SIGIR '96.
[93] M. Sugiyama,et al. Speech segmentation and clustering based on speaker features , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[94] Eric Allamanche,et al. Content-based Identification of Audio Material Using MPEG-7 Low Level Description , 2001, ISMIR.
[95] Douglas B. Paul. An Efficient A* Stack Decoder Algorithm for Continuous Speech Recognition with a Stochastic Language Model , 1992, HLT.
[96] Holger H. Hoos,et al. GUIDO/MIR - an Experimental Musical Information Retrieval System based on GUIDO Music Notation , 2001, ISMIR.
[97] Stephen E. Robertson,et al. Okapi at TREC-6 Automatic ad hoc, VLC, routing, filtering and QSDR , 1997, TREC.
[98] John H. L. Hansen,et al. Unsupervised audio stream segmentation and clustering via the Bayesian information criterion , 2000, INTERSPEECH.
[99] Kenney Ng. Towards robust methods for spoken document retrieval , 1998, ICSLP.
[100] Aaron E. Rosenberg,et al. Unsupervised speaker segmentation of telephone conversations , 2002, INTERSPEECH.
[101] S. Furui,et al. Cepstral analysis technique for automatic speaker verification , 1981 .
[102] Douglas A. Reynolds,et al. Blind clustering of speech utterances based on speaker and language characteristics , 1998, ICSLP.
[103] Barry Vercoe,et al. Melody retrieval on the web , 2001, IS&T/SPIE Electronic Imaging.
[104] Emanuele Pollastri. An Audio Front End for Query-by-Humming Systems , 2001, ISMIR.
[105] Markus Cremer,et al. Scalable robust audio fingerprinting using MPEG-7 content description , 2002, 2002 IEEE Workshop on Multimedia Signal Processing..
[106] M. A. Siegler,et al. Automatic Segmentation, Classification and Clustering of Broadcast News Audio , 1997 .
[107] H. Gish,et al. An unsupervised, sequential learning algorithm for the segmentation of speech waveforms with multiple speakers , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[108] Regunathan Radhakrishnan,et al. Audio events detection based highlights extraction from baseball, golf and soccer games in a unified framework , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).
[109] Preeti Rao,et al. Pitch Detection of the Singing Voice in Musical Audio , 2003 .
[110] Lie Lu,et al. Speaker change detection and tracking in real-time news broadcasting analysis , 2002, MULTIMEDIA '02.
[111] Thomas Sikora,et al. How Efficient is MPEG-7 for General Sound Recognition? , 2004 .
[112] Gerard Salton,et al. Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..
[113] Beth Logan,et al. Word and sub-word indexing approaches for reducing the effects of OOV queries on spoken audio , 2002 .
[114] Youngmoo E. Kim,et al. Analysis of a Contour-based Representation for Melody , 2000, ISMIR.
[115] Robert M. Gray,et al. An Algorithm for Vector Quantizer Design , 1980, IEEE Trans. Commun..
[116] P. Boersma. Accurate Short-Term Analysis of the Fundamental Frequency and the Harmonics-to-Noise Ratio of a Sampled Sound , 1993 .
[117] J. Zobel,et al. Matching Techniques for Large Music Databases , 1999 .
[118] Vincent Kanade,et al. Clustering Algorithms , 2021, Wireless RF Energy Transfer in the Massive IoT Era.
[119] Ellen M. Voorhees,et al. Overview of the Seventh Text REtrieval Conference , 1998 .
[120] Vladimir I. Levenshtein,et al. Binary codes capable of correcting deletions, insertions, and reversals , 1965 .
[121] Nicolas Moreau,et al. Phone-based Spoken Document Retrieval in Conformance with the MPEG-7 Standard , 2004 .
[122] Hsin-Min Wang,et al. A sequential metric-based audio segmentation method via the Bayesian information criterion , 2003, INTERSPEECH.
[123] Ricardo A. Baeza-Yates,et al. Searching in metric spaces , 2001, CSUR.
[124] Ramesh A. Gopinath,et al. Improved speaker segmentation and segments clustering using the bayesian information criterion , 1999, EUROSPEECH.
[125] Lie Lu,et al. UBM-based real-time speaker segmentation for broadcasting news , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[126] Anders Arpteg. Information Retrieval Techniques , 1999 .
[127] Christian Wellekens,et al. DISTBIC: A speaker-based segmentation for audio data indexing , 2000, Speech Commun..
[128] Timo Viitaniemi,et al. Probabilistic models for the transcription of single-voice melodies , 2003 .