Audiovisual speaker indexing for Web-TV automations
Charalampos A. Dimoulas | Lazaros Vrysis | Nikolaos Vryzas