Suggesting Sounds for Images from Video Collections
Jean Charles Bazin | Alexander Sorkine-Hornung | Andreas Krause | Oliver Wang | Matthias Solèr
[1] Michael Elad,et al. Pixels that sound , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[2] Richard Szeliski,et al. Building Rome in a day , 2009, ICCV.
[3] Gabriela Csurka,et al. Visual categorization with bags of keypoints , 2004, ECCV.
[4] Diane J. Cook,et al. Automatic Video Classification: A Survey of the Literature , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[5] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[6] Xiaoying Wu,et al. A study of image-based music composition , 2008, 2008 IEEE International Conference on Multimedia and Expo.
[7] Antonio Torralba,et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.
[8] Min-Chun Hu,et al. Semantic Based Background Music Recommendation for Home Videos , 2014, MMM.
[9] Matti Pietikäinen,et al. Lipreading with Local Spatiotemporal Descriptors , 2009, IEEE Transactions on Multimedia.
[10] M. A. Lea,et al. Who do you look like? Evidence of facial stereotypes for male names , 2007, Psychonomic bulletin & review.
[11] Roger B. Dannenberg,et al. Sound Synthesis from Real-Time Video Images , 2003, ICMC.
[12] Frédo Durand,et al. The visual microphone , 2014, ACM Trans. Graph..
[13] Luc Van Gool,et al. The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.
[14] Gert R. G. Lanckriet,et al. Semantic Annotation and Retrieval of Music and Sound Effects , 2008, IEEE Transactions on Audio, Speech, and Language Processing.
[15] Cheng-Te Li,et al. Emotion-based impressionism slideshow with automatic music accompaniment , 2007, ACM Multimedia.
[16] Alexei A. Efros,et al. What makes Paris look like Paris? , 2015, Commun. ACM.
[17] Doug L. James,et al. Harmonic fluids , 2009, SIGGRAPH 2009.
[18] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[19] Peter B. L. Meijer,et al. An experimental system for auditory image representations , 1992, IEEE Transactions on Biomedical Engineering.
[20] M. R. Brito,et al. Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection , 1997 .
[21] Daniel Cohen-Or,et al. Distilled Collections from Textual Image Queries , 2015, Comput. Graph. Forum.
[22] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.
[23] Jon M. Kleinberg,et al. Mapping the world's photos , 2009, WWW '09.
[24] Xinghuo Yu,et al. An approach for image sonification , 2004, First International Symposium on Control, Communications and Signal Processing, 2004..
[25] Thomas Serre,et al. HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.
[26] Ming C. Lin,et al. Example-guided physically based modal sound synthesis , 2013, ACM Trans. Graph..
[27] Yoav Y. Schechner,et al. Harmony in Motion , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[28] Yael Pritch,et al. Saliency filters: Contrast based filtering for salient region detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[29] Wilmot Li,et al. Tools for placing cuts and transitions in interview video , 2012, ACM Trans. Graph..
[30] Sebastian Michel,et al. Picasso - to sing, you must close your eyes and draw , 2011, SIGIR '11.
[31] Benjamin Schrauwen,et al. Multiscale Approaches To Music Audio Feature Learning , 2013, ISMIR.
[32] Wilmot Li,et al. UnderScore: musical underlays for audio stories , 2012, UIST '12.
[33] Steven M. Seitz,et al. Scene Summarization for Online Image Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.
[34] Ivan Laptev,et al. On Space-Time Interest Points , 2005, International Journal of Computer Vision.
[35] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[36] Jun-Cheng Chen,et al. Tiling Slideshow: An Audiovisual Presentation Method for Consumer Photos , 2007, IEEE MultiMedia.
[37] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[38] Peter Dunker,et al. Content-aware auto-soundtracks for personal photo music slideshows , 2011, 2011 IEEE International Conference on Multimedia and Expo.
[39] Honglak Lee,et al. Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.
[40] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.
[41] Mubarak Shah,et al. Multimodal Analysis for Identification and Segmentation of Moving-Sounding Objects , 2013, IEEE Transactions on Multimedia.
[42] Thabo Beeler,et al. Real-time high-fidelity facial performance capture , 2015, ACM Trans. Graph..
[43] Yizhou Yu,et al. Audeosynth: Music-driven Video Montage , 2015, ACM Trans. Graph..
[44] Brian Wyvill,et al. Robust iso-surface tracking for interactive character skinning , 2014, ACM Trans. Graph..
[45] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[46] Ziv Bar-Joseph,et al. Sound-by-numbers: motion-driven sound synthesis , 2003, SCA '03.
[47] Dinesh K. Pai,et al. FoleyAutomatic: physically-based sound effects for interactive simulation and animation , 2001, SIGGRAPH.
[48] Mohan S. Kankanhalli,et al. Music synthesis for home videos: an analogy based approach , 2003, Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint.
[49] Rebecca Fiebrink,et al. Cross-modal Sound Mapping Using Deep Learning , 2013, NIME.
[50] Meinard Müller,et al. Audio-based Music Structure Analysis , 2010 .
[51] Simon J. Godsill,et al. Digital audio restoration , 1998 .
[52] Markus H. Gross,et al. Scalable Music: Automatic Music Retargeting and Synthesis , 2013, Comput. Graph. Forum.
[53] Ulrike von Luxburg,et al. Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters , 2009, Theor. Comput. Sci..
[54] Beth Logan,et al. Mel Frequency Cepstral Coefficients for Music Modeling , 2000, ISMIR.
[55] Doug L. James,et al. Animating fire with sound , 2011, SIGGRAPH 2011.
[56] Cordelia Schmid,et al. A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.
[57] Ulrike von Luxburg,et al. Cluster Identification in Nearest-Neighbor Graphs , 2007, ALT.
[58] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Huizhong Chen,et al. What's in a Name? First Names as Facial Attributes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[60] Thomas Brox,et al. A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis , 2013, 2013 IEEE International Conference on Computer Vision.
[61] Jitendra Malik,et al. Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[62] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.
[63] Riccardo Miotto,et al. A Generative Context Model for Semantic Music Annotation and Retrieval , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[64] Thabo Beeler,et al. FaceDirector: Continuous Control of Facial Performance in Video , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[65] Timothy F. Cootes,et al. Extraction of Visual Features for Lipreading , 2002, IEEE Trans. Pattern Anal. Mach. Intell..
[66] D.R. Reddy,et al. Speech recognition by machine: A review , 1976, Proceedings of the IEEE.
[67] Cordelia Schmid,et al. Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.
[68] P. Mermelstein,et al. Distance measures for speech recognition, psychological and instrumental , 1976 .
[69] Derek Nowrouzezahrai,et al. Learning hatching for pen-and-ink illustration of surfaces , 2012, ACM Trans. Graph..
[70] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Doug L. James,et al. Rigid-body fracture sound with precomputed soundbanks , 2010, ACM Trans. Graph..