Learning to Segment Actions from Visual and Language Instructions via Differentiable Weak Sequence Alignment
暂无分享,去创建一个
[1] Jonathan Tompson,et al. Temporal Cycle-Consistency Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] S. Chiba,et al. Dynamic programming algorithm optimization for spoken word recognition , 1978 .
[3] Sanja Fidler,et al. VirtualHome: Simulating Household Activities Via Programs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[4] Alexander G. Hauptmann,et al. Instructional Videos for Unsupervised Harvesting and Learning of Action Examples , 2014, ACM Multimedia.
[5] Juergen Gall,et al. Action Sets: Weakly Supervised Action Segmentation Without Ordering Constraints , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[6] Juan Carlos Niebles,et al. Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Sergey Levine,et al. Time-Contrastive Networks: Self-Supervised Learning from Video , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[8] Ehsan Elhamifar,et al. Deep Supervised Summarization: Algorithm and Application to Learning Instructions , 2019, NeurIPS.
[9] Sinisa Todorovic,et al. Set-Constrained Viterbi for Set-Supervised Action Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[11] Ehsan Elhamifar,et al. Procedure Completion by Learning from Partial Summaries , 2020, BMVC.
[12] Dat Huynh,et al. Self-supervised Multi-task Procedure Learning from Instructional Videos , 2020, ECCV.
[13] Cordelia Schmid,et al. Weakly Supervised Action Labeling in Videos under Ordering Constraints , 2014, ECCV.
[14] Chenliang Xu,et al. Towards Automatic Learning of Procedures From Web Instructional Videos , 2017, AAAI.
[15] Martial Hebert,et al. Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification , 2016, ECCV.
[16] Xinya Du,et al. Be Consistent! Improving Procedural Text Comprehension using Label Consistency , 2019, NAACL.
[17] Xuelong Li,et al. Deep Multimodal Clustering for Unsupervised Audiovisual Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Markus H. Gross,et al. A Neural Multi-sequence Alignment TeCHnique (NeuMATCH) , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[19] Ivan Laptev,et al. Unsupervised Learning from Narrated Instruction Videos , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Emma Brunskill,et al. Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure , 2018, ICLR.
[21] Matthijs Douze,et al. Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.
[22] Silvio Savarese,et al. Unsupervised Semantic Parsing of Video Collections , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[23] Ivan Laptev,et al. Learning Actionness via Long-Range Temporal Order Verification , 2020, ECCV.
[24] Rada Mihalcea,et al. Identifying Visible Actions in Lifestyle Vlogs , 2019, ACL.
[25] Amy Beth Warriner,et al. Concreteness ratings for 40 thousand generally known English word lemmas , 2014, Behavior research methods.
[26] Thomas L. Madden,et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.
[27] Fadime Sener,et al. Zero-Shot Anticipation for Instructional Activities , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[28] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[29] Ehsan Elhamifar,et al. Unsupervised Procedure Learning via Joint Dynamic Summarization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[30] Yansong Tang,et al. COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Fadime Sener,et al. Unsupervised Learning of Action Classes With Continuous Temporal Embedding , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Tanel Alumäe,et al. Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration , 2016, INTERSPEECH.
[33] Andrew Zisserman,et al. End-to-End Learning of Visual Representations From Uncurated Instructional Videos , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[35] Juergen Gall,et al. NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[36] Kevin Murphy,et al. What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision , 2015, NAACL.
[37] Ivan Laptev,et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[38] Juan Carlos Niebles,et al. D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Juan Carlos Niebles,et al. Connectionist Temporal Modeling for Weakly Supervised Action Labeling , 2016, ECCV.
[40] Michael S. Ryoo,et al. Evolving Losses for Unsupervised Video Representation Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Juan Carlos Niebles,et al. Few-Shot Video Classification via Temporal Alignment , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Ivan Laptev,et al. Cross-Task Weakly Supervised Learning From Instructional Videos , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Jun Li,et al. Weakly Supervised Energy-Based Learning for Action Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[44] Marco Cuturi,et al. Soft-DTW: a Differentiable Loss Function for Time-Series , 2017, ICML.
[45] Chenliang Xu,et al. Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[46] Ehsan Elhamifar,et al. Subset Selection and Summarization in Sequential Data , 2017, NIPS.
[47] Dima Damen,et al. Action Modifiers: Learning From Adverbs in Instructional Videos , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Ehsan Elhamifar,et al. Sequential Facility Location: Approximate Submodularity and Greedy Algorithm , 2019, ICML.
[49] Fadime Sener,et al. Unsupervised Learning and Segmentation of Complex Activities from Video , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.