Nagrani, A.[Arsha] Co Author Listing
* Composable Augmentation Encoding for Video Representation Learning
* Condensed Movies: Story Based Retrieval with Contextual Embeddings
* Count, Crop and Recognise: Fine-Grained Recognition in the Wild
* End-to-end Generative Pretraining for Multimodal Video Captioning
* EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
* Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
* Learnable PINs: Cross-modal Embeddings for Person Identity
* Learning Audio-Video Modalities from Image Captions
* Localizing Visual Sounds the Hard Way
* Look Before you Speak: Visually Contextualized Utterances
* Masking Modalities for Cross-modal Video Retrieval
* Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching
* Speech2Action: Cross-Modal Supervision for Action Recognition
* TL;DW? Summarizing Instructional Videos with Task Relevance and Cross-Modal Saliency
* Uncertainty-aware Weakly Supervised Action Detection from Untrimmed Videos
Nagrecha, K.[Kabir] Co Author Listing
* Gradient-based Algorithms for Machine Teaching

