Journals starting with mula

MULA21 * *Multimodal Learning and Applications Workshop
* 3D Hand Pose Estimation via aligned latent space injection and kinematic losses
* Adaptive Intermediate Representations for Video Understanding
* APES: Audiovisual Person Search in Untrimmed Video
* Beyond VQA: Generating Multi-word Answers and Rationales to Visual Questions
* Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
* Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation
* Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing
* Exploring the Limits of Zero-Shot Learning: How Low Can You Go?
* Improved Attention for Visual Question Answering, An
* Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning
* Private-Shared Disentangled Multimodal VAE for Learning of Latent Representations
* Progressive Knowledge-Embedded Unified Perceptual Parsing for Scene Understanding
* Radar Camera Fusion via Representation Learning in Autonomous Driving
* Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences
* Target-Tailored Source-Transformation for Scene Graph Generation
* Using Text to Teach Image Retrieval
17 for MULA21

MULA22 * *Multimodal Learning and Applications
* Cascaded Siamese Self-supervised Audio to Video GAN
* Coarse-to-Fine Reasoning for Visual Question Answering
* Coupling Vision and Proprioception for Navigation of Legged Robots
* Doubling down: sparse grounding with an additional, almost-matching caption for detection-oriented multimodal pretraining
* Emphasizing Complementary Samples for Non-literal Cross-modal Retrieval
* Guiding Attention using Partial-Order Relationships for Image Captioning
* Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
* Learning to Ask Informative Sub-Questions for Visual Question Answering
* M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation
* Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters
* Probabilistic Compositional Embeddings for Multimodal Image Retrieval
* Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog
* Semantically Grounded Visual Embeddings for Zero-Shot Learning
* Transformer Decoders with Multi-Modal Regularization for Cross-Modal Food Retrieval
* Unreasonable Effectiveness of CLIP Features for Image Captioning: An Experimental Analysis, The
16 for MULA22

Last update: 1-Jun-23 11:30:56