Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Beyond OCR + VQA: Towards end-to-end reading and reasoning for robust and accurate TextVQA
Beyond VQA: Generating Multi-word Answers and Rationales to Visual Questions
Blind VQA on 360° Video via Progressively Learning From Pixels, Frames, and Video
Can you even tell left from right? Presenting a new challenge for VQA
A cascaded long short-term memory (LSTM) driven generic visual question answering (VQA)
Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering
Contrast and Classify: Training Robust VQA Models
Coordinating explicit and implicit knowledge for knowledge-based VQA
Counterfactual VQA: A Cause-Effect Look at Language Bias
CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense
CS-VQA: Visual Question Answering with Compressively Sensed Images
DocVQA: A Dataset for VQA on Document Images
Domain-robust VQA with diverse datasets and methods but no target labels
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories
Explaining VQA predictions using visual grounding and a knowledge base
Explanation vs. attention: A two-player game to obtain attention for VQA and visual dialog
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
FAST-VQA: Efficient End-to-End Video Quality Assessment with Fragment Sampling
FGCVQA: Fine-Grained Cross-Attention for Medical VQA
From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason
HIDRO-VQA: High Dynamic Range Oracle for Video Quality Assessment
How to Make a BLT Sandwich? Learning VQA towards Understanding Web Instructional Videos
How to Practice VQA on a Resource-limited Target Domain
How Transferable are Reasoning Patterns in VQA?
Improving relevant subjective testing for validation: Comparing machine learning algorithms for finding similarities in VQA datasets using objective measures
Inductive Biases for Low Data VQA: A Data Augmentation Approach
Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool
IQ-VQA: Intelligent Visual Question Answering
Knowledge Informed Sequential Scene Graph Verification Using VQA
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
LaTr: Layout-Aware Transformer for Scene-Text VQA
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Loss Re-Scaling VQA: Revisiting the Language Prior Problem From a Class-Imbalance View
Making the V in Text-VQA Matter
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
MRA-Net: Improving VQA Via Multi-Modal Relation Attention Network
MUST-VQA: Multilingual Scene-Text VQA
NExT-OOD: Overcoming Dual Multiple-Choice VQA Biases
Object-Based Reasoning in VQA
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Optimizing feature pooling and prediction models of VQA algorithms
Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing
POP-VQA: Privacy preserving, On-device, Personalized Visual Question Answering
PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3
Prompting large language model with context and pre-answer for knowledge-based VQA
Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
RankDVQA: Deep VQA based on Ranking-inspired Hybrid Training
Reducing Vision-Answer Biases for Multiple-Choice VQA
Roses are Red, Violets are Blue… But Should VQA expect Them To?
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
SB-VQA: A Stack-Based Video Quality Assessment Framework for Video Enhancement
Self-supervised knowledge distillation in counterfactual learning for VQA
SimVQA: Exploring Simulated Environments for Visual Question Answering
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions
Suppressing Biased Samples for Robust VQA
TA-Student VQA: Multi-Agents Training by Self-Questioning
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing
Towards VQA Models That Can Read
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content
Understanding VQA for Negative Answers Through Visual and Linguistic Inference
VC-VQA: Visual Calibration Mechanism For Visual Question Answering
VDPVE: VQA Dataset for Perceptual Video Enhancement
Video Question Answering, Movies, Spatio-Temporal, Query, VQA
Vision-Language Models, Language-Vision Models, VQA
Visual Question Answering, Query, VQA
VQA as a factoid question answering problem: A novel approach for knowledge-aware and explainable visual question answering
VQA Therapy: Exploring Answer Differences by Visually Grounding Answers
VQA With No Questions-Answers Training
VQA, Visual Question Answering, Neural Networks
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
VQA-LOL: Visual Question Answering Under the Lens of Logic
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
VQA: Visual Question Answering
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Weakly Supervised Grounding for VQA in Vision-Language Transformers
Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment