_ | textvqa | _ |
---|---|---|
Beyond OCR + VQA: Towards end-to-end reading and reasoning for robust and accurate | textvqa | |
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for | textvqa | |
A multimodal attention fusion network with a dynamic vocabulary for | textvqa | |
Spatially Aware Multimodal Transformers for | textvqa | |
Structured Multimodal Attentions for | textvqa | |