20.4.3.3.13 Visual Grounding in Visual Question Answering

Question Answering. Grounding. Visual Grounding. Visual Dialog. Mostly a subset of the related topic:
See also Visual Question Answering, Query, VQA.

Visual7W visual question answering,
Large-scale visual question answering (QA) dataset, with object-level groundings and multimodal answers. WWW Link.
Dataset, Visual Question Answering.

Liang, J.W.[Jun-Wei], Jiang, L.[Lu], Cao, L.L.[Liang-Liang], Kalantidis, Y.[Yannis], Li, L.J.[Li-Jia], Hauptmann, A.G.[Alexander G.],
Focal Visual-Text Attention for Memex Question Answering,
PAMI(41), No. 8, August 2019, pp. 1893-1908.
IEEE DOI 1907
BibRef
Earlier: A1, A2, A3, A5, A6, Only:
Focal Visual-Text Attention for Visual Question Answering,
CVPR18(6135-6143)
IEEE DOI 1812
Task analysis, Knowledge discovery, Visualization, Grounding, Metadata, Cognition, Photo albums, question answering, memex. Visualization, Videos, Computational modeling, Correlation. BibRef

Riquelme, F.[Felipe], de Goyeneche, A.[Alfredo], Zhang, Y.D.[Yun-Dong], Niebles, J.C.[Juan Carlos], Soto, A.[Alvaro],
Explaining VQA predictions using visual grounding and a knowledge base,
IVC(101), 2020, pp. 103968.
Elsevier DOI 2009
Deep Learning, Attention, Supervision, Knowledge Base, Interpretability, Explainability BibRef

Zhao, L.C.[Li-Chen], Cai, D.G.[Dai-Gang], Zhang, J.[Jing], Sheng, L.[Lu], Xu, D.[Dong], Zheng, R.[Rui], Zhao, Y.J.[Yin-Jie], Wang, L.P.[Li-Peng], Fan, X.[Xibo],
Toward Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline,
CirSysVideo(33), No. 6, June 2023, pp. 2935-2949.
IEEE DOI 2306
Task analysis, Visualization, Annotations, Point cloud compression, Solid modeling, Question answering (information retrieval), vision and language on 3D scenes BibRef

Zhu, L.J.[Liang-Jun], Peng, L.[Li], Zhou, W.N.[Wei-Nan], Yang, J.L.[Jie-Long],
Dual-decoder transformer network for answer grounding in visual question answering,
PRL(171), 2023, pp. 53-60.
Elsevier DOI 2306
Visual question answering, Answer grounding, Dual-decoder transformer BibRef


Huang, J.Y.[Jiang-Yong], Jia, B.X.[Bao-Xiong], Wang, Y.[Yan], Zhu, Z.Y.[Zi-Yu], Linghu, X.K.[Xiong-Kun], Li, Q.[Qing], Zhu, S.C.[Song-Chun], Huang, S.Y.[Si-Yuan],
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis,
CVPR25(24570-24581)
IEEE DOI 2508
Measurement, Solid modeling, Grounding, Computational modeling, Coherence, Benchmark testing, Solids, Data models, Robustness, 3d question answering BibRef

Chen, K.[Kang], Wu, X.Q.[Xiang-Qian],
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning,
CVPR24(27208-27217)
IEEE DOI Code:
WWW Link. 2410
Measurement, Visualization, Grounding, Computational modeling, Natural languages, Fitting, Object detection, dataset BibRef

Di, S.Z.[Shang-Zhe], Xie, W.[Weidi],
Grounded Question-Answering in Long Egocentric Videos,
CVPR24(12934-12943)
IEEE DOI 2410
Visualization, Grounding, Large language models, Pipelines, Training data, Benchmark testing, Data models, egocentric vision, video grounding BibRef

Chen, C.Y.[Chong-Yan], Anjum, S.[Samreen], Gurari, D.[Danna],
VQA Therapy: Exploring Answer Differences by Visually Grounding Answers,
ICCV23(15269-15279)
IEEE DOI Code:
WWW Link. 2401
BibRef

Le, T.M.[Thao Minh], Le, V.[Vuong], Gupta, S.I.[Sun-Il], Venkatesh, S.[Svetha], Tran, T.[Truyen],
Guiding Visual Question Answering with Attention Priors,
WACV23(4370-4379)
IEEE DOI 2302
Training, Visualization, Systematics, Grounding, Semantics, Linguistics, Cognition, visual reasoning BibRef

Khan, A.U.[Aisha Urooj], Kuehne, H.[Hilde], Gan, C.[Chuang], da Vitoria Lobo, N.[Niels], Shah, M.[Mubarak],
Weakly Supervised Grounding for VQA in Vision-Language Transformers,
ECCV22(XXXV:652-670).
Springer DOI 2211
BibRef

Gupta, K.[Kshitij], Gautam, D.[Devansh], Mamidi, R.[Radhika],
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation,
ICPR22(1734-1741)
IEEE DOI 2212
Training, Visualization, Analytical models, Pipelines, Transformers, Question answering (information retrieval), Data models BibRef

Li, Y.C.[Yi-Cong], Wang, X.[Xiang], Xiao, J.B.[Jun-Bin], Ji, W.[Wei], Chua, T.S.[Tat-Seng],
Invariant Grounding for Video Question Answering,
CVPR22(2918-2927)
IEEE DOI 2210
Visualization, Correlation, Grounding, Semantics, Predictive models, Linguistics, Question answering (information retrieval), Vision + language BibRef

Lu, X.P.[Xiao-Peng], Fan, Z.[Zhen], Wang, Y.[Yansen], Oh, J.[Jean], Rosé, C.P.[Carolyn P.],
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling,
XSAnim21(2631-2639)
IEEE DOI 2112
Integrated optics, Visualization, Grounding, Computational modeling, Knowledge discovery BibRef

Khan, A.U.[Aisha Urooj], Kuehne, H.[Hilde], Duarte, K.[Kevin], Gan, C.[Chuang], Lobo, N.[Niels], Shah, M.[Mubarak],
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules,
CVPR21(8461-8470)
IEEE DOI 2111
Training, Visualization, Vocabulary, Grounding, Focusing, Detectors, Knowledge discovery BibRef

Selvaraju, R.R., Tendulkar, P., Parikh, D., Horvitz, E., Tulio Ribeiro, M., Nushi, B., Kamar, E.,
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions,
CVPR20(10000-10008)
IEEE DOI 2008
Cognition, Task analysis, Visualization, Image color analysis, Grounding, Text recognition, Computational modeling BibRef

Gouthaman, K.V., Mittal, A.[Anurag],
Reducing Language Biases in Visual Question Answering with Visually-grounded Question Encoder,
ECCV20(XIII:18-34).
Springer DOI 2011
BibRef

Tan, H.L., Leong, M.C., Xu, Q., Li, L., Fang, F., Cheng, Y., Gauthier, N., Sun, Y., Lim, J.H.,
Task-Oriented Multi-Modal Question Answering For Collaborative Applications,
ICIP20(1426-1430)
IEEE DOI 2011
Task analysis, Collaboration, Grounding, Visualization, Cognition, Training, Machine learning, question answering, corpora BibRef

Selvaraju, R.R., Lee, S., Shen, Y., Jin, H., Ghosh, S., Heck, L., Batra, D., Parikh, D.,
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded,
ICCV19(2591-2600)
IEEE DOI 2004
gradient methods, image retrieval, natural language processing, neural nets, question answering (information retrieval), HINT, Correlation BibRef

Zhang, Y., Niebles, J.C., Soto, A.,
Interpretable Visual Question Answering by Visual Grounding From Attention Supervision Mining,
WACV19(349-357)
IEEE DOI 1904
data mining, data visualisation, image representation, learning (artificial intelligence) BibRef

Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Referring Expression Comprehension.


Last update: Sep 10, 2025 at 12:00:25