Wang, J.[Jialou],
Zhu, M.[Manli],
Li, Y.[Yulei],
Li, H.L.[Hong-Lei],
Yang, L.Z.[Long-Zhi],
Woo, W.L.[Wai Lok],
Detect2Interact: Localizing Object Key Field in Visual Question
Answering with LLMs,
IEEE_Int_Sys(39), No. 3, May 2024, pp. 35-44.
IEEE DOI
2407
Visualization, Semantics, Object detection, Image segmentation,
Task analysis, Computational modeling, Chatbots, Spatial resolution
BibRef
Hu, Z.J.[Zhong-Jian],
Yang, P.[Peng],
Jiang, Y.S.[Yuan-Shuang],
Bai, Z.J.[Zi-Jian],
Prompting large language model with context and pre-answer for
knowledge-based VQA,
PR(151), 2024, pp. 110399.
Elsevier DOI
2404
Visual question answering, Large language model,
Knowledge-based VQA, Fine-tuning, In-context learning
BibRef
Kuang, J.Y.[Jia-Yi],
Shen, Y.[Ying],
Xie, J.[Jingyou],
Luo, H.[Haohao],
Xu, Z.[Zhe],
Li, R.H.[Rong-Hao],
Li, Y.H.[Ying-Hui],
Cheng, X.F.[Xian-Feng],
Lin, X.[Xika],
Han, Y.[Yu],
Natural Language Understanding and Inference with MLLM in Visual
Question Answering: A Survey,
Surveys(57), No. 8, March 2025, pp. xx-yy.
DOI Link
2504
Survey, Large Language Models. Visual question answering,
multimodal representation and reasoning, multimodal large language models
BibRef
Xiong, H.M.[Hao-Miao],
Zhuge, Y.Z.[Yun-Zhi],
Zhu, J.[Jiawen],
Zhang, L.[Lu],
Lu, H.C.[Hu-Chuan],
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene
Understanding,
MultMed(27), 2025, pp. 2899-2911.
IEEE DOI
2506
Large language models, Solid modeling, Visualization, Training,
Point cloud compression, visual question answering
BibRef
Yu, Z.[Zhou],
Ouyang, X.C.[Xue-Cheng],
Shao, Z.W.[Zhen-Wei],
Wang, M.[Meng],
Yu, J.[Jun],
Prophet: Prompting Large Language Models With Complementary Answer
Heuristics for Knowledge-Based Visual Question Answering,
PAMI(47), No. 8, August 2025, pp. 6797-6808.
IEEE DOI
2507
BibRef
Earlier: A3, A1, A4, A5, Only:
Prompting Large Language Models with Answer Heuristics for
Knowledge-Based Visual Question Answering,
CVPR23(14974-14983)
IEEE DOI
2309
Knowledge based systems, Visualization, Helium, Cognition,
Question answering (information retrieval), Training,
large multimodal models
BibRef
Xu, Z.[Zibo],
Li, Q.[Qiang],
Nie, W.Z.[Wei-Zhi],
Wang, W.J.[Wei-Jie],
Liu, A.[Anan],
Structure Causal Models and LLMs Integration in Medical Visual
Question Answering,
MedImg(44), No. 8, August 2025, pp. 3476-3489.
IEEE DOI
2508
Visualization, Medical diagnostic imaging, Data models, Training,
Correlation, Question answering (information retrieval),
prompt strategy
BibRef
Huai, T.Y.[Tian-Yu],
Zhou, J.[Jie],
Wu, X.J.[Xing-Jiao],
Chen, Q.[Qin],
Bai, Q.C.[Qing-Chun],
Zhou, Z.[Ze],
He, L.[Liang],
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum
Mixture-of-Experts for Continual Visual Question Answering,
CVPR25(19608-19617)
IEEE DOI
2508
Training, Visualization,
Large language models, Computational modeling,
multimodal large language model
BibRef
Zhi, H.Y.[Hong-Yan],
Chen, P.H.[Pei-Hao],
Li, J.[Junyan],
Ma, S.[Shuailei],
Sun, X.Y.[Xin-Yu],
Xiang, T.H.[Tian-Hang],
Lei, Y.J.[Yin-Jie],
Tan, M.K.[Ming-Kui],
Gan, C.[Chuang],
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive
Visual Preferences,
CVPR25(3761-3771)
IEEE DOI Code:
WWW Link.
2508
Visualization, Solid modeling, Navigation, Fuses, Focusing, Benchmark testing,
Question answering (information retrieval)
BibRef
Cocchi, F.[Federico],
Moratelli, N.[Nicholas],
Cornia, M.[Marcella],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
Augmenting Multimodal LLMs with Self-Reflective Tokens for
Knowledge-based Visual Question Answering,
CVPR25(9199-9209)
IEEE DOI Code:
WWW Link.
2508
Training, Visualization, Source coding, Knowledge based systems,
Retrieval augmented generation, Pipelines, Predictive models,
multimodal large language models
BibRef
Yang, Z.[Zhen],
Tao, Z.[Zhuo],
Chen, Q.[Qi],
Li, L.[Liang],
Qi, Y.K.[Yuan-Kai],
van den Hengel, A.J.[Anton J.],
Huang, Q.M.[Qing-Ming],
Separation of powers: On segregating knowledge from observation in
LLM-enabled knowledge-based visual question answering,
CVPR25(24753-24762)
IEEE DOI
2508
Training, Visualization, Accuracy, Large language models,
Knowledge based systems, Pipelines, Transforms, Linguistics,
Data mining
BibRef
Cai, M.[Mu],
Huang, Z.Y.[Ze-Yi],
Li, Y.H.[Yu-Heng],
Ojha, U.[Utkarsh],
Wang, H.H.[Hao-Han],
Lee, Y.J.[Yong Jae],
An Investigation on LLMs' Visual Understanding Ability Using SVG for
Image-Text Bridging,
WACV25(5377-5386)
IEEE DOI Code:
WWW Link.
2505
Visualization, Large language models, Semantics, Vectors,
Question answering (information retrieval), Cognition, SVG
BibRef
Amoroso, R.[Roberto],
Zhang, G.[Gengyuan],
Koner, R.[Rajat],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
Tresp, V.[Volker],
Perceive, Query & Reason: Enhancing Video QA with Question-Guided
Temporal Queries,
WACV25(8853-8862)
IEEE DOI
2505
Visualization, Large language models, Computational modeling,
Transformers, Question answering (information retrieval),
multimodal large language models
BibRef
Weng, W.X.[Wei-Xi],
Zhang, R.[Rui],
Meng, X.J.[Xiao-Jun],
Zhu, J.[Jieming],
Liu, Q.[Qun],
Yuan, C.[Chun],
Unsupervised Domain Adaptive Visual Question Answering in the Era of
Multi-Modal Large Language Models,
WACV25(6248-6258)
IEEE DOI
2505
Visualization, Systematics, Adaptive systems,
Large language models, Semantics, Aerospace electronics,
visual question answering
BibRef
Sun, G.H.[Guo-Hao],
Qin, C.[Can],
Wang, J.M.[Jia-Mian],
Chen, Z.Y.[Ze-Yuan],
Xu, R.[Ran],
Tao, Z.Q.[Zhi-Qiang],
SQ-LLaVA: Self-questioning for Large Vision-language Assistant,
ECCV24(IX: 156-172).
Springer DOI
2412
BibRef
Ye, Q.[Qilang],
Yu, Z.T.[Zi-Tong],
Shao, R.[Rui],
Xie, X.Y.[Xin-Yu],
Torr, P.H.S.[Philip H.S.],
Cao, X.C.[Xiao-Chun],
CAT: Enhancing Multimodal Large Language Model to Answer Questions in
Dynamic Audio-visual Scenarios,
ECCV24(X: 146-164).
Springer DOI
2412
BibRef
Li, Z.[Zhuowan],
Jasani, B.[Bhavan],
Tang, P.[Peng],
Ghadar, S.[Shabnam],
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators
for Reasoning-Based Chart VQA,
CVPR24(13613-13623)
IEEE DOI
2410
Training, Visualization, Technological innovation, Accuracy,
Computational modeling, Training data, Data augmentation
BibRef
Özdemir, Ö.[Övgü],
Akagündüz, E.[Erdem],
Enhancing Visual Question Answering through Question-Driven Image
Captions as Prompts,
Prompting24(1562-1571)
IEEE DOI Code:
WWW Link.
2410
Visualization, Computational modeling, Large language models,
Pipelines, Semantics, Question answering (information retrieval),
image captioning
BibRef
Ranasinghe, K.[Kanchana],
Shukla, S.N.[Satya Narayan],
Poursaeed, O.[Omid],
Ryoo, M.S.[Michael S.],
Lin, T.Y.[Tsung-Yu],
Learning to Localize Objects Improves Spatial Reasoning in
Visual-LLMs,
CVPR24(12977-12987)
IEEE DOI
2410
Training, Location awareness, Visualization, Image coding,
Large language models, Pipelines, Cognition, LLM, VQA, Localization,
Video
BibRef
Blau, T.[Tsachi],
Fogel, S.[Sharon],
Ronen, R.[Roi],
Golts, A.[Alona],
Tsiper, S.[Shahar],
Avraham, E.B.[Elad Ben],
Aberdam, A.[Aviad],
Ganz, R.[Roy],
Litman, R.[Ron],
GRAM: Global Reasoning for Multi-Page VQA,
CVPR24(15598-15607)
IEEE DOI
2410
Adaptation models, Visualization, Computational modeling,
Large language models, Benchmark testing, Transformers, Cognition,
Vision Language Models
BibRef
Li, L.[Li],
Peng, J.W.[Jia-Wei],
Chen, H.[Huiyi],
Gao, C.Y.[Chong-Yang],
Yang, X.[Xu],
How to Configure Good In-Context Sequence for Visual Question
Answering,
CVPR24(26700-26710)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Design methodology, Large language models,
Question answering (information retrieval)
BibRef
Agrawal, A.[Aviral],
Lezcano, C.M.S.[Carlos Mateo Samudio],
Heredia-Marin, I.B.[Iqui Balam],
Sethi, P.S.[Prabhdeep Singh],
Listen Then See: Video Alignment with Speaker Attention,
MULA24(2018-2027)
IEEE DOI
2410
Bridges, Visualization, Codes, Accuracy,
Question answering (information retrieval), LLM
BibRef
Tan, R.[Reuben],
Sun, X.[Ximeng],
Hu, P.[Ping],
Wang, J.H.[Jui-Hsien],
Deilamsalehy, H.[Hanieh],
Plummer, B.A.[Bryan A.],
Russell, B.[Bryan],
Saenko, K.[Kate],
Koala: Key Frame-Conditioned Long Video-LLM,
CVPR24(13581-13591)
IEEE DOI
2410
Visualization, Accuracy, Large language models, Computational modeling,
Benchmark testing, Question answering (information retrieval)
BibRef
Ganz, R.[Roy],
Kittenplon, Y.[Yair],
Aberdam, A.[Aviad],
Avraham, E.B.[Elad Ben],
Nuriel, O.[Oren],
Mazor, S.[Shai],
Litman, R.[Ron],
Question Aware Vision Transformer for Multimodal Reasoning,
CVPR24(13861-13871)
IEEE DOI
2410
Visualization, Image coding, Large language models, Focusing,
Transformers
BibRef
Bansal, H.[Hritik],
Bitton, Y.[Yonatan],
Szpektor, I.[Idan],
Chang, K.W.[Kai-Wei],
Grover, A.[Aditya],
VideoCon: Robust Video-Language Alignment via Contrast Captions,
CVPR24(13927-13937)
IEEE DOI
2410
Large language models, Semantics,
Question answering (information retrieval), Data models,
large multimodal models
BibRef
Wang, S.W.[Shao-Wei],
Zhang, L.L.[Ling-Ling],
Zhu, L.J.[Long-Ji],
Qin, T.[Tao],
Yap, K.H.[Kim-Hui],
Zhang, X.Y.[Xin-Yu],
Liu, J.[Jun],
CoG-DQA: Chain-of-Guiding Learning with Large Language Models for
Diagram Question Answering,
CVPR24(13969-13979)
IEEE DOI
2410
Bridges, Visualization, Large language models,
Computational modeling, Natural languages, Large Language Model
BibRef
Khan, Z.[Zaid],
BG, V.K.[Vijay Kumar],
Schulter, S.[Samuel],
Fu, Y.[Yun],
Chandraker, M.[Manmohan],
Self-Training Large Language Models for Improved Visual Program
Synthesis With Visual Reinforcement,
CVPR24(14344-14353)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Annotations, Large language models,
Object detection, Question answering (information retrieval),
visual question answering
BibRef
Liao, Z.[Zhaohe],
Li, J.T.[Jiang-Tong],
Niu, L.[Li],
Zhang, L.Q.[Li-Qing],
Align and Aggregate: Compositional Reasoning with Video Alignment and
Answer Aggregation for Video Question-Answering,
CVPR24(13395-13404)
IEEE DOI
2410
Measurement, Accuracy, Computational modeling, Aggregates,
Large language models, Pipelines
BibRef
Pan, J.T.[Jun-Ting],
Lin, Z.[Ziyi],
Ge, Y.Y.[Yu-Ying],
Zhu, X.T.[Xia-Tian],
Zhang, R.R.[Ren-Rui],
Wang, Y.[Yi],
Qiao, Y.[Yu],
Li, H.S.[Hong-Sheng],
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen
Large Language Models,
MMFM23(272-283)
IEEE DOI
2401
BibRef
Guo, J.X.[Jia-Xian],
Li, J.[Junnan],
Li, D.X.[Dong-Xu],
Tiong, A.M.H.[Anthony Meng Huat],
Li, B.Y.[Bo-Yang],
Tao, D.C.[Da-Cheng],
Hoi, S.[Steven],
From Images to Textual Prompts: Zero-shot Visual Question Answering
with Frozen Large Language Models,
CVPR23(10867-10877)
IEEE DOI
2309
BibRef