Wang, J.[Jialou],
Zhu, M.[Manli],
Li, Y.[Yulei],
Li, H.L.[Hong-Lei],
Yang, L.Z.[Long-Zhi],
Woo, W.L.[Wai Lok],
Detect2Interact: Localizing Object Key Field in Visual Question
Answering with LLMs,
IEEE_Int_Sys(39), No. 3, May 2024, pp. 35-44.
IEEE DOI
2407
Visualization, Semantics, Object detection, Image segmentation,
Task analysis, Computational modeling, Chatbots, Spatial resolution
BibRef
Hu, Z.J.[Zhong-Jian],
Yang, P.[Peng],
Jiang, Y.S.[Yuan-Shuang],
Bai, Z.J.[Zi-Jian],
Prompting large language model with context and pre-answer for
knowledge-based VQA,
PR(151), 2024, pp. 110399.
Elsevier DOI
2404
Visual question answering, Large language model,
Knowledge-based VQA, Fine-tuning, In-context learning
BibRef
Kuang, J.Y.[Jia-Yi],
Shen, Y.[Ying],
Xie, J.[Jingyou],
Luo, H.[Haohao],
Xu, Z.[Zhe],
Li, R.H.[Rong-Hao],
Li, Y.H.[Ying-Hui],
Cheng, X.F.[Xian-Feng],
Lin, X.[Xika],
Han, Y.[Yu],
Natural Language Understanding and Inference with MLLM in Visual
Question Answering: A Survey,
Surveys(57), No. 8, March 2025, pp. xx-yy.
DOI Link
2504
Survey, Large Language Models, Visual question answering,
multimodal representation and reasoning, multimodal large language models
BibRef
Xiong, H.M.[Hao-Miao],
Zhuge, Y.Z.[Yun-Zhi],
Zhu, J.[Jiawen],
Zhang, L.[Lu],
Lu, H.C.[Hu-Chuan],
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene
Understanding,
MultMed(27), 2025, pp. 2899-2911.
IEEE DOI
2506
Large language models, Solid modeling, Visualization, Training,
Point cloud compression, visual question answering
BibRef
Yu, Z.[Zhou],
Ouyang, X.C.[Xue-Cheng],
Shao, Z.W.[Zhen-Wei],
Wang, M.[Meng],
Yu, J.[Jun],
Prophet: Prompting Large Language Models With Complementary Answer
Heuristics for Knowledge-Based Visual Question Answering,
PAMI(47), No. 8, August 2025, pp. 6797-6808.
IEEE DOI
2507
BibRef
Earlier: A3, A1, A4, A5, Only:
Prompting Large Language Models with Answer Heuristics for
Knowledge-Based Visual Question Answering,
CVPR23(14974-14983)
IEEE DOI
2309
Knowledge based systems, Visualization, Helium, Cognition,
Question answering (information retrieval), Training,
large multimodal models
BibRef
Amoroso, R.[Roberto],
Zhang, G.[Gengyuan],
Koner, R.[Rajat],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
Tresp, V.[Volker],
Perceive, Query & Reason: Enhancing Video QA with Question-Guided
Temporal Queries,
WACV25(8853-8862)
IEEE DOI
2505
Visualization, Large language models, Computational modeling,
Transformers, Question answering (information retrieval),
multimodal large language models
BibRef
Weng, W.X.[Wei-Xi],
Zhang, R.[Rui],
Meng, X.J.[Xiao-Jun],
Zhu, J.[Jieming],
Liu, Q.[Qun],
Yuan, C.[Chun],
Unsupervised Domain Adaptive Visual Question Answering in the Era of
Multi-Modal Large Language Models,
WACV25(6248-6258)
IEEE DOI
2505
Visualization, Systematics, Adaptive systems,
Large language models, Semantics, Aerospace electronics,
visual question answering
BibRef
Sun, G.H.[Guo-Hao],
Qin, C.[Can],
Wang, J.M.[Jia-Mian],
Chen, Z.Y.[Ze-Yuan],
Xu, R.[Ran],
Tao, Z.Q.[Zhi-Qiang],
SQ-LLAVA: Self-questioning for Large Vision-language Assistant,
ECCV24(IX: 156-172).
Springer DOI
2412
BibRef
Ye, Q.[Qilang],
Yu, Z.T.[Zi-Tong],
Shao, R.[Rui],
Xie, X.Y.[Xin-Yu],
Torr, P.H.S.[Philip H.S.],
Cao, X.C.[Xiao-Chun],
CAT: Enhancing Multimodal Large Language Model to Answer Questions in
Dynamic Audio-visual Scenarios,
ECCV24(X: 146-164).
Springer DOI
2412
BibRef
Hu, Y.[Yutao],
Li, T.[Tianbin],
Lu, Q.[Quanfeng],
Shao, W.Q.[Wen-Qi],
He, J.J.[Jun-Jun],
Qiao, Y.[Yu],
Luo, P.[Ping],
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for
Medical LVLM,
CVPR24(22170-22183)
IEEE DOI Code:
WWW Link.
2410
Reflectivity, Visualization, Biological system modeling,
Computational modeling, Medical services, Benchmark testing
BibRef
Li, Z.[Zhuowan],
Jasani, B.[Bhavan],
Tang, P.[Peng],
Ghadar, S.[Shabnam],
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators
for Reasoning-Based Chart VQA,
CVPR24(13613-13623)
IEEE DOI
2410
Training, Visualization, Technological innovation, Accuracy,
Computational modeling, Training data, Data augmentation
BibRef
Özdemir, Ö.[Övgü],
Akagündüz, E.[Erdem],
Enhancing Visual Question Answering through Question-Driven Image
Captions as Prompts,
Prompting24(1562-1571)
IEEE DOI Code:
WWW Link.
2410
Visualization, Computational modeling, Large language models,
Pipelines, Semantics, Question answering (information retrieval),
image captioning
BibRef
Ranasinghe, K.[Kanchana],
Shukla, S.N.[Satya Narayan],
Poursaeed, O.[Omid],
Ryoo, M.S.[Michael S.],
Lin, T.Y.[Tsung-Yu],
Learning to Localize Objects Improves Spatial Reasoning in
Visual-LLMs,
CVPR24(12977-12987)
IEEE DOI
2410
Training, Location awareness, Visualization, Image coding,
Large language models, Pipelines, Cognition, LLM, VQA, Localization,
Video
BibRef
Blau, T.[Tsachi],
Fogel, S.[Sharon],
Ronen, R.[Roi],
Golts, A.[Alona],
Tsiper, S.[Shahar],
Avraham, E.B.[Elad Ben],
Aberdam, A.[Aviad],
Ganz, R.[Roy],
Litman, R.[Ron],
GRAM: Global Reasoning for Multi-Page VQA,
CVPR24(15598-15607)
IEEE DOI
2410
Adaptation models, Visualization, Computational modeling,
Large language models, Benchmark testing, Transformers, Cognition,
Vision Language Models
BibRef
Li, L.[Li],
Peng, J.W.[Jia-Wei],
Chen, H.[Huiyi],
Gao, C.Y.[Chong-Yang],
Yang, X.[Xu],
How to Configure Good In-Context Sequence for Visual Question
Answering,
CVPR24(26700-26710)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Design methodology, Large language models,
Question answering (information retrieval)
BibRef
Agrawal, A.[Aviral],
Lezcano, C.M.S.[Carlos Mateo Samudio],
Heredia-Marin, I.B.[Iqui Balam],
Sethi, P.S.[Prabhdeep Singh],
Listen Then See: Video Alignment with Speaker Attention,
MULA24(2018-2027)
IEEE DOI
2410
Bridges, Visualization, Codes, Accuracy,
Question answering (information retrieval), LLM
BibRef
Tan, R.[Reuben],
Sun, X.[Ximeng],
Hu, P.[Ping],
Wang, J.H.[Jui-Hsien],
Deilamsalehy, H.[Hanieh],
Plummer, B.A.[Bryan A.],
Russell, B.[Bryan],
Saenko, K.[Kate],
Koala: Key Frame-Conditioned Long Video-LLM,
CVPR24(13581-13591)
IEEE DOI
2410
Visualization, Accuracy, Large language models, Computational modeling,
Benchmark testing, Question answering (information retrieval)
BibRef
Ganz, R.[Roy],
Kittenplon, Y.[Yair],
Aberdam, A.[Aviad],
Avraham, E.B.[Elad Ben],
Nuriel, O.[Oren],
Mazor, S.[Shai],
Litman, R.[Ron],
Question Aware Vision Transformer for Multimodal Reasoning,
CVPR24(13861-13871)
IEEE DOI
2410
Visualization, Image coding, Large language models, Focusing,
Transformers
BibRef
Bansal, H.[Hritik],
Bitton, Y.[Yonatan],
Szpektor, I.[Idan],
Chang, K.W.[Kai-Wei],
Grover, A.[Aditya],
VideoCon: Robust Video-Language Alignment via Contrast Captions,
CVPR24(13927-13937)
IEEE DOI
2410
Large language models, Semantics,
Question answering (information retrieval), Data models,
large multimodal models
BibRef
Wang, S.W.[Shao-Wei],
Zhang, L.L.[Ling-Ling],
Zhu, L.J.[Long-Ji],
Qin, T.[Tao],
Yap, K.H.[Kim-Hui],
Zhang, X.Y.[Xin-Yu],
Liu, J.[Jun],
CoG-DQA: Chain-of-Guiding Learning with Large Language Models for
Diagram Question Answering,
CVPR24(13969-13979)
IEEE DOI
2410
Bridges, Visualization, Large language models,
Computational modeling, Natural languages, Large Language Model
BibRef
Khan, Z.[Zaid],
BG, V.K.[Vijay Kumar],
Schulter, S.[Samuel],
Fu, Y.[Yun],
Chandraker, M.[Manmohan],
Self-Training Large Language Models for Improved Visual Program
Synthesis With Visual Reinforcement,
CVPR24(14344-14353)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Annotations, Large language models,
Object detection, Question answering (information retrieval),
visual question answering
BibRef
Liao, Z.[Zhaohe],
Li, J.T.[Jiang-Tong],
Niu, L.[Li],
Zhang, L.Q.[Li-Qing],
Align and Aggregate: Compositional Reasoning with Video Alignment and
Answer Aggregation for Video Question-Answering,
CVPR24(13395-13404)
IEEE DOI
2410
Measurement, Accuracy, Computational modeling, Aggregates,
Large language models, Pipelines
BibRef
Pan, J.T.[Jun-Ting],
Lin, Z.[Ziyi],
Ge, Y.Y.[Yu-Ying],
Zhu, X.T.[Xia-Tian],
Zhang, R.R.[Ren-Rui],
Wang, Y.[Yi],
Qiao, Y.[Yu],
Li, H.S.[Hong-Sheng],
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen
Large Language Models,
MMFM23(272-283)
IEEE DOI
2401
BibRef
Guo, J.X.[Jia-Xian],
Li, J.[Junnan],
Li, D.X.[Dong-Xu],
Tiong, A.M.H.[Anthony Meng Huat],
Li, B.Y.[Bo-Yang],
Tao, D.C.[Da-Cheng],
Hoi, S.[Steven],
From Images to Textual Prompts: Zero-shot Visual Question Answering
with Frozen Large Language Models,
CVPR23(10867-10877)
IEEE DOI
2309
BibRef