20.4.3.3.4 Vision-Language Models, Hallucination Mitigation

Vision Language Model. Hallucinations. Vision-Language Model. Large Language Models. 2509

See also Jailbreaking Language Models.


Yu, T.Y.[Tian-Yu], Zhang, H.[Haoye], Li, Q.M.[Qi-Ming], Xu, Q.X.[Qi-Xin], Yao, Y.[Yuan], Chen, D.[Da], Lu, X.M.[Xiao-Man], Cui, G.[Ganqu], Dang, Y.K.[Yun-Kai], He, T.[Taiwen], Feng, X.C.[Xiao-Cheng], Song, J.[Jun], Zheng, B.[Bo], Liu, Z.Y.[Zhi-Yuan], Chua, T.S.[Tat-Seng], Sun, M.S.[Mao-Song],
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness,
CVPR25(19985-19995)
IEEE DOI 2508
Computational modeling, Manuals, Data collection, Benchmark testing, Cognition, Labeling, Artificial intelligence BibRef

Liang, J.[Jian], Huang, W.K.[Wen-Ke], Wan, G.C.[Guan-Cheng], Yang, Q.[Qu], Ye, M.[Mang],
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models,
CVPR25(26170-26180)
IEEE DOI 2508
Visualization, Large language models, Prevention and mitigation, Redundancy, Trajectory, Tuning BibRef

Cao, Y.[Yue], Xing, Y.[Yun], Zhang, J.[Jie], Lin, D.[Di], Zhang, T.W.[Tian-Wei], Tsang, I.[Ivor], Liu, Y.[Yang], Guo, Q.[Qing],
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments,
CVPR25(25050-25059)
IEEE DOI Code:
WWW Link. 2508
Printing, Visualization, Codes, Cognition, Planning, physical adversarial attack, typographic attack, llm agent BibRef

Wang, Y.B.[Yan-Bo], Guan, J.[Jiyang], Liang, J.[Jian], He, R.[Ran],
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?,
CVPR25(19879-19889)
IEEE DOI 2508
Large language models, Current measurement, Boosting, Data models, Safety, Security, distribution gap, MLLM safety BibRef

Peng, R.[Ruotian], He, H.Y.[Hai-Ying], Wei, Y.[Yake], Wen, Y.D.[Yan-Dong], Hu, D.[Di],
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception,
CVPR25(3963-3973)
IEEE DOI Code:
WWW Link. 2508
Visualization, Filtering, Computational modeling, Source coding, Semantics, Pipelines, Text to image, Reliability, Text to video, hallucinations BibRef

Yang, Z.[Zhihe], Luo, X.[Xufang], Han, D.Q.[Dong-Qi], Xu, Y.J.[Yun-Jian], Li, D.S.[Dong-Sheng],
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key,
CVPR25(10610-10620)
IEEE DOI 2508
Training, Fault diagnosis, Benchmark testing, Data models, Optimization, reinforcement learning, dpo, hallucinations, large language models BibRef

Bae, K.[Kyungho], Kim, J.[Jinhyung], Lee, S.[Sihaeng], Lee, S.[Soonyoung], Lee, G.[Gunhee], Choi, J.[Jinwoo],
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations,
CVPR25(13744-13753)
IEEE DOI 2508
Technological innovation, Attention mechanisms, Large language models, Benchmark testing, Predictive models, Context modeling BibRef

Yin, H.[Hao], Si, G.Z.[Guang-Zong], Wang, Z.[Zilei],
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models,
CVPR25(14625-14634)
IEEE DOI Code:
WWW Link. 2508
Training, Visualization, Accuracy, Prevention and mitigation, Large language models, Computational modeling, Coherence, Decoding, attention mechanism BibRef

Yang, L.[Le], Zheng, Z.W.[Zi-Wei], Chen, B.[Boxu], Zhao, Z.Y.[Zheng-Yu], Lin, C.H.[Chen-Hao], Shen, C.[Chao],
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection,
CVPR25(14635-14645)
IEEE DOI Code:
WWW Link. 2508
Visualization, Costs, Codes, Large language models, Computational modeling, Null space, Feature extraction, ai safety BibRef

Wu, Y.C.[Yuan-Chen], Zhang, L.[Lu], Yao, H.[Hang], Du, J.L.[Jun-Long], Yan, K.[Ke], Ding, S.H.[Shou-Hong], Wu, Y.S.[Yun-Sheng], Li, X.Q.[Xiao-Qiang],
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception,
CVPR25(14646-14656)
IEEE DOI 2508
Prevention and mitigation, Focusing, Benchmark testing, Reliability, Optimization, Synthetic data BibRef

Tu, Y.[Yahan], Hu, R.[Rui], Sang, J.[Jitao],
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models,
CVPR25(19836-19845)
IEEE DOI Code:
WWW Link. 2508
Visualization, Protocols, Codes, Large language models, Benchmark testing, Question answering (information retrieval), Contamination BibRef

Liu, J.Z.[Jia-Zhen], Fu, Y.H.[Yu-Han], Xie, R.[Ruobing], Xie, R.[Runquan], Sun, X.[Xingwu], Lian, F.Z.[Feng-Zong], Kang, Z.[Zhanhui], Li, X.R.[Xi-Rong],
PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset,
CVPR25(19857-19866)
IEEE DOI 2508
Visualization, Image synthesis, Large language models, Pipelines, Question generation, Distance measurement, computer vision, mllms, hallucination evaluation BibRef

Jiang, Z.Q.[Zhang-Qi], Chen, J.K.[Jun-Kai], Zhu, B.[Beier], Luo, T.J.[Ting-Jin], Shen, Y.K.[Yan-Kun], Yang, X.[Xu],
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens,
CVPR25(25004-25014)
IEEE DOI 2508
Training, Visualization, Computational modeling, Semantics, Reliability, Lenses, vision-language models, hallucinations BibRef

Park, E.[Eunkyu], Kim, M.[Minyeong], Kim, G.[Gunhee],
HalLoc: Token-level Localization of Hallucinations for Vision Language Models,
CVPR25(29893-29903)
IEEE DOI Code:
WWW Link. 2508
Training, Location awareness, Visualization, Accuracy, Computational modeling, Prevention and mitigation, hallucination detection benchmark for vision and language models BibRef

Suo, W.[Wei], Zhang, L.J.[Li-Jun], Sun, M.Y.[Meng-Yang], Wu, L.Y.B.[Lin Yuan-Bo], Wang, P.[Peng], Zhang, Y.N.[Yan-Ning],
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding,
CVPR25(29904-29914)
IEEE DOI Code:
WWW Link. 2508
Visualization, Adaptation models, Codes, Benchmark testing, Hybrid power systems, Cognition, Decoding, Faces, contrastive decoding BibRef

An, W.B.[Wen-Bin], Tian, F.[Feng], Leng, S.[Sicong], Nie, J.H.[Jia-Hao], Lin, H.[Haonan], Wang, Q.Y.[Qian-Ying], Chen, P.[Ping], Zhang, X.Q.[Xiao-Qin], Lu, S.J.[Shi-Jian],
Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention,
CVPR25(29915-29926)
IEEE DOI Code:
WWW Link. 2508
Visualization, Codes, Grounding, Prevention and mitigation, Computational modeling, Decoding, Object recognition, Assembly, large vision-language models BibRef

Zhuang, X.W.[Xian-Wei], Zhu, Z.H.[Zhi-Hong], Xie, Y.X.[Yu-Xin], Liang, L.M.[Li-Ming], Zou, Y.X.[Yue-Xian],
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification,
CVPR25(4189-4199)
IEEE DOI Code:
WWW Link. 2508
Training, Visualization, Codes, Prevention and mitigation, Benchmark testing, Inference algorithms, Decoding BibRef

Basak, D.[Debolena], Bhatt, S.[Soham], Kanduri, S.[Sahith], Desarkar, M.S.[Maunendra Sankar],
Aerial Mirage: Unmasking Hallucinations in Large Vision Language Models,
WACV25(5500-5508)
IEEE DOI 2505
Training, Reviews, Annotations, Surveillance, Computational modeling, Decision making, Data models, Reliability, Drones BibRef

Tang, F.L.[Fei-Long], Liu, C.Z.[Cheng-Zhi], Xu, Z.X.[Zhong-Xing], Hu, M.[Ming], Huang, Z.[Zile], Xue, H.C.[Hao-Chen], Chen, Z.Y.[Zi-Yang], Peng, Z.L.[Ze-Lin], Yang, Z.W.[Zhi-Wei], Zhou, S.J.[Si-Jin], Li, W.X.[Wen-Xue], Li, Y.L.[Yu-Long], Song, W.X.[Wen-Xuan], Su, S.Y.[Shi-Yan], Feng, W.[Wei], Su, J.[Jionglong], Lin, M.[Minquan], Peng, Y.F.[Yi-Fan], Cheng, X.L.[Xue-Lian], Razzak, I.[Imran], Ge, Z.Y.[Zong-Yuan],
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding,
CVPR25(26147-26159)
IEEE DOI 2508
Heart, Visualization, Large language models, Video sequences, Interference, Question answering (information retrieval) BibRef

Yang, J.N.[Jia-Ning], Chen, X.[Xuweiyi], Madaan, N.[Nikhil], Iyengar, M.[Madhavan], Qian, S.[Shengyi], Fouhey, D.F.[David F.], Chai, J.[Joyce],
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination,
CVPR25(29501-29512)
IEEE DOI 2508
Technological innovation, Solid modeling, Grounding, Benchmark testing, Reliability engineering, Sparks, Tuning, visual grounding BibRef

Yoon, D.[Dokyoon], Song, Y.[Youngsook], Park, W.[Woomyong],
Stop learning it all to mitigate visual hallucination, Focus on the hallucination target,
CVPR25(4200-4208)
IEEE DOI 2508
Learning systems, Visualization, Large language models, Focusing, Information filters, Data augmentation, Reliability, preference learning BibRef

Chen, J.Z.[Jun-Zhe], Zhang, T.S.[Tian-Shu], Huang, S.Y.[Shi-Yu], Niu, Y.W.[Yu-Wei], Zhang, L.F.[Lin-Feng], Wen, L.J.[Li-Jie], Hu, X.M.[Xu-Ming],
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models,
CVPR25(4209-4221)
IEEE DOI Code:
WWW Link. 2508
Visualization, Head, Costs, Computational modeling, Data models, Information and communication technology, Decoding, inference intervention BibRef

Kim, B.[Bumsoo], Shin, W.[Wonseop], Lee, K.[Kyuchul], Jung, Y.[Yonghoon], Seo, S.[Sanghyun],
Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information,
WACV25(5398-5407)
IEEE DOI Code:
WWW Link. 2505
Training, Visualization, Solid modeling, Accuracy, Large language models, Semantics, Text to image, large-scale text-to-image (TTI) models BibRef

Huang, P.H.[Po-Hsuan], Li, J.L.[Jeng-Lin], Chen, C.P.[Chin-Po], Chang, M.C.[Ming-Ching], Chen, W.C.[Wei-Chao],
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis,
WACV25(6125-6135)
IEEE DOI 2505
Training, Visualization, Prevention and mitigation, Computational modeling, Semantics, Natural languages, causal analysis BibRef

Liu, S.[Shi], Zheng, K.[Kecheng], Chen, W.[Wei],
Paying More Attention to Image: A Training-free Method for Alleviating Hallucination in LVLMs,
ECCV24(LXXXIII: 125-140).
Springer DOI 2412
BibRef

Zhang, J.[Jinrui], Wang, T.[Teng], Zhang, H.G.[Hai-Gang], Lu, P.[Ping], Zheng, F.[Feng],
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-language Models,
ECCV24(LXVIII: 196-213).
Springer DOI 2412
BibRef

Kaul, P.[Prannay], Li, Z.Z.[Zhi-Zhong], Yang, H.[Hao], Dukler, Y.[Yonatan], Swaminathan, A.[Ashwin], Taylor, C.J., Soatto, S.[Stefano],
THRONE: An Object-Based Hallucination Benchmark for the Free-Form Generations of Large Vision-Language Models,
CVPR24(27218-27228)
IEEE DOI 2410
Measurement, Training, Ethics, Accuracy, Computational modeling, Graphics processing units, hallucination, benchmark, LLM, LVLM, large vision-language model BibRef

Jiang, C.Y.[Chao-Ya], Xu, H.Y.[Hai-Yang], Dong, M.F.[Meng-Fan], Chen, J.X.[Jia-Xing], Ye, W.[Wei], Yan, M.[Ming], Ye, Q.H.[Qing-Hao], Zhang, J.[Ji], Huang, F.[Fei], Zhang, S.K.[Shi-Kun],
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model,
CVPR24(27026-27036)
IEEE DOI Code:
WWW Link. 2410
Representation learning, Visualization, Codes, Large language models, Natural languages, Contrastive learning BibRef

Huang, Q.D.[Qi-Dong], Dong, X.Y.[Xiao-Yi], Zhang, P.[Pan], Wang, B.[Bin], He, C.H.[Cong-Hui], Wang, J.Q.[Jia-Qi], Lin, D.[Dahua], Zhang, W.M.[Wei-Ming], Yu, N.H.[Neng-Hai],
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation,
CVPR24(13418-13427)
IEEE DOI Code:
WWW Link. 2410
Training, Measurement, Costs, Codes, Large language models, Focusing, Hallucination, Large vision-language model, Multimodal LLM, LLM BibRef

Yu, Q.F.[Qi-Fan], Li, J.C.[Jun-Cheng], Wei, L.H.[Long-Hui], Pang, L.[Liang], Ye, W.T.[Wen-Tao], Qin, B.S.[Bo-Sheng], Tang, S.L.[Si-Liang], Tian, Q.[Qi], Zhuang, Y.T.[Yue-Ting],
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data,
CVPR24(12944-12953)
IEEE DOI Code:
WWW Link. 2410
Measurement, Visualization, Toxicology, Correlation, Codes, Large language models, Hallucinations, Vision-language reasoning BibRef

Favero, A.[Alessandro], Zancato, L.[Luca], Trager, M.[Matthew], Choudhary, S.[Siddharth], Perera, P.[Pramuditha], Achille, A.[Alessandro], Swaminathan, A.[Ashwin], Soatto, S.[Stefano],
Multi-Modal Hallucination Control by Visual Information Grounding,
CVPR24(14303-14312)
IEEE DOI 2410
Training, Visualization, Grounding, Linguistics, Sampling methods, Inference algorithms, Vision, language, reasoning BibRef

Ouali, Y.[Yassine], Bulat, A.[Adrian], Martinez, B.[Brais], Tzimiropoulos, G.[Georgios],
CLIP-DPO: Vision-language Models as a Source of Preference for Fixing Hallucinations in LVLMs,
ECCV24(LXXVI: 395-413).
Springer DOI 2412
BibRef

Ye-Bin, M.[Moon], Hyeon-Woo, N.[Nam], Choi, W.[Wonseok], Oh, T.H.[Tae-Hyun],
Beaf: Observing Before-after Changes to Evaluate Hallucination in Vision-language Models,
ECCV24(XI: 232-248).
Springer DOI 2412
BibRef

Kim, M.[Minchan], Kim, M.[Minyeong], Bae, J.[Junik], Choi, S.[Suhwan], Kim, S.[Sungkyung], Chang, B.[Buru],
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-language Models,
ECCV24(LXXXVI: 236-252).
Springer DOI 2412
BibRef

Guan, T.R.[Tian-Rui], Liu, F.[Fuxiao], Wu, X.[Xiyang], Xian, R.Q.[Rui-Qi], Li, Z.X.[Zong-Xia], Liu, X.Y.[Xiao-Yu], Wang, X.[Xijun], Chen, L.[Lichang], Huang, F.[Furong], Yacoob, Y.[Yaser], Manocha, D.[Dinesh], Zhou, T.Y.[Tian-Yi],
Hallusionbench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models,
CVPR24(14375-14385)
IEEE DOI Code:
WWW Link. 2410
Visualization, Analytical models, Accuracy, Statistical analysis, Computational modeling, Benchmark testing, Vision language model, VLM Evaluation BibRef

Leng, S.[Sicong], Zhang, H.[Hang], Chen, G.Z.[Guan-Zheng], Li, X.[Xin], Lu, S.J.[Shi-Jian], Miao, C.Y.[Chun-Yan], Bing, L.[Lidong],
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding,
CVPR24(13872-13882)
IEEE DOI 2410
Training, Visualization, Accuracy, Computational modeling, Benchmark testing, Decoding, Multimodality, Vision and Language BibRef

Wang, Z.[Zhecan], Bingham, G.[Garrett], Yu, A.W.[Adams Wei], Le, Q.V.[Quoc V.], Luong, T.[Thang], Ghiasi, G.[Golnaz],
Haloquest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning,
ECCV24(LXXVII: 288-304).
Springer DOI 2412
BibRef

Wang, T.J.J.[Tzu-Jui Julius], Laaksonen, J.[Jorma], Langer, T.[Tomas], Arponen, H.[Heikki], Bishop, T.E.[Tom E.],
Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision,
WACV23(1073-1083)
IEEE DOI 2302
Visualization, Vocabulary, Computational modeling, Detectors, Benchmark testing, Transformers, un-supervised learning BibRef

Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in Jailbreaking Language Models.


Last update: Sep 10, 2025 at 12:00:25