Tamaazousti, Y.[Youssef],
Le Borgne, H.[Hervé],
Popescu, A.[Adrian],
Gadeski, E.[Etienne],
Ginsca, A.[Alexandru],
Hudelot, C.[Céline],
Vision-language integration using constrained local semantic features,
CVIU(163), No. 1, 2017, pp. 41-57.
Elsevier DOI
1712
Image classification
BibRef
Zhu, Y.Q.[Yong-Qing],
Li, X.Y.[Xiang-Yang],
Zheng, M.[Mao],
Yang, J.H.[Jia-Hao],
Wang, Z.H.[Zi-Han],
Guo, X.Q.[Xiao-Qian],
Chai, Z.F.[Zi-Feng],
Yuan, Y.C.[Yu-Chen],
Jiang, S.Q.[Shu-Qiang],
Focus and Align: Learning Tube Tokens for Video-Language Pre-Training,
MultMed(25), 2023, pp. 8036-8050.
IEEE DOI
2312
BibRef
Wu, W.H.[Wen-Hao],
Sun, Z.[Zhun],
Song, Y.X.[Yu-Xin],
Wang, J.D.[Jing-Dong],
Ouyang, W.L.[Wan-Li],
Transferring Vision-Language Models for Visual Recognition:
A Classifier Perspective,
IJCV(132), No. 2, February 2024, pp. 392-409.
Springer DOI
2402
BibRef
Ming, Y.F.[Yi-Fei],
Li, Y.X.[Yi-Xuan],
How Does Fine-Tuning Impact Out-of-Distribution Detection for
Vision-Language Models?,
IJCV(132), No. 2, February 2024, pp. 596-609.
Springer DOI
2402
BibRef
Zhao, C.R.[Cai-Rong],
Wang, Y.[Yubin],
Jiang, X.Y.[Xin-Yang],
Shen, Y.F.[Yi-Fei],
Song, K.[Kaitao],
Li, D.S.[Dong-Sheng],
Miao, D.Q.[Duo-Qian],
Learning Domain Invariant Prompt for Vision-Language Models,
IP(33), 2024, pp. 1348-1360.
IEEE DOI
2402
Task analysis, Tuning, Training, Adaptation models, Visualization,
Image color analysis, Self-supervised learning, Prompt learning,
domain generalization
BibRef
Yang, X.F.[Xiao-Feng],
Liu, F.[Fayao],
Lin, G.S.[Guo-Sheng],
Neural Logic Vision Language Explainer,
MultMed(26), 2024, pp. 3331-3340.
IEEE DOI
2402
Cognition, Logic programming, Deep learning, Visualization,
Data models, Training, Markov processes,
vision language pretraining
BibRef
Wang, Y.D.[Yi-Dong],
Yu, Z.H.[Zhuo-Hao],
Wang, J.D.[Jin-Dong],
Heng, Q.[Qiang],
Chen, H.[Hao],
Ye, W.[Wei],
Xie, R.[Rui],
Xie, X.[Xing],
Zhang, S.K.[Shi-Kun],
Exploring Vision-Language Models for Imbalanced Learning,
IJCV(132), No. 1, January 2024, pp. 224-237.
Springer DOI
2402
BibRef
Zeng, Y.[Yan],
Zhang, X.[Xinsong],
Li, H.[Hang],
Wang, J.W.[Jia-Wei],
Zhang, J.P.[Ji-Peng],
Zhou, W.[Wangchunshu],
X2-VLM: All-in-One Pre-Trained Model for Vision-Language Tasks,
PAMI(46), No. 5, May 2024, pp. 3156-3168.
IEEE DOI
2404
Task analysis, Visualization, Transformers, Detectors, Training,
Feature extraction, Image coding,
vision language pre-training
BibRef
Kong, D.[Daehyeon],
Kong, K.[Kyeongbo],
Kang, S.J.[Suk-Ju],
Image clustering using generated text centroids,
SP:IC(125), 2024, pp. 117128.
Elsevier DOI
2405
Deep neural network, Image clustering, Multimodal task, Vision-language model
BibRef
Chen, X.Y.[Xian-Yu],
Yang, J.H.[Jin-Hui],
Chen, S.[Shi],
Wang, L.[Louis],
Jiang, M.[Ming],
Zhao, Q.[Qi],
Every Problem, Every Step, All in Focus: Learning to Solve
Vision-Language Problems With Integrated Attention,
PAMI(46), No. 7, July 2024, pp. 4720-4735.
IEEE DOI
2406
Problem-solving, Task analysis, Visualization, Measurement,
Graph neural networks, Cognition, Videos, Graph attention,
vision-language problem solving
BibRef
Menon, S.[Sachit],
Chandratreya, I.P.[Ishaan Preetam],
Vondrick, C.[Carl],
Task Bias in Contrastive Vision-Language Models,
IJCV(132), No. 6, June 2024, pp. 2026-2040.
Springer DOI
2406
BibRef
Zhang, J.Y.[Jing-Yi],
Huang, J.X.[Jia-Xing],
Jin, S.[Sheng],
Lu, S.J.[Shi-Jian],
Vision-Language Models for Vision Tasks: A Survey,
PAMI(46), No. 8, August 2024, pp. 5625-5644.
IEEE DOI
2407
Task analysis, Visualization, Training, Deep learning, Surveys,
Data models, Predictive models, Big Data, big model, deep learning,
image classification
BibRef
Dong, M.P.[Meng-Ping],
Li, F.[Fei],
Li, Z.B.[Zhen-Bo],
Liu, X.[Xue],
Cluster prototype earth mover's distance adapters and
alignment-guided prompt learning for vision-language models,
PR(156), 2024, pp. 110861.
Elsevier DOI
2408
Cluster prototype, Earth mover's distance, Adapter,
Prompt learning, Vision-language models
BibRef
Liu, Y.[Ye],
Pan, Y.[Yan],
Yin, J.[Jian],
Enhancing Multi-Label Deep Hashing for Image and Audio With Joint
Internal Global Loss Constraints and Large Vision-Language Model,
SPLetters(31), 2024, pp. 2550-2554.
IEEE DOI
2410
Codes, Transformers, Adaptation models, Training,
Convolutional neural networks, Feature extraction,
vision transformer
BibRef
Zhan, C.L.[Chen-Lu],
Zhang, Y.F.[Yu-Fei],
Lin, Y.[Yu],
Wang, G.A.[Gao-Ang],
Wang, H.W.[Hong-Wei],
UniDCP: Unifying Multiple Medical Vision-Language Tasks via Dynamic
Cross-Modal Learnable Prompts,
MultMed(26), 2024, pp. 9736-9748.
IEEE DOI
2410
Task analysis, Adaptation models, Visualization,
Medical diagnostic imaging, Tuning, Multitasking, Plastics,
cross-modal shareable space
BibRef
Su, K.[Ke],
Zhang, X.X.[Xing-Xing],
Zhang, S.Y.[Si-Yang],
Zhu, J.[Jun],
Zhang, B.[Bo],
To Boost Zero-Shot Generalization for Embodied Reasoning With
Vision-Language Pre-Training,
IP(33), 2024, pp. 5370-5381.
IEEE DOI
2410
Cognition, Visualization, Artificial intelligence, Training,
Image reconstruction, Navigation, vision-language pre-training
BibRef
Xuan, S.Y.[Shi-Yu],
Yang, M.[Ming],
Zhang, S.L.[Shi-Liang],
Adapting Vision-Language Models via Learning to Inject Knowledge,
IP(33), 2024, pp. 5798-5809.
IEEE DOI
2410
Feature extraction, Visualization, Adaptation models, Tuning,
Training, Transformers, Dogs, Accuracy, Robustness, Few shot learning,
knowledge injection
BibRef
Zhou, W.[Wenlve],
Zhou, Z.H.[Zhi-Heng],
Unsupervised Domain Adaption Harnessing Vision-Language Pre-Training,
CirSysVideo(34), No. 9, September 2024, pp. 8201-8214.
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Task analysis, Training, Computational modeling, Tuning,
Data models, Visualization, Unsupervised domain adaptation, model deployment
BibRef
Guo, M.H.[Meng-Hao],
Zhang, Y.[Yi],
Mu, T.J.[Tai-Jiang],
Huang, S.X.[Sharon X.],
Hu, S.M.[Shi-Min],
Tuning Vision-Language Models With Multiple Prototypes Clustering,
PAMI(46), No. 12, December 2024, pp. 11186-11199.
IEEE DOI
2411
Prototypes, Adaptation models, Tuning, Visualization,
Benchmark testing, Computational modeling, Data models, clustering
BibRef
Sun, B.[Bo],
Wu, Z.C.[Zhi-Chao],
Zhang, H.[Hao],
He, J.[Jun],
VTPL: Visual and text prompt learning for visual-language models,
JVCIR(104), 2024, pp. 104280.
Elsevier DOI
2411
V-L models, Prompt learning, Visual and text prompts,
Poly-1 information NCE loss, Center loss
BibRef
Liu, L.C.[Liang-Chen],
Wang, N.N.[Nan-Nan],
Liu, D.C.[De-Cheng],
Yang, X.[Xi],
Gao, X.B.[Xin-Bo],
Liu, T.L.[Tong-Liang],
Towards Specific Domain Prompt Learning via Improved Text Label
Optimization,
MultMed(26), 2024, pp. 10805-10815.
IEEE DOI
2411
Visualization, Optimization, Semantics, Task analysis, Terminology,
Learning systems, Adaptation models, vision-language model
BibRef
Liu, X.[Xin],
Wu, J.[Jiamin],
Yang, W.F.[Wen-Fei],
Zhou, X.[Xu],
Zhang, T.Z.[Tian-Zhu],
Multi-Modal Attribute Prompting for Vision-Language Models,
CirSysVideo(34), No. 11, November 2024, pp. 11579-11591.
IEEE DOI
2412
Visualization, Task analysis, Semantics, Adaptation models,
Integrated circuit modeling, Vectors,
attribute
BibRef
Jiang, H.J.[Hao-Jun],
Zhang, J.K.[Jian-Ke],
Huang, R.[Rui],
Ge, C.J.[Chun-Jiang],
Ni, Z.[Zanlin],
Song, S.[Shiji],
Huang, G.[Gao],
Cross-modal adapter for vision-language retrieval,
PR(159), 2025, pp. 111144.
Elsevier DOI
2412
Adapter, Cross-modal interaction, Cross-modal retrieval,
Parameter-efficient training, Multi-modal learning
BibRef
Yellinek, N.[Nir],
Karlinsky, L.[Leonid],
Giryes, R.[Raja],
3VL: Using Trees to Improve Vision-Language Models' Interpretability,
IP(34), 2025, pp. 495-509.
IEEE DOI
2501
aligning image and text representations.
Random forests, Visualization, Training, Cognition, Feature extraction,
Transformers, Forestry, Animals, compositional reasoning
BibRef
Yang, L.F.[Ling-Feng],
Li, X.[Xiang],
Wang, Y.Z.[Yue-Ze],
Wang, X.L.[Xin-Long],
Yang, J.[Jian],
Fine-Grained Visual Text Prompting,
PAMI(47), No. 3, March 2025, pp. 1594-1609.
IEEE DOI
2502
What kind of visual prompts to add.
Visualization, Semantics, Image segmentation, Crops, Tuning, Detectors,
Proposals, Location awareness, Grounding, Gray-scale, zero-shot
BibRef
Wang, F.[Fan],
Han, Z.Y.[Zhong-Yi],
Liu, X.[Xingbo],
Yin, Y.L.[Yi-Long],
Gao, X.[Xin],
CTPT: Continual Test-time Prompt Tuning for vision-language models,
PR(161), 2025, pp. 111300.
Elsevier DOI
2502
Test-time adaptation,
Contrastive Language-Image Pretraining (CLIP),
Stable self-learning
BibRef
Liang, N.[Nanhao],
Liu, Y.[Yong],
DPO: Discrete Prompt Optimization for Vision-Language Models,
SPLetters(32), 2025, pp. 671-675.
IEEE DOI
2502
Training, Optimization, Adaptation models, Visualization,
Overfitting, Vectors, Vocabulary, Signal processing algorithms,
vision-language model
BibRef
Ondeng, O.[Oscar],
Ouma, H.[Heywood],
Akuon, P.[Peter],
Enriching visual feature representations for vision-language tasks
using spectral transforms,
IVC(154), 2025, pp. 105390.
Elsevier DOI
2502
Visual feature enrichment, Transformers, Image captioning,
Discrete Fourier Transform, MS COCO, Kylberg dataset, Diversity
BibRef
Xu, C.[Chen],
Zhu, Y.H.[Yu-Han],
Shen, H.C.[Hao-Cheng],
Chen, B.H.[Bo-Heng],
Liao, Y.X.[Yi-Xuan],
Chen, X.X.[Xiao-Xin],
Wang, L.M.[Li-Min],
Progressive Visual Prompt Learning with Contrastive Feature
Re-formation,
IJCV(133), No. 2, February 2025, pp. 511-526.
Springer DOI
2502
Adapting the pre-trained Vision-Language Models.
BibRef
Long, S.[Sifan],
Zhao, Z.[Zhen],
Yuan, J.K.[Jun-Kun],
Tan, Z.C.[Zi-Chang],
Liu, J.J.[Jiang-Jiang],
Feng, J.Y.[Jing-Yuan],
Wang, S.S.[Sheng-Sheng],
Wang, J.D.[Jing-Dong],
Mutual Prompt Learning for Vision Language Models,
IJCV(133), No. 3, March 2025, pp. 1258-1276.
Springer DOI
2502
BibRef
Yin, J.H.[Jun-Hui],
Zhang, X.Y.[Xin-Yu],
Wu, L.[Lin],
Wang, X.J.[Xiao-Jie],
Context-aware prompt learning for test-time vision recognition with
frozen vision-language model,
PR(162), 2025, pp. 111359.
Elsevier DOI Code:
WWW Link.
2503
In-context learning, Prompt learning, Vision-language model,
Vision recognition, Test-time adaptation
BibRef
Chen, Y.[Yeming],
Zhang, S.[Siyu],
Sun, Y.[Yaoru],
Yang, J.[Jun],
Liang, W.J.[Wei-Jian],
Wang, H.R.[Hao-Ran],
Artificial-Spiking Hierarchical Networks for Vision-Language
Representation Learning,
CirSysVideo(35), No. 3, March 2025, pp. 2768-2781.
IEEE DOI Code:
WWW Link.
2503
Visualization, Semantics, Computational modeling, Transformers,
Feature extraction, Object detection,
multimodal alignment
BibRef
Li, B.Z.[Bin-Zhe],
Wang, S.R.[Shu-Run],
Wang, S.Q.[Shi-Qi],
Ye, Y.[Yan],
High Efficiency Image Compression for Large Visual-Language Models,
CirSysVideo(35), No. 3, March 2025, pp. 2870-2880.
IEEE DOI
2503
Image coding, Visualization, Machine vision, Codecs, Semantics,
Standards, Image reconstruction, Bit rate, pre-editing process
BibRef
Liu, L.C.[Liang-Chen],
Wang, N.N.[Nan-Nan],
Zhou, D.W.[Da-Wei],
Liu, D.C.[De-Cheng],
Yang, X.[Xi],
Gao, X.B.[Xin-Bo],
Liu, T.L.[Tong-Liang],
Generalizable Prompt Learning via Gradient Constrained
Sharpness-Aware Minimization,
MultMed(27), 2025, pp. 1100-1113.
IEEE DOI
2503
Improving the performance on unseen classes while maintaining the
performance on seen classes.
Optimization, Minimization, Visualization, Training, Degradation,
Vectors, Telecommunications, Intserv networks, Geometry,
sharpness-aware minimization
BibRef
Lu, Z.[Zhihe],
Bai, J.[Jiawang],
Li, X.[Xin],
Xiao, Z.[Zeyu],
Wang, X.C.[Xin-Chao],
Task-to-Instance Prompt Learning for Vision-Language Models at Test
Time,
IP(34), 2025, pp. 1908-1920.
IEEE DOI Code:
WWW Link.
2504
Training, Training data, Visualization, Adaptation models, Learning systems,
Image recognition, Dogs, Vectors, Entropy, task-to-instance
BibRef
Fang, Z.Q.[Zheng-Qing],
Yuan, Z.H.[Zhou-Hang],
Li, Z.Y.[Zi-Yu],
Chen, J.Y.[Jing-Yuan],
Kuang, K.[Kun],
Yao, Y.F.[Yu-Feng],
Wu, F.[Fei],
Cross-Modality Image Interpretation via Concept Decomposition Vector
of Visual-Language Models,
CirSysVideo(35), No. 4, April 2025, pp. 3024-3038.
IEEE DOI
2504
Visualization, Vectors, Semantics, Training, Image representation,
Task analysis, visual-language models
BibRef
Ramzi, E.[Elias],
Audebert, N.[Nicolas],
Rambour, C.[Clément],
Araujo, A.[André],
Bitot, X.[Xavier],
Thome, N.[Nicolas],
Optimization of Rank Losses for Image Retrieval,
PAMI(47), No. 6, June 2025, pp. 4317-4329.
IEEE DOI
2505
Training, Image retrieval, Measurement, Standards, Data mining,
Artificial intelligence, Loss measurement, non-decomposable
BibRef
Lafon, M.[Marc],
Ramzi, E.[Elias],
Rambour, C.[Clément],
Audebert, N.[Nicolas],
Thome, N.[Nicolas],
Gallop: Learning Global and Local Prompts for Vision-language Models,
ECCV24(LXI: 264-282).
Springer DOI
2412
BibRef
Liu, K.C.[Kang-Cheng],
Wang, C.Q.[Chao-Qun],
Han, X.D.[Xiao-Dong],
Liu, Y.J.[Yong-Jin],
Chen, B.Q.[Bao-Quan],
Generalized Robot Vision-Language Model via Linguistic Foreground-Aware
Contrast,
IJCV(133), No. 6, June 2025, pp. 3481-3518.
Springer DOI
2505
BibRef
And:
Correction:
IJCV(133), No. 7, July 2025, pp. 4971-4971.
Springer DOI
2506
BibRef
Yang, L.X.[Ling-Xiao],
Zhang, R.Y.[Ru-Yuan],
Chen, Q.[Qi],
Xie, X.H.[Xiao-Hua],
Learning with Enriched Inductive Biases for Vision-Language Models,
IJCV(133), No. 6, June 2025, pp. 3746-3761.
Springer DOI
2505
BibRef
Yao, H.T.[Han-Tao],
Zhang, R.[Rui],
Lyu, H.H.[Huai-Hai],
Zhang, Y.D.[Yong-Dong],
Xu, C.S.[Chang-Sheng],
Bi-Modality Individual-Aware Prompt Tuning for Visual-Language Model,
PAMI(47), No. 8, August 2025, pp. 6352-6368.
IEEE DOI
2507
BibRef
Earlier: A1, A2, A5, Only:
TCP: Textual-Based Class-Aware Prompt Tuning for Visual-Language
Model,
CVPR24(23438-23448)
IEEE DOI Code:
WWW Link.
2410
Tuning, Visualization, Training, Adaptation models, Hands,
Feature extraction, Data models, Artificial intelligence,
visual-language model.
Benchmark testing.
BibRef
Hao, Z.W.[Zhi-Wei],
Guo, J.Y.[Jian-Yuan],
Shen, L.[Li],
Luo, Y.[Yong],
Hu, H.[Han],
Wen, Y.G.[Yong-Gang],
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language
Tuning,
IJCV(133), No. 8, August 2025, pp. 5527-5543.
Springer DOI
2508
BibRef
Zeng, R.F.[Rong-Fei],
Yang, Z.P.[Zhi-Peng],
Yu, R.Y.[Rui-Yun],
Zhang, Y.G.[Yong-Gang],
Supplementary Prompt Learning for Vision-Language Models,
IJCV(133), No. 8, August 2025, pp. 5822-5839.
Springer DOI
2508
BibRef
Liu, K.C.[Kang-Cheng],
Liu, Y.J.[Yong-Jin],
Chen, B.Q.[Bao-Quan],
General 3D Vision-Language Model With Fast Rendering and Pre-Training
Vision-Language Alignment,
PAMI(47), No. 9, September 2025, pp. 7352-7368.
IEEE DOI
2508
Point cloud compression, Semantics, Training, Solid modeling,
Contrastive learning, Data mining, Visualization,
3D vision-language model
BibRef
Gao, Y.S.[Yan-Sheng],
Zhu, Z.X.[Zi-Xi],
Wang, S.S.[Sheng-Sheng],
Mixture of coarse and fine-grained prompt tuning for vision-language
model,
PR(170), 2026, pp. 112074.
Elsevier DOI
2509
Prompt learning, Vision-language models,
Coarse domain-shared information,
BibRef
Hao, F.S.[Fu-Sheng],
Liu, L.[Liu],
Wu, F.X.[Fu-Xiang],
Zhang, Q.S.[Qie-Shi],
Cheng, J.[Jun],
Textual Embeddings are Good Class-Aware Visual Prompts for Adapting
Vision-Language Models,
SPLetters(32), 2025, pp. 2992-2996.
IEEE DOI
2509
Visualization, Tuning, Semantics, Harmonic analysis, Accuracy,
Optimization, Artificial intelligence, Vectors, Training, TV,
class-aware visual prompts
BibRef
Liu, J.[Jun],
Lu, Z.Q.[Zi-Qian],
Luo, H.[Hao],
Lu, Z.M.[Zhe-Ming],
Zheng, Y.M.[Yang-Ming],
Progressive Multi-Prompt Learning for Vision-Language Models,
CirSysVideo(35), No. 10, October 2025, pp. 9562-9574.
IEEE DOI Code:
WWW Link.
2510
Visualization, Overfitting, Optimization, Training, Semantics,
Feature extraction, Dogs, Accuracy, Tuning, Transfer learning, few-shot
BibRef
Wang, W.X.[Wen-Xuan],
He, X.J.[Xing-Jian],
Zhang, Y.[Yisi],
Guo, L.T.[Long-Teng],
Shen, J.C.[Jia-Chen],
Li, J.Y.[Jiang-Yun],
Liu, J.[Jing],
CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring
Image Segmentation,
MultMed(26), 2024, pp. 6906-6916.
IEEE DOI
2405
Image segmentation, Visualization, Task analysis, Correlation,
Feature extraction, Transformers, Semantics, vision and language
BibRef
Zhang, E.[Enming],
Zhu, B.[Bingke],
Chen, Y.Y.[Ying-Ying],
Miao, Q.H.[Qing-Hai],
Tang, M.[Ming],
Wang, J.Q.[Jin-Qiao],
Optimization of Prompt Learning via Multi-Knowledge Representation
for Vision-Language Models,
MultMed(27), 2025, pp. 7557-7569.
IEEE DOI
2510
Visualization, Tuning, Training, Birds, Semantics, Image recognition,
Artificial intelligence, Airplanes, Marine vehicles, multi-knowledge
BibRef
Park, K.Y.[Kwan-Yong],
An, S.[Sojung],
Lee, Y.J.[Yong Jae],
Kim, D.H.[Dong-Hyun],
Learning Compositionality from Multifaceted Synthetic Data for
Language-based Object Detection,
IJCV(133), No. 11, November 2025, pp. 7873-7896.
Springer DOI
2511
BibRef
Park, K.Y.[Kwan-Yong],
Saito, K.[Kuniaki],
Kim, D.H.[Dong-Hyun],
Weak-to-strong Compositional Learning from Generative Models for
Language-based Object Detection,
ECCV24(XXIII: 1-19).
Springer DOI
2412
BibRef
Sarto, S.[Sara],
Moratelli, N.[Nicholas],
Cornia, M.[Marcella],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
Positive-Augmented Contrastive Learning for Vision-and-Language
Evaluation and Training,
IJCV(133), No. 11, November 2025, pp. 7647-7671.
Springer DOI
2511
BibRef
Stefanini, M.[Matteo],
Cornia, M.[Marcella],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
A Novel Attention-based Aggregation Function to Combine Vision and
Language,
ICPR21(1212-1219)
IEEE DOI
2105
Deep learning, Visualization, Image retrieval,
Transforms, Knowledge discovery
BibRef
Liu, L.C.[Liang-Chen],
Wang, N.N.[Nan-Nan],
Chen, C.[Chen],
Liu, D.C.[De-Cheng],
Yang, X.[Xi],
Gao, X.B.[Xin-Bo],
Liu, T.L.[Tong-Liang],
Frequency-Based Comprehensive Prompt Learning for Vision-Language
Models,
PAMI(47), No. 12, December 2025, pp. 11974-11989.
IEEE DOI
2511
Visualization, Feature extraction, Frequency-domain analysis,
Transformers, Discrete cosine transforms, Frequency diversity,
transfer learning
BibRef
Li, J.C.[Jun-Cheng],
Gao, M.[Minghe],
Tang, S.L.[Si-Liang],
Wei, L.H.[Long-Hui],
Xiao, J.[Jun],
Wu, F.[Fei],
Hong, R.C.[Ri-Chang],
Wang, M.[Meng],
Tian, Q.[Qi],
Structure-Induced Gradient Regulation for Generalizable
Vision-Language Models,
PAMI(48), No. 1, January 2026, pp. 219-235.
IEEE DOI
2512
Tuning, Metalearning, Adaptation models, Training, Semantics, Testing,
Visualization, Prototypes, Vectors, Overfitting,
vision-language pre-training models
BibRef
Li, J.C.[Jun-Cheng],
Gao, M.[Minghe],
Wei, L.H.[Long-Hui],
Tang, S.L.[Si-Liang],
Zhang, W.Q.[Wen-Qiao],
Li, M.Z.[Meng-Ze],
Ji, W.[Wei],
Tian, Q.[Qi],
Chua, T.S.[Tat-Seng],
Zhuang, Y.T.[Yue-Ting],
Gradient-Regulated Meta-Prompt Learning for Generalizable
Vision-Language Models,
ICCV23(2551-2562)
IEEE DOI
2401
BibRef
Xiao, Y.S.[Yi-Song],
Liu, X.L.[Xiang-Long],
Cheng, Q.J.[Qian-Jia],
Yin, Z.F.[Zhen-Fei],
Liang, S.Y.[Si-Yuan],
Li, J.P.[Jia-Peng],
Shao, J.[Jing],
Liu, A.S.[Ai-Shan],
Tao, D.C.[Da-Cheng],
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via
Counterfactual Probing,
IJCV(133), No. 12, December 2025, pp. 8332-8355.
Springer DOI
2512
BibRef
Chen, T.Y.[Tian-Yang],
Ai, J.L.[Jian-Liang],
Hierarchical Prompt Engineering for Remote Sensing Scene
Understanding with Large Vision-Language Models,
RS(17), No. 22, 2025, pp. 3727.
DOI Link
2512
BibRef
Xu, X.[Xiao],
Qin, L.[Libo],
Che, W.[Wanxiang],
Kan, M.Y.[Min-Yen],
Manager: Aggregating Insights From Unimodal Experts in Two-Tower VLMs
and MLLMs,
CirSysVideo(35), No. 12, December 2025, pp. 12278-12291.
IEEE DOI Code:
WWW Link.
2512
Visualization, Aggregates, Semantics, Meters, Feeds, Indexes,
Large language models, Image resolution, Vision-Language model,
representation learning
BibRef
Kim, G.[Gahyeon],
Kim, S.[Sohee],
Lee, S.[Seokju],
Decoupling augmentation bias in prompt learning for vision-language
models,
PR(172), 2026, pp. 112630.
Elsevier DOI Code:
WWW Link.
2601
BibRef
Earlier:
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models,
Prompting24(1572-1582)
IEEE DOI
2410
Prompt learning, Vision-language models, Image augmentation,
Adversarial learning loss, Few-shot classification, Domain generalization.
Visualization, Zero-shot learning, Semantics, Focusing,
Feature extraction, Data augmentation, Vectors, VLMs
BibRef
Guo, Y.C.[Yun-Cheng],
Gu, X.D.[Xiao-Dong],
MMRL++: Parameter-Efficient and Interaction-Aware Representation
Learning for Vision-Language Models,
IJCV(134), No. 1, January 2026, pp. 11.
Springer DOI
2601
BibRef
Earlier:
MMRL: Multi-Modal Representation Learning for Vision-Language Models,
CVPR25(25015-25025)
IEEE DOI Code:
WWW Link.
2508
Representation learning, Training, Adaptation models, Codes,
Transfer learning, Image representation, Data models, Overfitting
BibRef
Ye, W.X.[Wei-Xin],
Wang, W.[Wei],
Liu, Y.H.[Ya-Hui],
Song, Y.[Yue],
Ren, B.[Bin],
Bi, W.[Wei],
Cucchiara, R.[Rita],
Sebe, N.[Nicu],
A Unified Masked Jigsaw Puzzle Framework for Vision and Language
Models,
PAMI(48), No. 2, February 2026, pp. 1873-1887.
IEEE DOI
2601
Transformers, Privacy, Natural language processing,
Principal component analysis, Computational modeling, Training,
position embedding
BibRef
Wang, Z.Y.[Zi-Yan],
Liu, L.[Lei],
Wan, G.[Gang],
Lu, Y.C.[Yu-Chen],
Zheng, F.J.[Feng-Jie],
Sun, G.[Guangde],
Huang, Y.X.[Yi-Xiang],
Guo, S.H.[Shi-Hao],
Li, X.[Xinyi],
Yuan, L.[Liang],
SAREval: A Multi-Dimensional and Multi-Task Benchmark for Evaluating
Visual Language Models on SAR Image Understanding,
RS(18), No. 1, 2026, pp. 82.
DOI Link
2601
BibRef
Wu, J.F.[Jun-Feng],
Jiang, Y.[Yi],
Ma, C.F.[Chuo-Fan],
Liu, Y.L.[Yu-Liang],
Zhao, H.S.[Heng-Shuang],
Yuan, Z.H.[Ze-Huan],
Bai, S.[Song],
Bai, X.[Xiang],
Liquid: Language Models are Scalable and Unified Multi-Modal Generators,
IJCV(134), No. 1, January 2026, pp. 39.
Springer DOI
2601
Code:
WWW Link.
BibRef
Su, Y.L.[Yu-Ling],
Liu, X.L.[Xue-Liang],
Huang, Z.[Zhen],
Zhao, Y.W.[Yun-Wei],
Hong, R.C.[Ri-Chang],
Wang, M.[Meng],
AttriPrompt: Class Attribute-Aware Prompt Tuning for Vision-Language
Model,
IP(35), 2026, pp. 1395-1407.
IEEE DOI
2602
Tuning, Semantics, Visualization, Adaptation models, Head,
Legged locomotion, Data models, Training, Standards, Prompt tuning,
vision-language models
BibRef
Li, Y.W.[Yan-Wei],
Zhang, Y.C.[Yue-Chen],
Wang, C.Y.[Cheng-Yao],
Zhong, Z.S.[Zhi-Sheng],
Chen, Y.X.[Yi-Xin],
Chu, R.[Ruihang],
Liu, S.[Shaoteng],
Jia, J.Y.[Jia-Ya],
Mini-Gemini: Mining the Potential of Multi-Modality Vision Language
Models,
PAMI(48), No. 3, March 2026, pp. 3530-3543.
IEEE DOI
2602
Visualization, Cognition, Benchmark testing, Data models,
Image resolution, Training, TV, Generative Pre-trained transformer,
Generative Model
BibRef
Xu, N.[Nuo],
Yao, K.[Kelu],
Yang, R.[Rong],
Li, C.[Chao],
Visual-language active search for wide-area remote sensing imagery,
PR(175), 2026, pp. 113106.
Elsevier DOI Code:
WWW Link.
2603
Active search, Multimodality, Reinforcement learning, Graph neural network
BibRef
Chen, Y.[Yang],
Fu, S.[Shuai],
Zhang, Y.[Yu],
MoPD: Mixture-of-Prompts Distillation for Vision-Language Models,
MultMed(28), 2026, pp. 1943-1954.
IEEE DOI
2603
Visualization, Vectors, Learning systems, Training data,
Adaptation models, Noise measurement, Knowledge engineering,
vision-language models
BibRef
Qi, Y.[Yayun],
Li, H.X.[Hong-Xi],
Song, Y.Q.[Yi-Qi],
Wu, X.X.[Xin-Xiao],
Luo, J.B.[Jie-Bo],
How Vision-Language Tasks Benefit From Large Pre-Trained Models:
A Survey,
MultMed(28), 2026, pp. 1188-1210.
IEEE DOI
2603
Surveys, Visualization, Cognition, Data models, Videos, Training data,
Question answering (information retrieval),
large language model
BibRef
Lee, J.J.[Jae Joong],
Language-guided invariance probing of vision-language models,
PRL(202), 2026, pp. 108-113.
Elsevier DOI
2603
Vision-language models, Prompt robustness, Paraphrase invariance,
Semantic sensitivity, Hard negatives, Image-text similarity
BibRef
Valois, P.H.V.[Pedro H. V.],
Satav, D.[Dipesh],
de Campos, R.A.P.[Rodrigo A. P.],
Pratamasunu, G.Q.O.[Gulpi Q. O.],
Fukui, K.[Kazuhiro],
Vision Language Model Interpretability with Concept Guided Decoding,
ICIP25(397-402)
IEEE DOI
2601
Deep learning, Training, Visualization, Adaptation models,
Analytical models, Toxicology, Systematics, Decoding, Security,
Jailbreak
BibRef
Saravanan, D.[Darshana],
Tapaswi, M.[Makarand],
Gandhi, V.[Vineet],
Investigating Mechanisms for In-Context Vision Language Binding,
InterpVis25(4852-4856)
IEEE DOI
2512
Solid modeling, Shape, Computational modeling,
Toy manufacturing industry, Vectors, Synthetic data, VLM, In-context Binding
BibRef
Selvam, S.[Surya],
Rajendran, R.K.[Ravi K.],
Sankaradas, M.[Murugan],
Raghunathan, A.[Anand],
Chakradhar, S.T.[Srimat T.],
SimCache: Similarity Caching for Efficient VLM-based Scene
Understanding,
LargeVM25(3318-3327)
IEEE DOI
2512
Training, Visualization, Accuracy, Redundancy, Semantics,
Memory management, Throughput, Real-time systems, Videos
BibRef
Tushar, P.[Pranav],
Pandey, E.[Eshan],
Austria, L.D.B.[Lyka Diane Bala],
Loo, Y.Y.[Yin Yin],
Lim, J.H.[Jing Hao],
Atmosukarto, I.[Indriyati],
Lock, D.S.C.[Donny Soh Cheng],
MerCulture: A Comprehensive Benchmark to Evaluate Vision-Language
Models on Cultural Understanding in Singapore,
AIBench25(565-574)
IEEE DOI
2512
Measurement, Visualization, Grounding, Education, Training data,
Benchmark testing, Multilingual, Cultural differences, application
BibRef
Ma, Z.Y.[Zi-Yu],
Gou, C.[Chenhui],
Shi, H.[Hengcan],
Sun, B.[Bin],
Li, S.T.[Shu-Tao],
Rezatofighi, H.[Hamid],
Cai, J.F.[Jian-Fei],
DrVideo: Document Retrieval Based Long Video Understanding,
CVPR25(18936-18946)
IEEE DOI Code:
WWW Link.
2508
Codes, Large language models, Transforms, Benchmark testing,
Cognition, Iterative methods, Videos, long video understanding,
vision and language
BibRef
Dhouib, M.[Mohamed],
Buscaldi, D.[Davide],
Vanier, S.[Sonia],
Shabou, A.[Aymen],
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual
Language Models,
CVPR25(14582-14592)
IEEE DOI
2508
Connectors, Training, Measurement, Visualization,
Computational modeling, Redundancy, Merging, Oral communication
BibRef
Yu, C.[Chong],
Chen, T.[Tao],
Gan, Z.X.[Zhong-Xue],
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple
Vision-Language Model Variants,
CVPR25(14712-14722)
IEEE DOI
2508
Training, Adaptation models, Accuracy, Tensors, Memory management,
Hardware, Model compression, Tuning, Optimization, dynamic expansion capability
BibRef
Hao, F.S.[Fu-Sheng],
He, F.X.[Feng-Xiang],
Wu, F.X.[Fu-Xiang],
Wang, T.[Tichao],
Song, C.Q.[Cheng-Qun],
Cheng, J.[Jun],
Task-Aware Clustering for Prompting Vision-Language Models,
CVPR25(14745-14755)
IEEE DOI Code:
WWW Link.
2508
Adaptation models, Visualization, Attention mechanisms, Codes,
Interference, Benchmark testing, Optimization, Overfitting
BibRef
Koleilat, T.[Taha],
Asgariandehkordi, H.[Hojat],
Rivaz, H.[Hassan],
Xiao, Y.M.[Yi-Ming],
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models,
CVPR25(14766-14776)
IEEE DOI Code:
WWW Link.
2508
Representation learning, Adaptation models, Visualization,
Accuracy, Biological system modeling, Semantics,
vision-language models
BibRef
Nath, V.[Vishwesh],
Li, W.Q.[Wen-Qi],
Yang, D.[Dong],
Myronenko, A.[Andriy],
Zheng, M.X.[Ming-Xin],
Lu, Y.[Yao],
Liu, Z.J.[Zhi-Jian],
Yin, H.X.[Hong-Xu],
Law, Y.M.[Yee Man],
Tang, Y.C.[Yu-Cheng],
Guo, P.F.[Peng-Fei],
Zhao, C.[Can],
Xu, Z.Y.[Zi-Yue],
He, Y.F.[Yu-Fan],
Harmon, S.[Stephanie],
Simon, B.[Benjamin],
Heinrich, G.[Greg],
Aylward, S.[Stephen],
Edgar, M.[Marc],
Zephyr, M.[Michael],
Molchanov, P.[Pavlo],
Turkbey, B.[Baris],
Roth, H.[Holger],
Xu, D.[Daguang],
VILA-M3: Enhancing Vision-Language Models with Medical Expert
Knowledge,
CVPR25(14788-14798)
IEEE DOI
2508
Deep learning, Computational modeling, Medical services,
Feature extraction, Data models, Reliability, Tumors, radiology
BibRef
Du, H.[Hao],
Wu, B.[Bo],
Lu, Y.[Yan],
Mao, Z.D.[Zhen-Dong],
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic
Video Situation,
CVPR25(13798-13809)
IEEE DOI
2508
Measurement, Visualization, Filtering, Statistical analysis,
Pipelines, Benchmark testing, Videos
BibRef
Kaduri, O.[Omri],
Bagon, S.[Shai],
Dekel, T.[Tali],
What's in the Image? A Deep-Dive into the Vision of Vision Language
Models,
CVPR25(14549-14558)
IEEE DOI
2508
Visualization, Analytical models, Image coding, Focusing,
Data models, Data mining, Videos
BibRef
Xing, L.[Long],
Huang, Q.D.[Qi-Dong],
Dong, X.Y.[Xiao-Yi],
Lu, J.J.[Jia-Jie],
Zhang, P.[Pan],
Zang, Y.H.[Yu-Hang],
Cao, Y.H.[Yu-Hang],
He, C.H.[Cong-Hui],
Wang, J.Q.[Jia-Qi],
Wu, F.[Feng],
Lin, D.[Dahua],
Conical Visual Concentration for Efficient Large Vision-Language
Models,
CVPR25(14593-14603)
IEEE DOI Code:
WWW Link.
2508
Training, Visualization, Costs, Codes, Redundancy, Boosting,
large vision language model, efficient training, efficient inference
BibRef
Zhang, L.[Le],
Yang, Q.[Qian],
Agrawal, A.[Aishwarya],
Assessing and Learning Alignment of Unimodal Vision and Language
Models,
CVPR25(14604-14614)
IEEE DOI
2508
Training, Translation, Computational modeling,
Semantic segmentation, Transfer learning, Object recognition
BibRef
Sehgal, A.[Atharva],
Yuan, P.[Patrick],
Hu, Z.[Ziniu],
Yue, Y.S.[Yi-Song],
Sun, J.J.[Jennifer J.],
Chaudhuri, S.[Swarat],
Self-Evolving Visual Concept Library using Vision-Language Critics,
CVPR25(13124-13134)
IEEE DOI
2508
Visualization, Annotations, Buildings, Manuals, Libraries, Cognition, History,
Few shot learning, program synthesis, visual programming, library learning
BibRef
Wang, W.H.[Wei-Han],
Wang, L.[Lefan],
Gu, X.T.[Xiao-Tao],
Huang, S.Y.[Shi-Yu],
Dong, Y.X.[Yu-Xiao],
Tang, J.[Jie],
MotionBench: Benchmarking and Improving Fine-Grained Video Motion
Understanding for Vision Language Models,
CVPR25(8450-8460)
IEEE DOI Code:
WWW Link.
2508
Visualization, Benchmark testing, Data models, Videos,
vision language model, fine-grained video motion understanding, benchmark
BibRef
Nacson, M.S.[Mor Shpigel],
Aberdam, A.[Aviad],
Ganz, R.[Roy],
Avraham, E.B.[Elad Ben],
Golts, A.[Alona],
Kittenplon, Y.[Yair],
Mazor, S.[Shai],
Litman, R.[Ron],
DocVLM: Make Your VLM an Efficient Reader,
CVPR25(29005-29015)
IEEE DOI
2508
Visualization, Image coding, Computational modeling,
Optical character recognition, Layout, Computational efficiency,
Text processing
BibRef
Alhamoud, K.[Kumail],
Alshammari, S.[Shaden],
Tian, Y.L.[Yong-Long],
Li, G.H.[Guo-Hao],
Torr, P.H.S.[Philip H.S.],
Kim, Y.[Yoon],
Ghassemi, M.[Marzyeh],
Vision-Language Models Do Not Understand Negation,
CVPR25(29612-29622)
IEEE DOI
2508
Training, Accuracy, Computational modeling, Natural languages,
Benchmark testing, Videos, Synthetic data, Biomedical imaging, benchmarks
BibRef
Schmalfuss, J.[Jenny],
Chang, N.[Nadine],
VS, V.[Vibashan],
Shen, M.[Maying],
Bruhn, A.[Andrés],
Alvarez, J.M.[Jose M.],
PARC: A Quantitative Framework Uncovering the Symmetries within
Vision Language Models,
CVPR25(25081-25091)
IEEE DOI Code:
WWW Link.
2508
Visualization, Analytical models, Sensitivity,
Sensitivity analysis, Computational modeling, Semantics,
prompt sensitivity
BibRef
Xiao, J.Q.[Jin-Qi],
Sang, S.[Shen],
Zhi, T.C.[Tian-Cheng],
Liu, J.[Jing],
Yan, Q.[Qing],
Luo, L.J.[Lin-Jie],
Yuan, B.[Bo],
COAP: Memory-Efficient Training with Correlation-Aware Gradient
Projection,
CVPR25(30116-30126)
IEEE DOI Code:
WWW Link.
2508
Training, Degradation, Quantization (signal),
Computational modeling, Neural networks, Flora,
vision language model
BibRef
Zhu, Y.Q.[Yi-Qi],
Wang, Z.Y.[Zi-Yue],
Zhang, C.[Can],
Li, P.[Peng],
Liu, Y.[Yang],
CoSpace: Benchmarking Continuous Space Perception Ability for
Vision-Language Models,
CVPR25(29569-29579)
IEEE DOI
2508
Visualization, Analytical models, Accuracy, Computational modeling,
Benchmark testing, Cognition, Image reconstruction,
continuous space perception
BibRef
Kang, H.Q.[Hao-Qiang],
Sachdeva, E.[Enna],
Gupta, P.[Piyush],
Bae, S.J.[Sang-Jae],
Lee, K.[Kwonjoon],
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models
with Generative Flow Networks,
CVPR25(3815-3825)
IEEE DOI Code:
WWW Link.
2508
Training, Decision making, Distributed databases,
Reinforcement learning, Games, Cognition, Planning, Optimization,
gflownets
BibRef
Chen, J.H.[Jiu-Hai],
Yang, J.W.[Jian-Wei],
Wu, H.P.[Hai-Ping],
Li, D.[Dianqi],
Gao, J.F.[Jian-Feng],
Zhou, T.Y.[Tian-Yi],
Xiao, B.[Bin],
Florence-VL: Enhancing Vision-Language Models with Generative Vision
Encoder and Depth-Breadth Fusion,
CVPR25(24928-24938)
IEEE DOI Code:
WWW Link.
2508
Training, Visualization, Statistical analysis,
Computational modeling, Optical character recognition, Tuning
BibRef
Yang, C.Y.[Chen-Yu],
Dong, X.[Xuan],
Zhu, X.Z.[Xi-Zhou],
Su, W.J.[Wei-Jie],
Wang, J.H.[Jia-Hao],
Tian, H.[Hao],
Chen, Z.[Zhe],
Wang, W.H.[Wen-Hai],
Lu, L.W.[Le-Wei],
Dai, J.F.[Ji-Feng],
PVC: Progressive Visual Token Compression for Unified Image and Video
Processing in Large Vision-Language Models,
CVPR25(24939-24949)
IEEE DOI Code:
WWW Link.
2508
Visualization, Adaptation models, Image coding, Limiting, Redundancy,
Benchmark testing, Encoding, Data mining, Videos
BibRef
Zhang, K.[Kun],
Li, J.Y.[Jing-Yu],
Li, Z.[Zhe],
Zhou, S.K.[S. Kevin],
DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid
Set-Embeddings Learning,
CVPR25(24993-25003)
IEEE DOI
2508
Accuracy, Computational modeling, Semantics, Benchmark testing,
Computational efficiency, Complexity theory,
set-embeddings learning
BibRef
Zhu, B.[Beier],
Cui, J.[Jiequan],
Zhang, H.W.[Han-Wang],
Zhang, C.[Chi],
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness,
CVPR25(25487-25496)
IEEE DOI
2508
Training, Correlation, Foundation models, Null space, Robustness,
Probes, Faces, group robustness, vision-language models
BibRef
Li, H.Y.[Hao-Yang],
Wang, L.[Liang],
Wang, C.[Chao],
Jiang, J.[Jing],
Peng, Y.[Yan],
Long, G.D.[Guo-Dong],
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models,
CVPR25(25623-25632)
IEEE DOI Code:
WWW Link.
2508
Codes, Semantic segmentation, Collaboration, Cloning,
Object detection, Vectors, Optimization, Tuning, prompt tuning,
multi-modal learning
BibRef
Saravanan, D.[Darshana],
Gupta, V.[Varun],
Singh, D.[Darshan],
Khan, Z.[Zeeshan],
Gandhi, V.[Vineet],
Tapaswi, M.[Makarand],
VELOCITI: Benchmarking Video-Language Compositional Reasoning with
Strict Entailment,
CVPR25(18914-18924)
IEEE DOI
2508
Visualization, Accuracy, Benchmark testing, Cognition, Videos,
video language benchmark
BibRef
Pan, B.[Bikang],
Li, Q.[Qun],
Tang, X.Y.[Xiao-Ying],
Huang, W.[Wei],
Fang, Z.[Zhen],
Liu, F.[Feng],
Wang, J.Y.[Jing-Ya],
Yu, J.Y.[Jing-Yi],
Shi, Y.[Ye],
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models,
CVPR25(19963-19973)
IEEE DOI
2508
Representation learning, Accuracy, Purification, Foundation models,
Transportation, Prototypes, Robustness, Noise measurement, Signal to noise ratio
BibRef
Zhang, Y.T.[Yong-Ting],
Chen, L.[Lu],
Zheng, G.D.[Guo-Dong],
Gao, Y.F.[Yi-Feng],
Zheng, R.[Rui],
Fu, J.[Jinlan],
Yin, Z.F.[Zhen-Fei],
Jin, S.[Senjie],
Qiao, Y.[Yu],
Huang, X.J.[Xuan-Jing],
Zhao, F.[Feng],
Gui, T.[Tao],
Shao, J.[Jing],
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for
Vision Language Models,
CVPR25(19867-19878)
IEEE DOI
2508
Visualization, Computational modeling, Semantics, Data models, Safety
BibRef
Bhattacharjee, S.S.[Subhransu S.],
Campbell, D.[Dylan],
Shome, R.[Rahul],
Believing is Seeing: Unobserved Object Detection using Generative
Models,
CVPR25(19366-19377)
IEEE DOI
2508
Measurement, Training, Solid modeling, Adaptation models,
Visualization, Pipelines, Object detection, Diffusion models,
vision-language models
BibRef
Zhou, E.[Enshen],
Su, Q.[Qi],
Chi, C.[Cheng],
Zhang, Z.Z.[Zhi-Zheng],
Wang, Z.Y.[Zhong-Yuan],
Huang, T.J.[Tie-Jun],
Sheng, L.[Lu],
Wang, H.[He],
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and
Proactive Robotic Failure Detection,
CVPR25(6919-6929)
IEEE DOI Code:
WWW Link.
2508
Visualization, Codes, Accuracy, Prevention and mitigation,
Programming, Real-time systems, Closed loop systems, Monitoring,
vision-language model
BibRef
Zhou, W.J.[Wei-Jie],
Tao, M.[Manli],
Zhao, C.Y.[Chao-Yang],
Guo, H.Y.[Hai-Yun],
Dong, H.H.[Hong-Hui],
Tang, M.[Ming],
Wang, J.Q.[Jin-Qiao],
PhysVLM: Enabling Visual Language Models to Understand Robotic
Physical Reachability,
CVPR25(6940-6949)
IEEE DOI
2508
Visualization, Adaptation models, Service robots, Decision making,
Benchmark testing, Cognition, Reliability, Robots, embodied ai,
embodied visual reasoning
BibRef
Song, C.H.[Chan Hee],
Blukis, V.[Valts],
Tremblay, J.[Jonathan],
Tyree, S.[Stephen],
Su, Y.[Yu],
Birchfield, S.[Stan],
RoboSpatial: Teaching Spatial Understanding to 2D and 3D
Vision-Language Models for Robotics,
CVPR25(15768-15780)
IEEE DOI
2508
Training, Solid modeling, Soft sensors, Pipelines, Training data,
Predictive models, Spatial databases, Cognition, Robots,
robot perception
BibRef
Lozano, A.[Alejandro],
Sun, M.W.[Min Woo],
Burgess, J.[James],
Chen, L.[Liangyu],
Nirschl, J.J.[Jeffrey J.],
Gu, J.[Jeffrey],
Lopez, I.[Ivan],
Aklilu, J.[Josiah],
Rau, A.[Anita],
Katzer, A.W.[Austin Wolfgang],
Zhang, Y.H.[Yu-Hui],
Chiu, C.[Collin],
Wang, X.H.[Xiao-Han],
Song, A.S.[Alfred Seunghoon],
Tibshirani, R.[Robert],
Yeung-Levy, S.[Serena],
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and
Vision-Language Models Derived from Scientific Literature,
CVPR25(19724-19735)
IEEE DOI
2508
Annotations, Biological system modeling, Computational modeling,
Dermatology, Surgery, Streaming media, Radiology,
biomedical foundation models
BibRef
Xiao, R.[Rui],
Kim, S.[Sanghwan],
Georgescu, M.I.[Mariana-Iuliana],
Akata, Z.[Zeynep],
Alaniz, S.[Stephan],
FLAIR: VLM with Fine-grained Language-informed Image Representations,
CVPR25(24884-24894)
IEEE DOI Code:
WWW Link.
2508
Visualization, Codes, Semantic segmentation,
Computational modeling, Image representation, Benchmark testing,
multimodal learning
BibRef
Wang, X.[Xin],
Chen, K.[Kai],
Zhang, J.M.[Jia-Ming],
Chen, J.J.[Jing-Jing],
Ma, X.[Xingjun],
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in
Vision-Language Models,
CVPR25(19910-19920)
IEEE DOI Code:
WWW Link.
2508
Visualization, Accuracy, Scalability, Perturbation methods,
Benchmark testing, Robustness, Entropy, Safety, Tuning,
test-time adversarial prompt tuning
BibRef
Vasu, P.K.A.[Pavan Kumar Anasosalu],
Faghri, F.[Fartash],
Li, C.L.[Chun-Liang],
Koc, C.[Cem],
True, N.[Nate],
Antony, A.[Albert],
Santhanam, G.[Gokul],
Gabriel, J.[James],
Grasch, P.[Peter],
Tuzel, O.[Oncel],
Pouransari, H.[Hadi],
FastVLM: Efficient Vision Encoding for Vision Language Models,
CVPR25(19769-19780)
IEEE DOI Code:
WWW Link.
2508
Visualization, Image resolution, Accuracy, Image coding, Codes,
Benchmark testing, Encoding, vision-language models, efficiency
BibRef
Chen, Q.Z.[Qi-Zhou],
Wang, C.[Chengyu],
Wang, D.[Dakan],
Zhang, T.[Taolin],
Li, W.[Wangyue],
He, X.F.[Xiao-Feng],
Lifelong Knowledge Editing for Vision Language Models with Low-Rank
Mixture-of-Experts,
CVPR25(9455-9466)
IEEE DOI
2508
Training, Visualization, Filtering, Large language models, Semantics,
Benchmark testing, Routing, Generators, Robustness, model editing,
mixture of expert
BibRef
Chen, T.Y.[Tian-Yu],
Fu, X.C.[Xing-Cheng],
Gao, Y.[Yisen],
Qian, H.D.[Hao-Dong],
Wei, Y.[Yuecen],
Yan, K.[Kun],
Zhou, H.Y.[Hao-Yi],
Li, J.X.[Jian-Xin],
Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding,
CVPR25(4112-4121)
IEEE DOI
2508
Space vehicles, Geometry, Training, Adaptation models,
Extraterrestrial phenomena, Estimation, Stars, Vectors,
multi-modal learning
BibRef
Liu, Z.J.[Zhi-Jian],
Zhu, L.[Ligeng],
Shi, B.[Baifeng],
Zhang, Z.Y.[Zhuo-Yang],
Lou, Y.M.[Yu-Ming],
Yang, S.[Shang],
Xi, H.C.[Hao-Cheng],
Cao, S.Y.[Shi-Yi],
Gu, Y.X.[Yu-Xian],
Li, D.C.[Da-Cheng],
Li, X.[Xiuyu],
Tang, H.T.[Hao-Tian],
Fang, Y.H.[Yun-Hao],
Chen, Y.[Yukang],
Hsieh, C.Y.[Cheng-Yu],
Huang, D.A.[De-An],
Cheng, A.C.[An-Chieh],
Hu, J.Y.[Jin-Yi],
Liu, S.[Sifei],
Krishna, R.[Ranjay],
Molchanov, P.[Pavlo],
Kautz, J.[Jan],
Yin, H.X.[Hong-Xu],
Han, S.[Song],
Lu, Y.[Yao],
NVILA: Efficient Frontier Visual Language Models,
CVPR25(4122-4134)
IEEE DOI
2508
Training, Visualization, Accuracy, Systematics, Image coding, Costs,
Decoding, Spatial resolution, Videos
BibRef
Poppi, T.[Tobia],
Kasarla, T.[Tejaswi],
Mettes, P.[Pascal],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
Hyperbolic Safety-Aware Vision-Language Models,
CVPR25(4222-4232)
IEEE DOI Code:
WWW Link.
2508
Adaptation models, Ethics, Law, Source coding, Robustness, Data models,
Safety, Standards, trustworthy, safety, nsfw, hyperbolic, vision-and-language
BibRef
Zhang, H.Y.[Hao-Yu],
Guo, Y.Y.[Yang-Yang],
Kankanhalli, M.[Mohan],
Joint Vision-Language Social Bias Removal for CLIP,
CVPR25(4246-4255)
IEEE DOI Code:
WWW Link.
2508
Measurement, Degradation, Protocols, Codes,
Prevention and mitigation, Computational modeling,
vision-language alignment
BibRef
Zhang, Y.[Yi],
Deng, Y.X.[Yi-Xuan],
Guo, M.H.[Meng-Hao],
Hu, S.M.[Shi-Min],
Adaptive Parameter Selection for Tuning Vision-Language Models,
CVPR25(4280-4290)
IEEE DOI
2508
Adaptation models, Adaptive learning, Manuals, Benchmark testing,
Performance gain, Flowering plants, Tuning, Overfitting
BibRef
Deng, A.[Ailin],
Cao, T.[Tri],
Chen, Z.[Zhirui],
Hooi, B.[Bryan],
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?,
CVPR25(3867-3876)
IEEE DOI
2508
Training, Visualization, Analytical models, Computational modeling,
Reliability theory, Robustness, Data models, Safety,
bias
BibRef
Huang, R.[Runhui],
Ding, X.P.[Xin-Peng],
Wang, C.W.[Chun-Wei],
Han, J.H.[Jian-Hua],
Liu, Y.L.[Yu-Long],
Zhao, H.S.[Heng-Shuang],
Xu, H.[Hang],
Hou, L.[Lu],
Zhang, W.[Wei],
Liang, X.D.[Xiao-Dan],
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large
Vision-Language Models,
CVPR25(29814-29824)
IEEE DOI
2508
Training, Visualization, Costs, Computational modeling,
Benchmark testing, Feature extraction, Image restoration,
visual token compression
BibRef
Wang, S.[Sudong],
Zhang, Y.J.[Yun-Jian],
Zhu, Y.[Yao],
Li, J.N.[Jia-Ning],
Wang, Z.Z.[Zi-Zhe],
Liu, Y.W.[Yan-Wei],
Ji, X.Y.[Xiang-Yang],
Towards Understanding How Knowledge Evolves in Large Vision-Language
Models,
CVPR25(29858-29868)
IEEE DOI Code:
WWW Link.
2508
Dimensionality reduction, Codes, Natural languages,
Probability distribution, Encoding, Trajectory, Model compression,
interpretation
BibRef
Deitke, M.[Matt],
Clark, C.[Christopher],
Lee, S.H.[Sang-Ho],
Tripathi, R.[Rohun],
Yang, Y.[Yue],
Park, J.S.[Jae Sung],
Salehi, M.[Mohammadreza],
Muennighoff, N.[Niklas],
Lo, K.[Kyle],
Soldaini, L.[Luca],
Lu, J.[Jiasen],
Anderson, T.[Taira],
Bransom, E.[Erin],
Ehsani, K.[Kiana],
Ngo, H.[Huong],
Chen, Y.[YenSung],
Patel, A.[Ajay],
Yatskar, M.[Mark],
Callison-Burch, C.[Chris],
Head, A.[Andrew],
Hendrix, R.[Rose],
Bastani, F.[Favyen],
VanderBilt, E.[Eli],
Lambert, N.[Nathan],
Chou, Y.[Yvonne],
Chheda, A.[Arnavi],
Sparks, J.[Jenna],
Skjonsberg, S.[Sam],
Schmitz, M.[Michael],
Sarnat, A.[Aaron],
Bischoff, B.[Byron],
Walsh, P.[Pete],
Newell, C.[Chris],
Wolters, P.[Piper],
Gupta, T.[Tanmay],
Zeng, K.H.[Kuo-Hao],
Borchardt, J.[Jon],
Groeneveld, D.[Dirk],
Nam, C.[Crystal],
Lebrecht, S.[Sophie],
Wittlif, C.[Caitlin],
Schoenick, C.[Carissa],
Michel, O.[Oscar],
Krishna, R.[Ranjay],
Weihs, L.[Luca],
Smith, N.A.[Noah A.],
Hajishirzi, H.[Hannaneh],
Girshick, R.[Ross],
Farhadi, A.[Ali],
Kembhavi, A.[Aniruddha],
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art
Vision-Language Models,
CVPR25(91-104)
IEEE DOI Code:
WWW Link.
2508
Award, CVPR, Paper HM.
Training, Source coding, Computational modeling, Pipelines,
Training data, Data models, Open data, Synthetic data,
visual instruction tuning
BibRef
Zhao, W.[Wangbo],
Han, Y.Z.[Yi-Zeng],
Tang, J.S.[Jia-Sheng],
Li, Z.[Zhikai],
Song, Y.B.[Yi-Bing],
Wang, K.[Kai],
Wang, Z.Y.[Zhang-Yang],
You, Y.[Yang],
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for
Accelerating Large VLMs,
CVPR25(19814-19824)
IEEE DOI Code:
WWW Link.
2508
Visualization, Codes, Accuracy, Benchmark testing, Computational efficiency
BibRef
Lee, B.K.[Byung-Kwan],
Hachiuma, R.[Ryo],
Wang, Y.C.F.[Yu-Chiang Frank],
Ro, Y.M.[Yong Man],
Wu, Y.H.[Yueh-Hua],
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision
Language Models,
CVPR25(29545-29557)
IEEE DOI
2508
Training, Performance evaluation, Visualization,
Computational modeling, Natural languages, Merging, Tuning
BibRef
Sun, J.C.[Jing-Chen],
Sharma, R.[Rohan],
Lokhande, V.S.[Vishnu Suresh],
Chen, C.Y.[Chang-You],
Cross-Modal Feature Alignment and MMD Improve Robustness of Prompt
Tuning,
WACV25(4714-4724)
IEEE DOI
2505
Training, Adaptation models, Visualization, Codes, Computational modeling,
Stochastic processes, Robustness, Tuning, vision-language model
BibRef
Safaei, B.[Bardia],
Patel, V.M.[Vishal M.],
Active Learning for Vision-Language Models,
WACV25(4902-4912)
IEEE DOI
2505
Training, Bridges, Uncertainty, Computational modeling, Active learning,
Measurement uncertainty, Entropy, Reliability, Image classification
BibRef
Wang, Y.C.[Yi-Cheng],
Zhang, Z.K.[Zhi-Kang],
Wang, J.[Jue],
Fan, D.[David],
Xu, Z.L.[Zhen-Lin],
Liu, L.[Linda],
Hao, X.[Xiang],
Bhat, V.[Vimal],
Li, X.Y.[Xin-Yu],
GEXIA: Granularity Expansion and Iterative Approximation for Scalable
Multi-Grained Video-Language Learning,
WACV25(4725-4735)
IEEE DOI
2505
Computational modeling, Semantics, Benchmark testing, Data models,
Iterative methods, Videos
BibRef
Colman, R.[Roman],
Vu, M.[Minh],
Bhattarai, M.[Manish],
Ma, M.[Martin],
Viswanathan, H.[Hari],
O'Malley, D.[Daniel],
Santos, J.E.[Javier E.],
PatchFinder: Leveraging Visual Language Models for Accurate
Information Retrieval Using Model Uncertainty,
WACV25(9146-9155)
IEEE DOI
2505
Visualization, Uncertainty, Accuracy, Computational modeling,
Software algorithms, Predictive models, Information retrieval,
log likelihood
BibRef
Jawade, B.[Bhavin],
Soares, J.V.B.[João V. B.],
Thadani, K.[Kapil],
Mohan, D.D.[Deen Dayal],
Eshratifar, A.E.[Amir Erfan],
Culpepper, B.[Benjamin],
de Juan, P.[Paloma],
Setlur, S.[Srirangaraj],
Govindaraju, V.[Venu],
SCOT: Self-Supervised Contrastive Pretraining for Zero-Shot
Compositional Retrieval,
WACV25(5509-5519)
IEEE DOI Code:
WWW Link.
2505
Training, Codes, Large language models, Image retrieval,
Benchmark testing, Web search, Standards, zero-shot
BibRef
Talemi, N.A.[Niloufar Alipour],
Kashiani, H.[Hossein],
Afghah, F.[Fatemeh],
Style-Pro: Style-Guided Prompt Learning for Generalizable
Vision-Language Models,
WACV25(6207-6216)
IEEE DOI
2505
Adaptation models, Image recognition, Computational modeling,
Benchmark testing, Data models, Robustness, Overfitting,
style shift learning
BibRef
Chang, H.S.[Hung-Shuo],
Wang, C.Y.[Chien-Yao],
Wang, R.R.[Richard Robert],
Chou, G.[Gene],
Liao, H.Y.M.[Hong-Yuan Mark],
Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual
Language Models,
WACV25(6217-6227)
IEEE DOI Code:
WWW Link.
2505
YOLO, Training, Visualization, Accuracy, Source coding, Semantics,
Predictive models, Real-time systems, Decoding, multi-task
BibRef
Westfechtel, T.[Thomas],
Zhang, D.[Dexuan],
Harada, T.[Tatsuya],
Combining Inherent Knowledge of Vision-Language Models with
Unsupervised Domain Adaptation Through Strong-Weak Guidance,
WACV25(6528-6537)
IEEE DOI
2505
Adaptation models, Accuracy, Predictive models, Benchmark testing,
Prediction algorithms, Labeling
BibRef
Chen, H.N.[Han-Ning],
Ni, Y.[Yang],
Huang, W.J.[Wen-Jun],
Liu, Y.[Yezi],
Jeong, S.H.[Sung-Heon],
Wen, F.[Fei],
Bastian, N.D.[Nathaniel D.],
Latapie, H.[Hugo],
Imani, M.[Mohsen],
VLTP: Vision-Language Guided Token Pruning for Task-Oriented
Segmentation,
WACV25(9353-9363)
IEEE DOI
2505
Uniform resource locators, Image segmentation, Image recognition,
Computational modeling, Large language models, Transformers, Load modeling
BibRef
Ali, E.[Eman],
Silva, S.[Sathira],
Khan, M.H.[Muhammad Haris],
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of
Vision-Language Models,
WACV25(6083-6093)
IEEE DOI
2505
Training, Adaptation models, Visualization, Accuracy, Prototypes,
Data models, Noise measurement, Image classification
BibRef
Zhang, C.[Ce],
Stepputtis, S.[Simon],
Sycara, K.[Katia],
Xie, Y.Q.[Ya-Qi],
Enhancing Vision-Language Few-Shot Adaptation with Negative Learning,
WACV25(5905-5915)
IEEE DOI Code:
WWW Link.
2505
Adaptation models, Codes, Accuracy, Computational modeling, Noise,
Transforms, Computational efficiency, Noise measurement, Few shot learning
BibRef
Yamada, M.[Moyuru],
Dharamshi, N.[Nimish],
Kohli, A.[Ayushi],
Kasu, P.[Prasad],
Khan, A.[Ainulla],
Ghulyani, M.[Manu],
Unleashing Potentials of Vision-Language Models for Zero-Shot HOI
Detection,
WACV25(5751-5760)
IEEE DOI
2505
Head, Computational modeling, Redundancy, Object detection,
Network architecture, Predictive models, Decoding,
vision-and-language
BibRef
Imam, R.[Raza],
Gani, H.[Hanan],
Huzaifa, M.[Muhammad],
Nandakumar, K.[Karthik],
Test-Time Low Rank Adaptation via Confidence Maximization for
Zero-Shot Generalization of Vision-Language Models,
WACV25(5449-5459)
IEEE DOI Code:
WWW Link.
2505
Adaptation models, Visualization, Codes, Large language models,
Transformers, Entropy, Tuning, Optimization
BibRef
Ghoddoosian, R.[Reza],
Agarwal, N.[Nakul],
Dwivedi, I.[Isht],
Dariush, B.[Behzad],
ACE: Action Concept Enhancement of Video-Language Models in
Procedural Videos,
WACV25(9521-9531)
IEEE DOI
2505
Training, Visualization, Robustness, Assembly, Videos, Overfitting, zero-shot,
action recognition, vlm, vision language model, synonym, text augmentation
BibRef
Onoe, Y.[Yasumasa],
Rane, S.[Sunayana],
Berger, Z.[Zachary],
Bitton, Y.[Yonatan],
Cho, J.[Jaemin],
Garg, R.[Roopal],
Ku, A.[Alexander],
Parekh, Z.[Zarana],
Pont-Tuset, J.[Jordi],
Tanzer, G.[Garrett],
Wang, S.[Su],
Baldridge, J.[Jason],
DOCCI: Descriptions of Connected and Contrasting Images,
ECCV24(LX: 291-309).
Springer DOI
2412
BibRef
Li, T.[Tang],
Ma, M.M.[Meng-Meng],
Peng, X.[Xi],
DEAL: Disentangle and Localize Concept-level Explanations for VLMs,
ECCV24(XXXIX: 383-401).
Springer DOI
2412
BibRef
Li, S.C.[Shi-Cheng],
Li, L.[Lei],
Liu, Y.[Yi],
Ren, S.H.[Shu-Huai],
Liu, Y.X.[Yuan-Xin],
Gao, R.D.[Run-Dong],
Sun, X.[Xu],
Hou, L.[Lu],
Vitatecs: A Diagnostic Dataset for Temporal Concept Understanding of
Video-language Models,
ECCV24(LXX: 331-348).
Springer DOI
2412
BibRef
Yang, Y.T.[Yan-Ting],
Chen, M.H.[Ming-Hao],
Qiu, Q.[Qibo],
Wu, J.H.[Jia-Hao],
Wang, W.X.[Wen-Xiao],
Lin, B.B.[Bin-Bin],
Guan, Z.Y.[Zi-Yu],
He, X.F.[Xiao-Fei],
Adapt2reward: Adapting Video-language Models to Generalizable Robotic
Rewards via Failure Prompts,
ECCV24(LVII: 163-180).
Springer DOI
2412
BibRef
Rahmanzadehgervi, P.[Pooyan],
Bolton, L.[Logan],
Taesiri, M.R.[Mohammad Reza],
Nguyen, A.T.[Anh Totti],
Vision Language Models are blind,
ACCV24(V: 293-309).
Springer DOI
2412
BibRef
Chytas, S.P.[Sotirios Panagiotis],
Kim, H.W.J.[Hyun-Woo J.],
Singh, V.[Vikas],
Understanding Multi-compositional Learning in Vision and Language
Models via Category Theory,
ECCV24(XLVIII: 324-341).
Springer DOI
2412
BibRef
Song, Y.Z.[Yun-Zhu],
Chen, Y.S.[Yi-Syuan],
Lin, T.L.[Tzu-Ling],
Liu, B.[Bei],
Fu, J.L.[Jian-Long],
Shuai, H.H.[Hong-Han],
Capture Concept Through Comparison: Vision-and-language Representation
Learning with Intrinsic Information Mining,
ACCV24(III: 220-238).
Springer DOI
2412
BibRef
Adhikari, R.[Rabin],
Thapaliya, S.[Safal],
Dhakal, M.[Manish],
Khanal, B.[Bishesh],
Tunevlseg: Prompt Tuning Benchmark for Vision-language Segmentation
Models,
ACCV24(III: 44-62).
Springer DOI
2412
BibRef
He, H.C.[Hai-Chen],
Liu, W.B.[Wei-Bin],
Xing, W.W.[Wei-Wei],
Biefficient: Bidirectionally Prompting Vision-language Models for
Parameter-efficient Video Recognition,
ACCV24(III: 257-274).
Springer DOI
2412
BibRef
Yang, J.K.[Jing-Kang],
Dong, Y.H.[Yu-Hao],
Liu, S.[Shuai],
Li, B.[Bo],
Wang, Z.Y.[Zi-Yue],
Tan, H.R.[Hao-Ran],
Jiang, C.C.[Chen-Cheng],
Kang, J.[Jiamu],
Zhang, Y.H.[Yuan-Han],
Zhou, K.Y.[Kai-Yang],
Liu, Z.W.[Zi-Wei],
Octopus: Embodied Vision-language Programmer from Environmental
Feedback,
ECCV24(I: 20-38).
Springer DOI
2412
BibRef
Kar, O.F.[Oguzhan Fatih],
Tonioni, A.[Alessio],
Poklukar, P.[Petra],
Kulshrestha, A.[Achin],
Zamir, A.[Amir],
Tombari, F.[Federico],
Brave: Broadening the Visual Encoding of Vision-language Models,
ECCV24(XVI: 113-132).
Springer DOI
2412
BibRef
Kamath, A.[Amita],
Hsieh, C.Y.[Cheng-Yu],
Chang, K.W.[Kai-Wei],
Krishna, R.[Ranjay],
The Hard Positive Truth About Vision-language Compositionality,
ECCV24(XIV: 37-54).
Springer DOI
2412
BibRef
Jia, B.X.[Bao-Xiong],
Chen, Y.X.[Yi-Xin],
Yu, H.Y.[Huang-Yue],
Wang, Y.[Yan],
Niu, X.S.[Xue-Song],
Liu, T.Y.[Teng-Yu],
Li, Q.[Qing],
Huang, S.Y.[Si-Yuan],
Sceneverse: Scaling 3d Vision-language Learning for Grounded Scene
Understanding,
ECCV24(IX: 289-310).
Springer DOI
2412
BibRef
Zhang, Y.F.[Yi-Feng],
Jiang, M.[Ming],
Zhao, Q.[Qi],
Learning Chain of Counterfactual Thought for Bias-robust
Vision-language Reasoning,
ECCV24(VIII: 334-351).
Springer DOI
2412
BibRef
Li, J.[Junyan],
Chen, D.[Delin],
Cai, T.[Tianle],
Chen, P.H.[Pei-Hao],
Hong, Y.[Yining],
Chen, Z.F.[Zhen-Fang],
Shen, Y.K.[Yi-Kang],
Gan, C.[Chuang],
Flexattention for Efficient High-resolution Vision-language Models,
ECCV24(XXV: 286-302).
Springer DOI
2412
BibRef
Li, X.[Xiang],
Ding, J.[Jian],
Chen, Z.Y.[Zhao-Yang],
Elhoseiny, M.[Mohamed],
UNI3DL: A Unified Model for 3d Vision-language Understanding,
ECCV24(XXIII: 74-92).
Springer DOI
2412
BibRef
Hao, T.X.[Tian-Xiang],
Ding, X.H.[Xiao-Han],
Feng, J.X.[Jue-Xiao],
Yang, Y.H.[Yu-Hong],
Chen, H.[Hui],
Ding, G.[Guiguang],
Quantized Prompt for Efficient Generalization of Vision-language Models,
ECCV24(XIX: 54-73).
Springer DOI
2412
BibRef
Xu, H.B.[Huang-Biao],
Ke, X.[Xiao],
Li, Y.Z.[Yue-Zhou],
Xu, R.[Rui],
Wu, H.Q.[Huan-Qi],
Lin, X.F.[Xiao-Feng],
Guo, W.Z.[Wen-Zhong],
Vision-language Action Knowledge Learning for Semantic-aware Action
Quality Assessment,
ECCV24(XLII: 423-440).
Springer DOI
2412
BibRef
Zhu, Z.Y.[Zi-Yu],
Zhang, Z.[Zhuofan],
Ma, X.J.[Xiao-Jian],
Niu, X.S.[Xue-Song],
Chen, Y.X.[Yi-Xin],
Jia, B.X.[Bao-Xiong],
Deng, Z.D.[Zhi-Dong],
Huang, S.Y.[Si-Yuan],
Li, Q.[Qing],
Unifying 3d Vision-language Understanding via Promptable Queries,
ECCV24(XLIV: 188-206).
Springer DOI
2412
BibRef
Zhang, J.M.[Jia-Ming],
Ma, X.J.[Xing-Jun],
Wang, X.[Xin],
Qiu, L.Y.[Ling-Yu],
Wang, J.Q.[Jia-Qi],
Jiang, Y.G.[Yu-Gang],
Sang, J.[Jitao],
Adversarial Prompt Tuning for Vision-language Models,
ECCV24(XLV: 56-72).
Springer DOI
2412
BibRef
Wu, G.[Ge],
Zhang, X.[Xin],
Li, Z.[Zheng],
Chen, Z.W.[Zhao-Wei],
Liang, J.J.[Jia-Jun],
Yang, J.[Jian],
Li, X.[Xiang],
Cascade Prompt Learning for Vision-language Model Adaptation,
ECCV24(L: 304-321).
Springer DOI
2412
BibRef
Jiang, H.B.[Hao-Bin],
Yue, J.P.[Jun-Peng],
Luo, H.[Hao],
Ding, Z.[Ziluo],
Lu, Z.Q.[Zong-Qing],
Reinforcement Learning Friendly Vision-language Model for Minecraft,
ECCV24(LXVIII: 1-17).
Springer DOI
2412
BibRef
Nguyen, A.T.[A. Tuan],
Tai, K.S.[Kai Sheng],
Chen, B.C.[Bor-Chun],
Shukla, S.N.[Satya Narayan],
Yu, H.C.[Han-Chao],
Torr, P.H.S.[Philip H.S.],
Tian, T.P.[Tai-Peng],
Lim, S.N.[Ser-Nam],
ucap: An Unsupervised Prompting Method for Vision-language Models,
ECCV24(LXXIV: 425-439).
Springer DOI
2412
BibRef
Zhang, Y.[Yi],
Yu, K.[Ke],
Wu, S.Q.[Si-Qi],
He, Z.H.[Zhi-Hai],
Conceptual Codebook Learning for Vision-language Models,
ECCV24(LXXVII: 235-251).
Springer DOI
2412
BibRef
Chatterjee, A.[Agneet],
Luo, Y.R.[Yi-Ran],
Gokhale, T.[Tejas],
Yang, Y.Z.[Ye-Zhou],
Baral, C.[Chitta],
Revision: Rendering Tools Enable Spatial Fidelity in Vision-language
Models,
ECCV24(XXX: 339-357).
Springer DOI
2412
BibRef
Sharma, P.[Pratyusha],
Shaham, T.R.[Tamar Rott],
Baradad, M.[Manel],
Rodríguez-Muñoz, A.[Adrián],
Duggal, S.[Shivam],
Isola, P.[Phillip],
Torralba, A.[Antonio],
Fu, S.[Stephanie],
A Vision Check-up for Language Models,
CVPR24(14410-14419)
IEEE DOI
2410
Representation learning, Visualization, Analytical models, Codes,
Image synthesis, Computational modeling
BibRef
Parodi, F.[Felipe],
Matelsky, J.K.[Jordan K.],
Regla-Vargas, A.[Alejandra],
Foglia, E.E.[Elizabeth E.],
Lim, C.[Charis],
Weinberg, D.[Danielle],
Kording, K.P.[Konrad P.],
Herrick, H.M.[Heidi M.],
Platt, M.L.[Michael L.],
Vision-language models for decoding provider attention during
neonatal resuscitation,
CVPM24(343-353)
IEEE DOI
2410
Training, Pediatrics, Accuracy, Semantics, Decision making, Transformers
BibRef
Zhang, Y.B.[Ya-Bin],
Zhu, W.J.[Wen-Jie],
Tang, H.[Hui],
Ma, Z.Y.[Zhi-Yuan],
Zhou, K.Y.[Kai-Yang],
Zhang, L.[Lei],
Dual Memory Networks: A Versatile Adaptation Approach for
Vision-Language Models,
CVPR24(28718-28728)
IEEE DOI Code:
WWW Link.
2410
Training, Knowledge engineering, Adaptation models, Codes,
Training data, Data models, Vision-language models,
versatile adaptation
BibRef
Guo, Y.C.[Yun-Cheng],
Gu, X.D.[Xiao-Dong],
JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language
Models,
CVPR24(28695-28705)
IEEE DOI
2410
Adaptation models, Adaptive systems, Noise, Manuals, Robustness,
Noise measurement,
prompt learning
BibRef
Han, J.[Jinwei],
Lin, Z.W.[Zhi-Wen],
Sun, Z.Y.[Zhong-Yisun],
Gao, Y.G.[Ying-Guo],
Yan, K.[Ke],
Ding, S.H.[Shou-Hong],
Gao, Y.[Yuan],
Xia, G.S.[Gui-Song],
Anchor-based Robust Finetuning of Vision-Language Models,
CVPR24(26909-26918)
IEEE DOI
2410
Image recognition, Zero-shot learning, Semantics,
Benchmark testing, Anchor, Robust Finetuning
BibRef
Cao, Q.L.[Qing-Long],
Xu, Z.Q.[Zheng-Qin],
Chen, Y.T.[Yun-Tian],
Ma, C.[Chao],
Yang, X.K.[Xiao-Kang],
Domain Prompt Learning with Quaternion Networks,
CVPR24(26627-26636)
IEEE DOI Code:
WWW Link.
2410
Knowledge engineering, Adaptation models, Codes, Quaternions,
Face recognition, Contrastive learning, vision-language models,
quaternion networks
BibRef
Li, L.[Lin],
Guan, H.Y.[Hao-Yan],
Qiu, J.N.[Jia-Ning],
Spratling, M.[Michael],
One Prompt Word is Enough to Boost Adversarial Robustness for
Pre-Trained Vision-Language Models,
CVPR24(24408-24419)
IEEE DOI Code:
WWW Link.
2410
Accuracy, Codes, Training data, Robustness,
Computational efficiency, vision-language models, VLMs
BibRef
Zanella, M.[Maxime],
Fuchs, C.[Clément],
de Vleeschouwer, C.[Christophe],
Ayed, I.B.[Ismail Ben],
Realistic Test-Time Adaptation of Vision-Language Models,
CVPR25(25103-25112)
IEEE DOI Code:
WWW Link.
2508
BibRef
And: A2, A1, A3, Only:
Online Gaussian Test-Time Adaptation of Vision-Language Models,
MULA25(128-137)
IEEE DOI Code:
WWW Link.
2512
Adaptation models, Codes, Predictive models, Performance gain,
Robustness, vision-language, test-time adaptation,
regularized maximum likelihood estimation.
Measurement, Visualization, Accuracy, Protocols,
Limiting, Predictive models, Data models, Mathematical models, CLIP
BibRef
Zanella, M.[Maxime],
Ayed, I.B.[Ismail Ben],
On the Test-Time Zero-Shot Generalization of Vision-Language Models:
Do we Really need Prompt Learning?,
CVPR24(23783-23793)
IEEE DOI
2410
Training, Systematics, Computational modeling, Quality assessment,
Computational efficiency, vision-language,
training-free
BibRef
Yang, S.[Senqiao],
Tian, Z.[Zhuotao],
Jiang, L.[Li],
Jia, J.Y.[Jia-Ya],
Unified Language-Driven Zero-Shot Domain Adaptation,
CVPR24(23407-23415)
IEEE DOI
2410
Representation learning, Adaptation models, Visualization,
Correlation, Scalability, Computational modeling,
Vision-Language Model
BibRef
Cui, J.Q.[Jie-Quan],
Zhu, B.[Beier],
Wen, X.[Xin],
Qi, X.J.[Xiao-Juan],
Yu, B.[Bei],
Zhang, H.W.[Han-Wang],
Classes Are Not Equal: An Empirical Study on Image Recognition
Fairness,
CVPR24(23283-23292)
IEEE DOI
2410
Training, Representation learning, Image recognition, Accuracy,
Predictive models, Network architecture, Prediction algorithms,
Vision-Language Models
BibRef
Stojnic, V.[Vladan],
Kalantidis, Y.[Yannis],
Tolias, G.[Giorgos],
Label Propagation for Zero-shot Classification with Vision-Language
Models,
CVPR24(23209-23218)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Closed box, Encoding, Data models,
vision-language models, label propagation, zero-shot classification
BibRef
Yuan, T.[Tongtong],
Zhang, X.[Xuange],
Liu, K.[Kun],
Liu, B.[Bo],
Chen, C.[Chen],
Jin, J.[Jian],
Jiao, Z.Z.[Zhen-Zhen],
Towards Surveillance Video-and-Language Understanding: New Dataset,
Baselines, and Challenges,
CVPR24(22052-22061)
IEEE DOI Code:
WWW Link.
2410
Annotations, Surveillance, Semantics, Benchmark testing,
Public security, Timing, Security, Dataset Annotation
BibRef
Chen, Y.F.[Yi-Fei],
Chen, D.P.[Da-Peng],
Liu, R.J.[Rui-Jin],
Zhou, S.[Sai],
Xue, W.Y.[Wen-Yuan],
Peng, W.[Wei],
Align Before Adapt: Leveraging Entity-to-Region Alignments for
Generalizable Video Action Recognition,
CVPR24(18688-18698)
IEEE DOI
2410
Representation learning, Adaptation models, Visualization, Semantics,
Transformers, Vectors, Video action recognition, visual-language model
BibRef
Mittal, H.[Himangi],
Agarwal, N.[Nakul],
Lo, S.Y.[Shao-Yuan],
Lee, K.[Kwonjoon],
Can't make an Omelette without Breaking some Eggs: Plausible Action
Anticipation using Large Video-Language Models,
CVPR24(18580-18590)
IEEE DOI
2410
Accuracy, Computational modeling, Linear programming,
Action Anticipation, Video, Large Multimodal Models
BibRef
Kahatapitiya, K.[Kumara],
Arnab, A.[Anurag],
Nagrani, A.[Arsha],
Ryoo, M.S.[Michael S.],
VicTR: Video-conditioned Text Representations for Activity
Recognition,
CVPR24(18547-18558)
IEEE DOI
2410
Training, Visualization, Adaptation models, Semantics, Focusing,
Benchmark testing, Vision-language models, Activity Recognition,
Video-conditioned Text
BibRef
Wu, T.Y.[Tz-Ying],
Ho, C.H.[Chih-Hui],
Vasconcelos, N.M.[Nuno M.],
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification,
CVPR24(16531-16540)
IEEE DOI Code:
WWW Link.
2410
Measurement, Training, Frequency modulation, Accuracy, Taxonomy,
Semantics, Hierarchical Classification, Visual-language foundation model
BibRef
Zhao, G.[Ganlong],
Li, G.B.[Guan-Bin],
Chen, W.[Weikai],
Yu, Y.Z.[Yi-Zhou],
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with
Open-Vocabulary Detection and StructurEd Representation,
CVPR24(16296-16306)
IEEE DOI
2410
Art, Accuracy, Navigation, Annotations, Detectors,
Vision-and-Language Navigation, Open-vocabulary, Multi-Modal Learning
BibRef
Li, X.[Xin],
Wu, Y.F.[Yun-Fei],
Jiang, X.H.[Xing-Hua],
Guo, Z.H.[Zhi-Hao],
Gong, M.M.[Ming-Ming],
Cao, H.Y.[Hao-Yu],
Liu, Y.S.[Yin-Song],
Jiang, D.Q.[De-Qiang],
Sun, X.[Xing],
Enhancing Visual Document Understanding with Contrastive Learning in
Large Visual-Language Models,
CVPR24(15546-15555)
IEEE DOI
2410
Visualization, Computational modeling, Contrastive learning,
Benchmark testing, Feature extraction, Filling, Contrastive Learning
BibRef
Pham, K.[Khoi],
Huynh, C.[Chuong],
Lim, S.N.[Ser-Nam],
Shrivastava, A.[Abhinav],
Composing Object Relations and Attributes for Image-Text Matching,
CVPR24(14354-14363)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Computational modeling, Image edge detection,
Semantics, Benchmark testing, vision-language, image retrieval,
image-text matching
BibRef
Xu, Z.L.[Zhen-Lin],
Zhu, Y.[Yi],
Deng, S.Q.[Si-Qi],
Mittal, A.[Abhay],
Chen, Y.B.[Yan-Bei],
Wang, M.[Manchen],
Favaro, P.[Paolo],
Tighe, J.[Joseph],
Modolo, D.[Davide],
Benchmarking Zero-Shot Recognition with Vision-Language Models:
Challenges on Granularity and Specificity,
WhatNext24(1827-1836)
IEEE DOI
2410
Computational modeling, Face recognition, Semantics, Training data,
Focusing, Vision and language models, Zero-shot recognition,
Benchmarking
BibRef
Luo, Z.W.[Zi-Wei],
Gustafsson, F.K.[Fredrik K.],
Zhao, Z.[Zheng],
Sjölund, J.[Jens],
Schön, T.B.[Thomas B.],
Photo-Realistic Image Restoration in the Wild with Controlled
Vision-Language Models,
NTIRE24(6641-6651)
IEEE DOI
2410
Degradation, Training, Image synthesis, Pipelines, Transform coding,
Diffusion models, Feature extraction, Image restoration, real-world
BibRef
Huang, C.Q.[Chao-Qin],
Jiang, A.[Aofan],
Feng, J.H.[Jing-Hao],
Zhang, Y.[Ya],
Wang, X.C.[Xin-Chao],
Wang, Y.F.[Yan-Feng],
Adapting Visual-Language Models for Generalizable Anomaly Detection
in Medical Images,
CVPR24(11375-11385)
IEEE DOI Code:
WWW Link.
2410
Training, Adaptation models, Image segmentation, Visualization,
Source coding, Semantics, Anomaly Detection, Medical Images
BibRef
Bang, J.[Jihwan],
Ahn, S.[Sumyeong],
Lee, J.G.[Jae-Gil],
Active Prompt Learning in Vision Language Models,
CVPR24(26994-27004)
IEEE DOI Code:
WWW Link.
2410
Learning systems, Adaptation models, Codes, Sampling methods, Labeling
BibRef
Pan, C.[Chenbin],
Yaman, B.[Burhaneddin],
Nesti, T.[Tommaso],
Mallik, A.[Abhirup],
Allievi, A.G.[Alessandro G.],
Velipasalar, S.[Senem],
Ren, L.[Liu],
VLP: Vision Language Planning for Autonomous Driving,
CVPR24(14760-14769)
IEEE DOI
2410
Training, Urban areas, Linguistics, Cognition, Robustness, Planning
BibRef
Liang, M.[Mingfu],
Su, J.C.[Jong-Chyi],
Schulter, S.[Samuel],
Garg, S.[Sparsh],
Zhao, S.Y.[Shi-Yu],
Wu, Y.[Ying],
Chandraker, M.[Manmohan],
AIDE: An Automatic Data Engine for Object Detection in Autonomous
Driving,
CVPR24(14695-14706)
IEEE DOI
2410
Training, Costs, Roads, Pipelines, Object detection, Benchmark testing,
Data models, Autonomous Driving, Vision Language Model,
Automatic Data Engine
BibRef
Li, Z.[Zheng],
Li, X.[Xiang],
Fu, X.[Xinyi],
Zhang, X.[Xin],
Wang, W.Q.[Wei-Qiang],
Chen, S.[Shuo],
Yang, J.[Jian],
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models,
CVPR24(26607-26616)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Prediction algorithms, Data models,
Vectors, Probability distribution, knowledge distillation,
zero-shot learning
BibRef
Khandelwal, A.[Anant],
PromptSync: Bridging Domain Gaps in Vision-Language Models through
Class-Aware Prototype Alignment and Discrimination,
ZeroShot24(7819-7828)
IEEE DOI
2410
Adaptation models, Computational modeling, Prototypes,
Contrastive learning, Benchmark testing, Robustness
BibRef
Hirohashi, Y.[Yuki],
Hirakawa, T.[Tsubasa],
Yamashita, T.[Takayoshi],
Fujiyoshi, H.[Hironobu],
Prompt Learning with One-Shot Setting based Feature Space Analysis in
Vision-and-Language Models,
ZeroShot24(7761-7770)
IEEE DOI
2410
Learning systems, Analytical models, Adaptation models,
Image resolution, Accuracy, Vision-and-Language Model, Prompt Learning
BibRef
Zhang, L.[Le],
Awal, R.[Rabiul],
Agrawal, A.[Aishwarya],
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to
Enhance Visio-Linguistic Compositional Understanding,
CVPR24(13774-13784)
IEEE DOI Code:
WWW Link.
2410
Annotations, Semantics, Refining, Text to image,
Contrastive learning, Benchmark testing, Cognition,
contrastive learning
BibRef
Rosasco, A.[Andrea],
Berti, S.[Stefano],
Pasquale, G.[Giulia],
Malafronte, D.[Damiano],
Sato, S.[Shogo],
Segawa, H.[Hiroyuki],
Inada, T.[Tetsugo],
Natale, L.[Lorenzo],
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized
Vision-Language Tasks,
CVPR24(22239-22248)
IEEE DOI Code:
WWW Link.
2410
Measurement, Codes, Image synthesis, Text to image,
Benchmark testing, benchmark, dataset,
compositionality
BibRef
Cheng, S.[Sijie],
Guo, Z.C.[Zhi-Cheng],
Wu, J.W.[Jing-Wen],
Fang, K.[Kechen],
Li, P.[Peng],
Liu, H.P.[Hua-Ping],
Liu, Y.[Yang],
EgoThink: Evaluating First-Person Perspective Thinking Capability of
Vision-Language Models,
CVPR24(14291-14302)
IEEE DOI
2410
Bridges, Visualization, Computational modeling, Focusing,
Benchmark testing, Planning, Egocentric, Vision-Language Models, Benchmark
BibRef
Kil, J.[Jihyung],
Song, C.H.[Chan Hee],
Zheng, B.[Boyuan],
Deng, X.[Xiang],
Su, Y.[Yu],
Chao, W.L.[Wei-Lun],
Dual-View Visual Contextualization for Web Navigation,
CVPR24(14445-14454)
IEEE DOI
2410
Visualization, Navigation, Benchmark testing,
AI Agents, Web Agents, Web Navigation, Vision-Language,
Multimodal Agents
BibRef
Guo, Y.Y.[Yang-Yang],
Wang, G.Z.[Guang-Zhi],
Kankanhalli, M.[Mohan],
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation,
CVPR24(15699-15709)
IEEE DOI
2410
Codes, Computational modeling, Perturbation methods, Loading,
Transformers, Vision-Language,
Low-rank Approximation
BibRef
Farina, M.[Matteo],
Mancini, M.[Massimiliano],
Cunegatti, E.[Elia],
Iacca, G.[Giovanni],
Ricci, E.[Elisa],
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning,
CVPR24(16185-16195)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Transfer learning, Neurons,
Benchmark testing, multimodal learning,
sparse neural networks
BibRef
Mu, F.Z.[Fang-Zhou],
Mo, S.C.[Si-Cheng],
Li, Y.[Yin],
SnAG: Scalable and Accurate Video Grounding,
CVPR24(18930-18940)
IEEE DOI Code:
WWW Link.
2410
Training, Analytical models, Accuracy, Grounding, Scalability,
Computational modeling, Video understanding,
Vision-Language Learning
BibRef
Cao, Y.H.[Yun-Hao],
Ji, K.X.[Kai-Xiang],
Huang, Z.Y.[Zi-Yuan],
Zheng, C.Y.[Chuan-Yang],
Liu, J.J.[Jia-Jia],
Wang, J.[Jian],
Chen, J.D.[Jing-Dong],
Yang, M.[Ming],
Towards Better Vision-Inspired Vision-Language Models,
CVPR24(13537-13547)
IEEE DOI
2410
Training, Bridges, Visualization, Computational modeling,
Poles and towers, Benchmark testing, deep learning, deep prompt
BibRef
Shi, K.Y.[Kun-Yu],
Dong, Q.[Qi],
Goncalves, L.[Luis],
Tu, Z.W.[Zhuo-Wen],
Soatto, S.[Stefano],
Non-autoregressive Sequence-to-Sequence Vision-Language Models,
CVPR24(13603-13612)
IEEE DOI
2410
Visualization, Technological innovation, Computational modeling,
Predictive models, Drives, Encoding, Non-autoregressive, CTC,
vision language models
BibRef
Man, Y.Z.[Yun-Ze],
Gui, L.Y.[Liang-Yan],
Wang, Y.X.[Yu-Xiong],
Situational Awareness Matters in 3D Vision Language Reasoning,
CVPR24(13678-13688)
IEEE DOI
2410
Visualization, Solid modeling, Estimation, Performance gain,
Cognition, Vision-Language, Multi-modal, 3D Reasoning
BibRef
Zheng, C.H.[Chen-Hao],
Zhang, J.[Jieyu],
Kembhavi, A.[Aniruddha],
Krishna, R.[Ranjay],
Iterated Learning Improves Compositionality in Large Vision-Language
Models,
CVPR24(13785-13795)
IEEE DOI
2410
Training, Training data, Games, Contrastive learning,
Benchmark testing, Performance gain, Cognitive science
BibRef
Song, C.H.[Chull Hwan],
Hwang, T.[Taebaek],
Yoon, J.Y.[Joo-Young],
Choi, S.[Shunghyun],
Gu, Y.H.[Yeong Hyeon],
SyncMask: Synchronized Attentional Masking for Fashion-centric
Vision-Language Pretraining,
CVPR24(13948-13957)
IEEE DOI
2410
Training, Visualization, Image segmentation, Image resolution,
Refining, Contrastive learning
BibRef
Pramanick, S.[Shraman],
Han, G.X.[Guang-Xing],
Hou, R.[Rui],
Nag, S.[Sayan],
Lim, S.N.[Ser-Nam],
Ballas, N.[Nicolas],
Wang, Q.F.[Qi-Fan],
Chellappa, R.[Rama],
Almahairi, A.[Amjad],
Jack of All Tasks, Master of Many: Designing General-purpose
Coarse-to-Fine Vision-Language Model,
CVPR24(14076-14088)
IEEE DOI Code:
WWW Link.
2410
Image segmentation, Visualization, Image coding, Filters, Grounding,
Machine vision, Visual systems
BibRef
Zeng, Y.[Yunan],
Huang, Y.[Yan],
Zhang, J.J.[Jin-Jin],
Jie, Z.Q.[Ze-Qun],
Chai, Z.H.[Zhen-Hua],
Wang, L.[Liang],
Investigating Compositional Challenges in Vision-Language Models for
Visual Grounding,
CVPR24(14141-14151)
IEEE DOI
2410
Visualization, Codes, Grounding, Annotations, Pipelines, Benchmark testing
BibRef
Karmanov, A.[Adilbek],
Guan, D.[Dayan],
Lu, S.J.[Shi-Jian],
El Saddik, A.[Abdulmotaleb],
Xing, E.[Eric],
Efficient Test-Time Adaptation of Vision-Language Models,
CVPR24(14162-14171)
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Codes, Computational modeling, Noise,
Predictive models, Benchmark testing
BibRef
Sameni, S.[Sepehr],
Kafle, K.[Kushal],
Tan, H.[Hao],
Jenni, S.[Simon],
Building Vision-Language Models on Solid Foundations with Masked
Distillation,
CVPR24(14216-14226)
IEEE DOI
2410
Training, Solid modeling, Visualization, Computational modeling,
Semantic segmentation, Buildings, LLM
BibRef
Peng, W.[Wujian],
Xie, S.C.[Si-Cheng],
You, Z.[Zuyao],
Lan, S.Y.[Shi-Yi],
Wu, Z.X.[Zu-Xuan],
Synthesize, Diagnose, and Optimize: Towards Fine-Grained
Vision-Language Understanding,
CVPR24(13279-13288)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Computational modeling, Pipelines, Benchmark testing,
Linguistics, Vision language model, Fine-grained understanding
BibRef
Zhao, Y.[Yue],
Zhao, L.[Long],
Zhou, X.Y.[Xing-Yi],
Wu, J.L.[Jia-Lin],
Chu, C.T.[Chun-Te],
Miao, H.[Hui],
Schroff, F.[Florian],
Adam, H.[Hartwig],
Liu, T.[Ting],
Gong, B.Q.[Bo-Qing],
Krähenbühl, P.[Philipp],
Yuan, L.Z.[Liang-Zhe],
Distilling Vision-Language Models on Millions of Videos,
CVPR24(13106-13116)
IEEE DOI
2410
Adaptation models, Computational modeling, Benchmark testing,
Data models, Text to video
BibRef
Chen, J.N.[Jie-Neng],
Yu, Q.H.[Qi-Hang],
Shen, X.H.[Xiao-Hui],
Yuille, A.L.[Alan L.],
Chen, L.C.[Liang-Chieh],
ViTamin: Designing Scalable Vision Models in the Vision-Language Era,
CVPR24(12954-12966)
IEEE DOI
2410
Training, Image segmentation, Accuracy, Protocols, Image coding, Scalability,
Computational modeling, Vision-Language Models, Architectural Design
BibRef
Liu, S.H.[Shi-Hong],
Yu, S.[Samuel],
Lin, Z.Q.[Zhi-Qiu],
Pathak, D.[Deepak],
Ramanan, D.[Deva],
Language Models as Black-Box Optimizers for Vision-Language Models,
CVPR24(12687-12697)
IEEE DOI
2410
Computational modeling, Natural languages, Closed box,
Text to image, Human in the loop, Data models,
generative models
BibRef
Howard, P.[Phillip],
Madasu, A.[Avinash],
Le, T.[Tiep],
Moreno, G.L.[Gustavo Lujan],
Bhiwandiwalla, A.[Anahita],
Lal, V.[Vasudev],
SocialCounterfactuals: Probing and Mitigating Intersectional Social
Biases in Vision-Language Models with Counterfactual Examples,
CVPR24(11975-11985)
IEEE DOI
2410
Training, Prevention and mitigation, Text to image,
Diffusion models, Fairness, social bias,
counterfactuals
BibRef
Jiang, Y.K.[Yan-Kai],
Huang, Z.Z.[Zhong-Zhen],
Zhang, R.Z.[Rong-Zhao],
Zhang, X.F.[Xiao-Fan],
Zhang, S.T.[Shao-Ting],
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and
Self-Prompting,
CVPR24(11386-11397)
IEEE DOI
2410
Training, Visualization, Pathology, Image segmentation,
Image analysis, Computational modeling, Vision-Language Model
BibRef
Kim, Y.[Younghyun],
Mo, S.[Sangwoo],
Kim, M.[Minkyu],
Lee, K.[Kyungmin],
Lee, J.[Jaeho],
Shin, J.[Jinwoo],
Discovering and Mitigating Visual Biases Through Keyword Explanation,
CVPR24(11082-11092)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Image recognition, Computational modeling,
Training data, Flowering plants, bias and fairness, explainable AI,
vision-language model
BibRef
Li, R.[Rui],
Fischer, T.[Tobias],
Segu, M.[Mattia],
Pollefeys, M.[Marc],
Van Gool, L.J.[Luc J.],
Tombari, F.[Federico],
Know Your Neighbors: Improving Single-View Reconstruction via Spatial
Vision-Language Reasoning,
CVPR24(9848-9858)
IEEE DOI Code:
WWW Link.
2410
Geometry, Visualization, Attention mechanisms, Shape, Semantics,
radiance field, vision-language model, spatial context, spatial attention
BibRef
Zeng, Z.[Ziyao],
Wang, D.[Daniel],
Yang, F.Y.[Feng-Yu],
Park, H.[Hyoungseob],
Soatto, S.[Stefano],
Lao, D.[Dong],
Wong, A.[Alex],
WorDepth: Variational Language Prior for Monocular Depth Estimation,
CVPR24(9708-9719)
IEEE DOI Code:
WWW Link.
2410
Measurement, Codes, Estimation, Encoding,
Monocular Depth Estimation, Vision-Language Model, Variational Model
BibRef
Hu, Y.S.[Yu-Shi],
Stretcu, O.[Otilia],
Lu, C.T.[Chun-Ta],
Viswanathan, K.[Krishnamurthy],
Hata, K.[Kenji],
Luo, E.[Enming],
Krishna, R.[Ranjay],
Fuxman, A.[Ariel],
Visual Program Distillation: Distilling Tools and Programmatic
Reasoning into Vision-Language Models,
CVPR24(9590-9601)
IEEE DOI
2410
Visualization, Adaptation models, Computational modeling,
Instruments, Loading, Music, Cognition, vision-language model,
tools
BibRef
Zanella, M.[Maxime],
Fuchs, C.[Clément],
Ben Ayed, I.[Ismail],
de Vleeschouwer, C.[Christophe],
Vocabulary-Free Few-Shot Learning for Vision-Language Models,
MULA25(149-158)
IEEE DOI Code:
WWW Link.
2512
Adaptation models, Visualization, Computational modeling,
Semantics, Computational efficiency, Few shot learning, prompts
BibRef
Silva-Rodríguez, J.[Julio],
Hajimiri, S.[Sina],
Ben Ayed, I.[Ismail],
Dolz, J.[Jose],
A Closer Look at the Few-Shot Adaptation of Large Vision-Language
Models,
CVPR24(23681-23690)
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Codes, Computational modeling,
Transfer learning, Probes
BibRef
Zanella, M.[Maxime],
Ben Ayed, I.[Ismail],
Low-Rank Few-Shot Adaptation of Vision-Language Models,
Prompting24(1593-1603)
IEEE DOI
2410
Training, Adaptation models, Design methodology,
Few shot learning, Vision-Language, few-shot,
adapter
BibRef
Yang, C.[Cheng],
Xu, R.[Rui],
Guo, Y.[Ye],
Huang, P.X.[Pei-Xiang],
Chen, Y.[Yiru],
Ding, W.[Wenkui],
Wang, Z.Y.[Zhong-Yuan],
Zhou, H.[Hong],
Improving Vision-and-Language Reasoning via Spatial Relations
Modeling,
WACV24(758-767)
IEEE DOI
2404
Visualization, Analytical models, Graphical models,
Statistical analysis, Computational modeling, Excavation,
Vision + language and/or other modalities
BibRef
Shen, S.[Sheng],
Yang, S.[Shijia],
Zhang, T.J.[Tian-Jun],
Zhai, B.[Bohan],
Gonzalez, J.E.[Joseph E.],
Keutzer, K.[Kurt],
Darrell, T.J.[Trevor J.],
Multitask Vision-Language Prompt Tuning,
WACV24(5644-5655)
IEEE DOI
2404
Learning systems, Visualization, Adaptation models,
Benchmark testing, Vectors, Task analysis, Algorithms,
Vision + language and/or other modalities
BibRef
Zhang, G.[Gengyuan],
Zhang, Y.R.[Yu-Rui],
Zhang, K.[Kerui],
Tresp, V.[Volker],
Can Vision-Language Models be a Good Guesser? Exploring VLMs for
Times and Location Reasoning,
WACV24(625-634)
IEEE DOI Code:
WWW Link.
2404
Visualization, Computational modeling, Feature extraction,
Cognition, Task analysis, Commonsense reasoning, Algorithms,
Vision + language and/or other modalities
BibRef
Ganz, R.[Roy],
Nuriel, O.[Oren],
Aberdam, A.[Aviad],
Kittenplon, Y.[Yair],
Mazor, S.[Shai],
Litman, R.[Ron],
Towards Models that Can See and Read,
ICCV23(21661-21671)
IEEE DOI
2401
BibRef
Zhang, H.[Heng],
Liu, D.[Daqing],
Lv, Z.[Zezhong],
Su, B.[Bing],
Tao, D.C.[Da-Cheng],
Exploring Temporal Concurrency for Video-Language Representation
Learning,
ICCV23(15522-15532)
IEEE DOI Code:
WWW Link.
2401
BibRef
Shukor, M.[Mustafa],
Dancette, C.[Corentin],
Cord, M.[Matthieu],
eP-ALM: Efficient Perceptual Augmentation of Language Models,
ICCV23(21999-22012)
IEEE DOI Code:
WWW Link.
2401
BibRef
Schulter, S.[Samuel],
Kumar, B.G.V.[B.G. Vijay],
Suh, Y.M.[Yu-Min],
Dafnis, K.M.[Konstantinos M.],
Zhang, Z.X.[Zhi-Xing],
Zhao, S.Y.[Shi-Yu],
Metaxas, D.N.[Dimitris N.],
OmniLabel: A Challenging Benchmark for Language-Based Object
Detection,
ICCV23(11919-11928)
IEEE DOI Code:
WWW Link.
2401
BibRef
Chen, Z.L.[Zi-Liang],
Huang, X.[Xin],
Guan, Q.L.[Quan-Long],
Lin, L.[Liang],
Luo, W.Q.[Wei-Qi],
A Retrospect to Multi-prompt Learning across Vision and Language,
ICCV23(22133-22144)
IEEE DOI
2401
BibRef
Derakhshani, M.M.[Mohammad Mahdi],
Sanchez, E.[Enrique],
Bulat, A.[Adrian],
da Costa, V.G.T.[Victor Guilherme Turrisi],
Snoek, C.G.M.[Cees G. M.],
Tzimiropoulos, G.[Georgios],
Martinez, B.[Brais],
Bayesian Prompt Learning for Image-Language Model Generalization,
ICCV23(15191-15200)
IEEE DOI Code:
WWW Link.
2401
BibRef
Lin, W.[Wei],
Mirza, M.J.[Muhammad Jehanzeb],
Doveh, S.[Sivan],
Feris, R.[Rogerio],
Giryes, R.[Raja],
Hochreiter, S.[Sepp],
Karlinsky, L.[Leonid],
Comparison Visual Instruction Tuning,
Reasoning25(2964-2974)
IEEE DOI
2512
Visualization, Solid modeling, Large language models,
Benchmark testing, Cognition, Tuning, Anomaly detection,
visual instruction tuning
BibRef
Cascante-Bonilla, P.[Paola],
Shehada, K.[Khaled],
Smith, J.S.[James Seale],
Doveh, S.[Sivan],
Kim, D.H.[Dong-Hyun],
Panda, R.[Rameswar],
Varol, G.[Gül],
Oliva, A.[Aude],
Ordonez, V.[Vicente],
Feris, R.S.[Rogerio S.],
Karlinsky, L.[Leonid],
Going Beyond Nouns With Vision & Language Models Using Synthetic
Data,
ICCV23(20098-20108)
IEEE DOI
2401
BibRef
Upadhyay, U.[Uddeshya],
Karthik, S.[Shyamgopal],
Mancini, M.[Massimiliano],
Akata, Z.[Zeynep],
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models,
ICCV23(1899-1910)
IEEE DOI Code:
WWW Link.
2401
BibRef
Bitton-Guetta, N.[Nitzan],
Bitton, Y.[Yonatan],
Hessel, J.[Jack],
Schmidt, L.[Ludwig],
Elovici, Y.[Yuval],
Stanovsky, G.[Gabriel],
Schwartz, R.[Roy],
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of
Synthetic and Compositional Images,
ICCV23(2616-2627)
IEEE DOI
2401
BibRef
Hu, Z.Y.[Zi-Yuan],
Li, Y.Y.[Yan-Yang],
Lyu, M.R.[Michael R.],
Wang, L.W.[Li-Wei],
VL-PET: Vision-and-Language Parameter-Efficient Tuning via
Granularity Control,
ICCV23(2998-3008)
IEEE DOI Code:
WWW Link.
2401
BibRef
Slyman, E.[Eric],
Kahng, M.[Minsuk],
Lee, S.[Stefan],
VLSlice: Interactive Vision-and-Language Slice Discovery,
ICCV23(15245-15255)
IEEE DOI
2401
BibRef
Najibi, M.[Mahyar],
Ji, J.W.[Jing-Wei],
Zhou, Y.[Yin],
Qi, C.R.[Charles R.],
Yan, X.C.[Xin-Chen],
Ettinger, S.[Scott],
Anguelov, D.[Dragomir],
Unsupervised 3D Perception with 2D Vision-Language Distillation for
Autonomous Driving,
ICCV23(8568-8578)
IEEE DOI
2401
BibRef
Xu, H.[Hu],
Xie, S.[Saining],
Huang, P.Y.[Po-Yao],
Yu, L.C.[Li-Cheng],
Howes, R.[Russell],
Ghosh, G.[Gargi],
Zettlemoyer, L.[Luke],
Feichtenhofer, C.[Christoph],
CiT: Curation in Training for Effective Vision-Language Data,
ICCV23(15134-15143)
IEEE DOI
2401
BibRef
Trager, M.[Matthew],
Perera, P.[Pramuditha],
Zancato, L.[Luca],
Achille, A.[Alessandro],
Bhatia, P.[Parminder],
Soatto, S.[Stefano],
Linear Spaces of Meanings: Compositional Structures in
Vision-Language Models,
ICCV23(15349-15358)
IEEE DOI
2401
BibRef
Chen, Y.S.[Yi-Syuan],
Song, Y.Z.[Yun-Zhu],
Yeo, C.Y.[Cheng Yu],
Liu, B.[Bei],
Fu, J.L.[Jian-Long],
Shuai, H.H.[Hong-Han],
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks,
ICCV23(15384-15396)
IEEE DOI
2401
BibRef
Wu, C.E.[Cheng-En],
Tian, Y.[Yu],
Yu, H.C.[Hai-Chao],
Wang, H.[Heng],
Morgado, P.[Pedro],
Hu, Y.H.[Yu Hen],
Yang, L.J.[Lin-Jie],
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy
Labels?,
ICCV23(15442-15451)
IEEE DOI Code:
WWW Link.
2401
BibRef
Ouali, Y.[Yassine],
Bulat, A.[Adrian],
Martinez, B.[Brais],
Tzimiropoulos, G.[Georgios],
Black Box Few-Shot Adaptation for Vision-Language models,
ICCV23(15488-15500)
IEEE DOI Code:
WWW Link.
2401
BibRef
Kan, B.[Baoshuo],
Wang, T.[Teng],
Lu, W.P.[Wen-Peng],
Zhen, X.T.[Xian-Tong],
Guan, W.[Weili],
Zheng, F.[Feng],
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language
Models,
ICCV23(15624-15634)
IEEE DOI
2401
BibRef
Zhai, J.T.[Jiang-Tian],
Zhang, Q.[Qi],
Wu, T.[Tong],
Chen, X.Y.[Xing-Yu],
Liu, J.J.[Jiang-Jiang],
Cheng, M.M.[Ming-Ming],
SLAN: Self-Locator Aided Network for Vision-Language Understanding,
ICCV23(21892-21901)
IEEE DOI Code:
WWW Link.
2401
BibRef
Long, S.[Sifan],
Zhao, Z.[Zhen],
Yuan, J.[Junkun],
Tan, Z.C.[Zi-Chang],
Liu, J.J.[Jiang-Jiang],
Zhou, L.P.[Lu-Ping],
Wang, S.S.[Sheng-Sheng],
Wang, J.D.[Jing-Dong],
Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models,
ICCV23(21902-21912)
IEEE DOI
2401
BibRef
Cho, E.[Eulrang],
Kim, J.[Jooyeon],
Kim, H.W.J.[Hyun-Woo J.],
Distribution-Aware Prompt Tuning for Vision-Language Models,
ICCV23(21947-21956)
IEEE DOI Code:
WWW Link.
2401
BibRef
Varma, M.[Maya],
Delbrouck, J.B.[Jean-Benoit],
Hooper, S.[Sarah],
Chaudhari, A.[Akshay],
Langlotz, C.[Curtis],
ViLLA: Fine-Grained Vision-Language Representation Learning from
Real-World Data,
ICCV23(22168-22178)
IEEE DOI
2401
BibRef
Zhu, H.G.[Hong-Guang],
Wei, Y.C.[Yun-Chao],
Liang, X.D.[Xiao-Dan],
Zhang, C.J.[Chun-Jie],
Zhao, Y.[Yao],
CTP: Towards Vision-Language Continual Pretraining via Compatible
Momentum Contrast and Topology Preservation,
ICCV23(22200-22210)
IEEE DOI Code:
WWW Link.
2401
BibRef
Hall, M.[Melissa],
Gustafson, L.[Laura],
Adcock, A.[Aaron],
Misra, I.[Ishan],
Ross, C.[Candace],
Vision-Language Models Performing Zero-Shot Tasks Exhibit Disparities
Between Gender Groups,
CLVL23(2770-2777)
IEEE DOI
2401
BibRef
Agnolucci, L.[Lorenzo],
Baldrati, A.[Alberto],
Todino, F.[Francesco],
Becattini, F.[Federico],
Bertini, M.[Marco],
del Bimbo, A.[Alberto],
ECO: Ensembling Context Optimization for Vision-Language Models,
CLVL23(2803-2807)
IEEE DOI
2401
BibRef
Palit, V.[Vedant],
Pandey, R.[Rohan],
Arora, A.[Aryaman],
Liang, P.P.[Paul Pu],
Towards Vision-Language Mechanistic Interpretability: A Causal
Tracing Tool for BLIP,
CLVL23(2848-2853)
IEEE DOI
2401
BibRef
Sammani, F.[Fawaz],
Deligiannis, N.[Nikos],
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language
Tasks,
VLAR23(4636-4641)
IEEE DOI
2401
BibRef
Lee, D.J.[Dong-Jun],
Song, S.[Seokwon],
Suh, J.[Jihee],
Choi, J.[Joonmyeong],
Lee, S.[Sanghyeok],
Kim, H.W.J.[Hyun-Woo J.],
Read-only Prompt Optimization for Vision-Language Few-shot Learning,
ICCV23(1401-1411)
IEEE DOI Code:
WWW Link.
2401
BibRef
Li, X.[Xuanlin],
Fang, Y.H.[Yun-Hao],
Liu, M.H.[Ming-Hua],
Ling, Z.[Zhan],
Tu, Z.W.[Zhuo-Wen],
Su, H.[Hao],
Distilling Large Vision-Language Model with Out-of-Distribution
Generalizability,
ICCV23(2492-2503)
IEEE DOI
2401
BibRef
Bi, J.Y.[Jun-Yu],
Cheng, D.[Daixuan],
Yao, P.[Ping],
Pang, B.[Bochen],
Zhan, Y.F.[Yue-Feng],
Yang, C.G.[Chuan-Guang],
Wang, Y.J.[Yu-Jing],
Sun, H.[Hao],
Deng, W.W.[Wei-Wei],
Zhang, Q.[Qi],
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and
Instance-Level Matching,
ICCV23(2584-2593)
IEEE DOI
2401
BibRef
Udandarao, V.[Vishaal],
Gupta, A.[Ankush],
Albanie, S.[Samuel],
SuS-X: Training-Free Name-Only Transfer of Vision-Language Models,
ICCV23(2725-2736)
IEEE DOI Code:
WWW Link.
2401
BibRef
Jiang, C.Y.[Chao-Ya],
Xu, H.Y.[Hai-Yang],
Ye, W.[Wei],
Ye, Q.H.[Qing-Hao],
Li, C.L.[Chen-Liang],
Yan, M.[Ming],
Bi, B.[Bin],
Zhang, S.K.[Shi-Kun],
Huang, F.[Fei],
Huang, S.F.[Song-Fang],
BUS: Efficient and Effective Vision-language Pre-training with
Bottom-Up Patch Summarization,
ICCV23(2888-2898)
IEEE DOI
2401
BibRef
Shi, C.[Cheng],
Yang, S.[Sibei],
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for
Vision-Language Models,
ICCV23(2920-2929)
IEEE DOI
2401
BibRef
Wang, A.J.P.[Alex Jin-Peng],
Lin, K.Q.H.[Kevin Qing-Hong],
Zhang, D.J.H.[David Jun-Hao],
Lei, S.W.X.[Stan Wei-Xian],
Shou, M.Z.[Mike Zheng],
Too Large; Data Reduction for Vision-Language Pre-Training,
ICCV23(3124-3134)
IEEE DOI
2401
BibRef
Wang, W.H.[Wei-Han],
Yang, Z.[Zhen],
Xu, B.[Bin],
Li, J.Z.[Juan-Zi],
Sun, Y.K.[Yan-Kui],
ViLTA: Enhancing Vision-Language Pre-training through Textual
Augmentation,
ICCV23(3135-3146)
IEEE DOI
2401
BibRef
Boecking, B.[Benedikt],
Usuyama, N.[Naoto],
Bannur, S.[Shruthi],
Castro, D.C.[Daniel C.],
Schwaighofer, A.[Anton],
Hyland, S.[Stephanie],
Wetscherek, M.[Maria],
Naumann, T.[Tristan],
Nori, A.[Aditya],
Alvarez-Valle, J.[Javier],
Poon, H.[Hoifung],
Oktay, O.[Ozan],
Making the Most of Text Semantics to Improve Biomedical Vision-Language
Processing,
ECCV22(XXXVI:1-21).
Springer DOI
2211
BibRef
Cui, Q.[Quan],
Zhou, B.[Boyan],
Guo, Y.[Yu],
Yin, W.D.[Wei-Dong],
Wu, H.[Hao],
Yoshie, O.[Osamu],
Chen, Y.[Yubo],
Contrastive Vision-Language Pre-training with Limited Resources,
ECCV22(XXXVI:236-253).
Springer DOI
2211
BibRef
Hu, X.W.[Xiao-Wei],
Gan, Z.[Zhe],
Wang, J.F.[Jian-Feng],
Yang, Z.Y.[Zheng-Yuan],
Liu, Z.C.[Zi-Cheng],
Lu, Y.[Yumao],
Wang, L.J.[Li-Juan],
Scaling Up Vision-Language Pretraining for Image Captioning,
CVPR22(17959-17968)
IEEE DOI
2210
Training, Visualization, Computational modeling, Training data,
Benchmark testing, Transformers, Feature extraction, Vision + language
BibRef
Zhang, P.C.[Peng-Chuan],
Li, X.J.[Xiu-Jun],
Hu, X.W.[Xiao-Wei],
Yang, J.W.[Jian-Wei],
Zhang, L.[Lei],
Wang, L.J.[Li-Juan],
Choi, Y.J.[Ye-Jin],
Gao, J.F.[Jian-Feng],
VinVL: Revisiting Visual Representations in Vision-Language Models,
CVPR21(5575-5584)
IEEE DOI
2111
Training, Visualization, Computational modeling, Object detection,
Benchmark testing, Feature extraction, Transformers
BibRef
Li, Z.W.[Zhuo-Wan],
Stengel-Eskin, E.[Elias],
Zhang, Y.X.[Yi-Xiao],
Xie, C.[Cihang],
Tran, Q.[Quan],
van Durme, B.[Benjamin],
Yuille, A.L.[Alan L.],
Calibrating Concepts and Operations:
Towards Symbolic Reasoning on Real Images,
ICCV21(14890-14899)
IEEE DOI
2203
Visualization, Analytical models, Codes, Computational modeling,
Cognition, Data models, Vision + language
BibRef
Yang, X.[Xu],
Zhang, H.W.[Han-Wang],
Qi, G.J.[Guo-Jun],
Cai, J.F.[Jian-Fei],
Causal Attention for Vision-Language Tasks,
CVPR21(9842-9852)
IEEE DOI
2111
Correlation, Codes, Computational modeling,
Training data, Transformers, Data models
BibRef
Zheng, W.B.[Wen-Bo],
Yan, L.[Lan],
Gou, C.[Chao],
Wang, F.Y.[Fei-Yue],
Webly Supervised Knowledge Embedding Model for Visual Reasoning,
CVPR20(12442-12451)
IEEE DOI
2008
Visual reasoning between visual image and natural language description.
Visualization, Cognition, Knowledge based systems, Task analysis,
Knowledge engineering, Modulation, Robustness
BibRef
Nguyen, D.K.[Duy-Kien],
Okatani, T.[Takayuki],
Multi-Task Learning of Hierarchical Vision-Language Representation,
CVPR19(10484-10493).
IEEE DOI
2002
BibRef
Gupta, T.[Tanmay],
Shih, K.J.[Kevin J.],
Singh, S.[Saurabh],
Hoiem, D.[Derek],
Aligned Image-Word Representations Improve Inductive Transfer Across
Vision-Language Tasks,
ICCV17(4223-4232)
IEEE DOI
1802
data visualisation, image recognition,
learning (artificial intelligence),
Visualization
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Attacks on Vision-Language Models .