Tamaazousti, Y.[Youssef],
Le Borgne, H.[Hervé],
Popescu, A.[Adrian],
Gadeski, E.[Etienne],
Ginsca, A.[Alexandru],
Hudelot, C.[Céline],
Vision-language integration using constrained local semantic features,
CVIU(163), No. 1, 2017, pp. 41-57.
Elsevier DOI
1712
Image classification
BibRef
Zhu, Y.Q.[Yong-Qing],
Li, X.Y.[Xiang-Yang],
Zheng, M.[Mao],
Yang, J.H.[Jia-Hao],
Wang, Z.H.[Zi-Han],
Guo, X.Q.[Xiao-Qian],
Chai, Z.F.[Zi-Feng],
Yuan, Y.C.[Yu-Chen],
Jiang, S.Q.[Shu-Qiang],
Focus and Align: Learning Tube Tokens for Video-Language Pre-Training,
MultMed(25), 2023, pp. 8036-8050.
IEEE DOI
2312
BibRef
Wu, W.H.[Wen-Hao],
Sun, Z.[Zhun],
Song, Y.X.[Yu-Xin],
Wang, J.D.[Jing-Dong],
Ouyang, W.L.[Wan-Li],
Transferring Vision-Language Models for Visual Recognition:
A Classifier Perspective,
IJCV(132), No. 2, February 2024, pp. 392-409.
Springer DOI
2402
BibRef
Ming, Y.F.[Yi-Fei],
Li, Y.X.[Yi-Xuan],
How Does Fine-Tuning Impact Out-of-Distribution Detection for
Vision-Language Models?,
IJCV(132), No. 2, February 2024, pp. 596-609.
Springer DOI
2402
BibRef
Zhao, C.R.[Cai-Rong],
Wang, Y.[Yubin],
Jiang, X.Y.[Xin-Yang],
Shen, Y.F.[Yi-Fei],
Song, K.[Kaitao],
Li, D.S.[Dong-Sheng],
Miao, D.Q.[Duo-Qian],
Learning Domain Invariant Prompt for Vision-Language Models,
IP(33), 2024, pp. 1348-1360.
IEEE DOI
2402
Task analysis, Tuning, Training, Adaptation models, Visualization,
Image color analysis, Self-supervised learning, Prompt learning,
domain generalization
BibRef
Yang, X.F.[Xiao-Feng],
Liu, F.[Fayao],
Lin, G.S.[Guo-Sheng],
Neural Logic Vision Language Explainer,
MultMed(26), 2024, pp. 3331-3340.
IEEE DOI
2402
Cognition, Logic programming, Deep learning, Visualization,
Data models, Training, Markov processes,
vision language pretraining
BibRef
Wang, Y.D.[Yi-Dong],
Yu, Z.H.[Zhuo-Hao],
Wang, J.D.[Jin-Dong],
Heng, Q.[Qiang],
Chen, H.[Hao],
Ye, W.[Wei],
Xie, R.[Rui],
Xie, X.[Xing],
Zhang, S.K.[Shi-Kun],
Exploring Vision-Language Models for Imbalanced Learning,
IJCV(132), No. 1, January 2024, pp. 224-237.
Springer DOI
2402
BibRef
Zeng, Y.[Yan],
Zhang, X.[Xinsong],
Li, H.[Hang],
Wang, J.W.[Jia-Wei],
Zhang, J.P.[Ji-Peng],
Zhou, W.[Wangchunshu],
X2-VLM: All-in-One Pre-Trained Model for Vision-Language Tasks,
PAMI(46), No. 5, May 2024, pp. 3156-3168.
IEEE DOI
2404
Task analysis, Visualization, Transformers, Detectors, Training,
Feature extraction, Image coding,
vision language pre-training
BibRef
Kong, D.[Daehyeon],
Kong, K.[Kyeongbo],
Kang, S.J.[Suk-Ju],
Image clustering using generated text centroids,
SP:IC(125), 2024, pp. 117128.
Elsevier DOI
2405
Deep neural network, Image clustering, Multimodal task, Vision-language model
BibRef
Chen, X.Y.[Xian-Yu],
Yang, J.H.[Jin-Hui],
Chen, S.[Shi],
Wang, L.[Louis],
Jiang, M.[Ming],
Zhao, Q.[Qi],
Every Problem, Every Step, All in Focus: Learning to Solve
Vision-Language Problems With Integrated Attention,
PAMI(46), No. 7, July 2024, pp. 4720-4735.
IEEE DOI
2406
Problem-solving, Task analysis, Visualization, Measurement,
Graph neural networks, Cognition, Videos, Graph attention,
vision-language problem solving
BibRef
Menon, S.[Sachit],
Chandratreya, I.P.[Ishaan Preetam],
Vondrick, C.[Carl],
Task Bias in Contrastive Vision-Language Models,
IJCV(132), No. 6, June 2024, pp. 2026-2040.
Springer DOI
2406
BibRef
Zhang, J.Y.[Jing-Yi],
Huang, J.X.[Jia-Xing],
Jin, S.[Sheng],
Lu, S.J.[Shi-Jian],
Vision-Language Models for Vision Tasks: A Survey,
PAMI(46), No. 8, August 2024, pp. 5625-5644.
IEEE DOI
2407
Task analysis, Visualization, Training, Deep learning, Surveys,
Data models, Predictive models, Big Data, big model, deep learning,
image classification
BibRef
Dong, M.P.[Meng-Ping],
Li, F.[Fei],
Li, Z.B.[Zhen-Bo],
Liu, X.[Xue],
Cluster prototype earth mover's distance adapters and
alignment-guided prompt learning for vision-language models,
PR(156), 2024, pp. 110861.
Elsevier DOI
2408
Cluster prototype, Earth mover's distance, Adapter,
Prompt learning, Vision-language models
BibRef
Liu, Y.[Ye],
Pan, Y.[Yan],
Yin, J.[Jian],
Enhancing Multi-Label Deep Hashing for Image and Audio With Joint
Internal Global Loss Constraints and Large Vision-Language Model,
SPLetters(31), 2024, pp. 2550-2554.
IEEE DOI
2410
Codes, Transformers, Adaptation models, Training,
Convolutional neural networks, Feature extraction,
vision transformer
BibRef
Zhan, C.L.[Chen-Lu],
Zhang, Y.F.[Yu-Fei],
Lin, Y.[Yu],
Wang, G.A.[Gao-Ang],
Wang, H.W.[Hong-Wei],
UniDCP: Unifying Multiple Medical Vision-Language Tasks via Dynamic
Cross-Modal Learnable Prompts,
MultMed(26), 2024, pp. 9736-9748.
IEEE DOI
2410
Task analysis, Adaptation models, Visualization,
Medical diagnostic imaging, Tuning, Multitasking, Plastics,
cross-modal shareable space
BibRef
Su, K.[Ke],
Zhang, X.X.[Xing-Xing],
Zhang, S.Y.[Si-Yang],
Zhu, J.[Jun],
Zhang, B.[Bo],
To Boost Zero-Shot Generalization for Embodied Reasoning With
Vision-Language Pre-Training,
IP(33), 2024, pp. 5370-5381.
IEEE DOI
2410
Cognition, Visualization, Artificial intelligence, Training,
Image reconstruction, Navigation, vision-language pre-training
BibRef
Xuan, S.Y.[Shi-Yu],
Yang, M.[Ming],
Zhang, S.L.[Shi-Liang],
Adapting Vision-Language Models via Learning to Inject Knowledge,
IP(33), 2024, pp. 5798-5809.
IEEE DOI
2410
Feature extraction, Visualization, Adaptation models, Tuning,
Training, Transformers, Dogs, Accuracy, Robustness, Few shot learning,
knowledge injection
BibRef
Zhou, W.[Wenlve],
Zhou, Z.H.[Zhi-Heng],
Unsupervised Domain Adaption Harnessing Vision-Language Pre-Training,
CirSysVideo(34), No. 9, September 2024, pp. 8201-8214.
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Task analysis, Training, Computational modeling, Tuning,
Data models, Visualization, Unsupervised domain adaptation, model deployment
BibRef
Guo, M.H.[Meng-Hao],
Zhang, Y.[Yi],
Mu, T.J.[Tai-Jiang],
Huang, S.X.[Sharon X.],
Hu, S.M.[Shi-Min],
Tuning Vision-Language Models With Multiple Prototypes Clustering,
PAMI(46), No. 12, December 2024, pp. 11186-11199.
IEEE DOI
2411
Prototypes, Adaptation models, Tuning, Visualization,
Benchmark testing, Computational modeling, Data models, clustering
BibRef
Sun, B.[Bo],
Wu, Z.C.[Zhi-Chao],
Zhang, H.[Hao],
He, J.[Jun],
VTPL: Visual and text prompt learning for visual-language models,
JVCIR(104), 2024, pp. 104280.
Elsevier DOI
2411
V-L models, Prompt learning, Visual and text prompts,
Poly-1 information NCE loss, Center loss
BibRef
Liu, L.C.[Liang-Chen],
Wang, N.N.[Nan-Nan],
Liu, D.[Decheng],
Yang, X.[Xi],
Gao, X.B.[Xin-Bo],
Liu, T.L.[Tong-Liang],
Towards Specific Domain Prompt Learning via Improved Text Label
Optimization,
MultMed(26), 2024, pp. 10805-10815.
IEEE DOI
2411
Visualization, Optimization, Semantics, Task analysis, Terminology,
Learning systems, Adaptation models, vision-language model
BibRef
Liu, X.[Xin],
Wu, J.[Jiamin],
Yang, W.F.[Wen-Fei],
Zhou, X.[Xu],
Zhang, T.Z.[Tian-Zhu],
Multi-Modal Attribute Prompting for Vision-Language Models,
CirSysVideo(34), No. 11, November 2024, pp. 11579-11591.
IEEE DOI
2412
Visualization, Task analysis, Semantics, Adaptation models,
Integrated circuit modeling, Vectors,
attribute
BibRef
Jiang, H.J.[Hao-Jun],
Zhang, J.K.[Jian-Ke],
Huang, R.[Rui],
Ge, C.J.[Chun-Jiang],
Ni, Z.[Zanlin],
Song, S.[Shiji],
Huang, G.[Gao],
Cross-modal adapter for vision-language retrieval,
PR(159), 2025, pp. 111144.
Elsevier DOI
2412
Adapter, Cross-modal interaction, Cross-modal retrieval,
Parameter-efficient training, Multi-modal learning
BibRef
Yellinek, N.[Nir],
Karlinsky, L.[Leonid],
Giryes, R.[Raja],
3VL: Using Trees to Improve Vision-Language Models' Interpretability,
IP(34), 2025, pp. 495-509.
IEEE DOI
2501
aligning image and text representations.
Random forests, Visualization, Training, Cognition, Feature extraction,
Transformers, Forestry, Animals, compositional reasoning
BibRef
Yang, L.F.[Ling-Feng],
Li, X.[Xiang],
Wang, Y.Z.[Yue-Ze],
Wang, X.L.[Xin-Long],
Yang, J.[Jian],
Fine-Grained Visual Text Prompting,
PAMI(47), No. 3, March 2025, pp. 1594-1609.
IEEE DOI
2502
What kind of visual prompts to add.
Visualization, Semantics, Image segmentation, Crops, Tuning, Detectors,
Proposals, Location awareness, Grounding, Gray-scale, zero-shot
BibRef
Wang, F.[Fan],
Han, Z.Y.[Zhong-Yi],
Liu, X.[Xingbo],
Yin, Y.L.[Yi-Long],
Gao, X.[Xin],
CTPT: Continual Test-time Prompt Tuning for vision-language models,
PR(161), 2025, pp. 111300.
Elsevier DOI
2502
Test-time adaptation,
Contrastive Language-Image Pretraining (CLIP),
Stable self-learning
BibRef
Liang, N.[Nanhao],
Liu, Y.[Yong],
DPO: Discrete Prompt Optimization for Vision-Language Models,
SPLetters(32), 2025, pp. 671-675.
IEEE DOI
2502
Training, Optimization, Adaptation models, Visualization,
Overfitting, Vectors, Vocabulary, Signal processing algorithms,
vision-language model
BibRef
Ondeng, O.[Oscar],
Ouma, H.[Heywood],
Akuon, P.[Peter],
Enriching visual feature representations for vision-language tasks
using spectral transforms,
IVC(154), 2025, pp. 105390.
Elsevier DOI
2502
Visual feature enrichment, Transformers, Image captioning,
Discrete Fourier Transform, MS COCO, Kylberg dataset, Diversity
BibRef
Xu, C.[Chen],
Zhu, Y.H.[Yu-Han],
Shen, H.C.[Hao-Cheng],
Chen, B.H.[Bo-Heng],
Liao, Y.X.[Yi-Xuan],
Chen, X.X.[Xiao-Xin],
Wang, L.M.[Li-Min],
Progressive Visual Prompt Learning with Contrastive Feature
Re-formation,
IJCV(133), No. 2, February 2025, pp. 511-526.
Springer DOI
2502
Adapting the pre-trained Vision-Language Models.
BibRef
Long, S.[Sifan],
Zhao, Z.[Zhen],
Yuan, J.K.[Jun-Kun],
Tan, Z.C.[Zi-Chang],
Liu, J.J.[Jiang-Jiang],
Feng, J.Y.[Jing-Yuan],
Wang, S.S.[Sheng-Sheng],
Wang, J.D.[Jing-Dong],
Mutual Prompt Learning for Vision Language Models,
IJCV(133), No. 3, March 2025, pp. 1258-1276.
Springer DOI
2502
BibRef
Yin, J.H.[Jun-Hui],
Zhang, X.Y.[Xin-Yu],
Wu, L.[Lin],
Wang, X.J.[Xiao-Jie],
Context-aware prompt learning for test-time vision recognition with
frozen vision-language model,
PR(162), 2025, pp. 111359.
Elsevier DOI Code:
WWW Link.
2503
In-context learning, Prompt learning, Vision-language model,
Vision recognition, Test-time adaptation
BibRef
Chen, Y.[Yeming],
Zhang, S.[Siyu],
Sun, Y.[Yaoru],
Yang, J.[Jun],
Liang, W.J.[Wei-Jian],
Wang, H.R.[Hao-Ran],
Artificial-Spiking Hierarchical Networks for Vision-Language
Representation Learning,
CirSysVideo(35), No. 3, March 2025, pp. 2768-2781.
IEEE DOI Code:
WWW Link.
2503
Visualization, Semantics, Computational modeling, Transformers,
Feature extraction, Object detection,
multimodal alignment
BibRef
Li, B.Z.[Bin-Zhe],
Wang, S.R.[Shu-Run],
Wang, S.Q.[Shi-Qi],
Ye, Y.[Yan],
High Efficiency Image Compression for Large Visual-Language Models,
CirSysVideo(35), No. 3, March 2025, pp. 2870-2880.
IEEE DOI
2503
Image coding, Visualization, Machine vision, Codecs, Semantics,
Standards, Image reconstruction, Bit rate, pre-editing process
BibRef
Liu, L.C.[Liang-Chen],
Wang, N.N.[Nan-Nan],
Zhou, D.W.[Da-Wei],
Liu, D.C.[De-Cheng],
Yang, X.[Xi],
Gao, X.B.[Xin-Bo],
Liu, T.L.[Tong-Liang],
Generalizable Prompt Learning via Gradient Constrained
Sharpness-Aware Minimization,
MultMed(27), 2025, pp. 1100-1113.
IEEE DOI
2503
Improving the performance on unseen classes while maintaining the
performance on seen classes.
Optimization, Minimization, Visualization, Training, Degradation,
Vectors, Telecommunications, Intserv networks, Geometry,
sharpness-aware minimization
BibRef
Lu, Z.[Zhihe],
Bai, J.[Jiawang],
Li, X.[Xin],
Xiao, Z.[Zeyu],
Wang, X.C.[Xin-Chao],
Task-to-Instance Prompt Learning for Vision-Language Models at Test
Time,
IP(34), 2025, pp. 1908-1920.
IEEE DOI Code:
WWW Link.
2504
Training, Training data, Visualization, Adaptation models, Learning systems,
Image recognition, Dogs, Vectors, Entropy, task-to-instance
BibRef
Fang, Z.Q.[Zheng-Qing],
Yuan, Z.H.[Zhou-Hang],
Li, Z.Y.[Zi-Yu],
Chen, J.Y.[Jing-Yuan],
Kuang, K.[Kun],
Yao, Y.F.[Yu-Feng],
Wu, F.[Fei],
Cross-Modality Image Interpretation via Concept Decomposition Vector
of Visual-Language Models,
CirSysVideo(35), No. 4, April 2025, pp. 3024-3038.
IEEE DOI
2504
Visualization, Vectors, Semantics, Training, Image representation,
Task analysis, visual-language models
BibRef
Ramzi, E.[Elias],
Audebert, N.[Nicolas],
Rambour, C.[Clément],
Araujo, A.[André],
Bitot, X.[Xavier],
Thome, N.[Nicolas],
Optimization of Rank Losses for Image Retrieval,
PAMI(47), No. 6, June 2025, pp. 4317-4329.
IEEE DOI
2505
Training, Image retrieval, Measurement, Standards, Data mining,
Artificial intelligence, Loss measurement, non-decomposable
BibRef
Lafon, M.[Marc],
Ramzi, E.[Elias],
Rambour, C.[Clément],
Audebert, N.[Nicolas],
Thome, N.[Nicolas],
Gallop: Learning Global and Local Prompts for Vision-language Models,
ECCV24(LXI: 264-282).
Springer DOI
2412
BibRef
Liu, K.C.[Kang-Cheng],
Wang, C.Q.[Chao-Qun],
Han, X.D.[Xiao-Dong],
Liu, Y.J.[Yong-Jin],
Chen, B.Q.[Bao-Quan],
Generalized Robot Vision-Language Model via Linguistic Foreground-Aware
Contrast,
IJCV(133), No. 6, June 2025, pp. 3481-3518.
Springer DOI
2505
BibRef
And:
Correction:
IJCV(133), No. 7, July 2025, pp. 4971-4971.
Springer DOI
2506
BibRef
Yang, L.X.[Ling-Xiao],
Zhang, R.Y.[Ru-Yuan],
Chen, Q.[Qi],
Xie, X.H.[Xiao-Hua],
Learning with Enriched Inductive Biases for Vision-Language Models,
IJCV(133), No. 6, June 2025, pp. 3746-3761.
Springer DOI
2505
BibRef
Zhang, W.Y.[Wen-Yao],
Wu, L.[Letian],
Zhang, Z.Q.[Ze-Qun],
Yu, T.[Tao],
Ma, C.[Chao],
Jin, X.[Xin],
Yang, X.K.[Xiao-Kang],
Zeng, W.J.[Wen-Jun],
Unleash the Power of Vision-Language Models by Visual Attention
Prompt and Multimodal Interaction,
MultMed(27), 2025, pp. 2399-2411.
IEEE DOI
2505
Visualization, Adaptation models, Tuning, Training,
Computational modeling, Tail, Pipelines, Overfitting, Nose, Attention,
vision-language models
BibRef
Weng, Y.[Yu],
He, W.B.[Wen-Bin],
Dong, J.[Jun],
Chaomurilige,
Liu, X.[Xuan],
Liu, Z.[Zheng],
Cross-Lingual Adaptation for Vision-Language Model via Multimodal
Semantic Distillation,
MultMed(27), 2025, pp. 3184-3196.
IEEE DOI
2506
Adaptation models, Multilingual, Visualization, Training, Semantics,
Data models, Natural language processing, Translation, zero-shot learning
BibRef
Liang, J.W.[Jia-Wei],
Liang, S.Y.[Si-Yuan],
Liu, A.S.[Ai-Shan],
Cao, X.C.[Xiao-Chun],
VL-Trojan: Multimodal Instruction Backdoor Attacks against
Autoregressive Visual Language Models,
IJCV(133), No. 7, July 2025, pp. 3994-4013.
Springer DOI
2506
BibRef
Yao, H.T.[Han-Tao],
Zhang, R.[Rui],
Lyu, H.H.[Huai-Hai],
Zhang, Y.D.[Yong-Dong],
Xu, C.S.[Chang-Sheng],
Bi-Modality Individual-Aware Prompt Tuning for Visual-Language Model,
PAMI(47), No. 8, August 2025, pp. 6352-6368.
IEEE DOI
2507
BibRef
Earlier: A1, A2, A5, Only:
TCP: Textual-Based Class-Aware Prompt Tuning for Visual-Language
Model,
CVPR24(23438-23448)
IEEE DOI Code:
WWW Link.
2410
Tuning, Visualization, Training, Adaptation models, Hands,
Feature extraction, Data models, Artificial intelligence,
visual-language model.
Benchmark testing.
BibRef
Hao, Z.W.[Zhi-Wei],
Guo, J.Y.[Jian-Yuan],
Shen, L.[Li],
Luo, Y.[Yong],
Hu, H.[Han],
Wen, Y.G.[Yong-Gang],
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language
Tuning,
IJCV(133), No. 8, August 2025, pp. 5527-5543.
Springer DOI
2508
BibRef
Zeng, R.F.[Rong-Fei],
Yang, Z.P.[Zhi-Peng],
Yu, R.Y.[Rui-Yun],
Zhang, Y.G.[Yong-Gang],
Supplementary Prompt Learning for Vision-Language Models,
IJCV(133), No. 8, August 2025, pp. 5822-5839.
Springer DOI
2508
BibRef
Liu, K.C.[Kang-Cheng],
Liu, Y.J.[Yong-Jin],
Chen, B.Q.[Bao-Quan],
General 3D Vision-Language Model With Fast Rendering and Pre-Training
Vision-Language Alignment,
PAMI(47), No. 9, September 2025, pp. 7352-7368.
IEEE DOI
2508
Point cloud compression, Semantics, Training, Solid modeling,
Contrastive learning, Data mining, Visualization,
3D vision-language model
BibRef
Gao, Y.S.[Yan-Sheng],
Zhu, Z.X.[Zi-Xi],
Wang, S.S.[Sheng-Sheng],
Mixture of coarse and fine-grained prompt tuning for vision-language
model,
PR(170), 2026, pp. 112074.
Elsevier DOI
2509
Prompt learning, Vision-language models,
Coarse domain-shared information,
BibRef
Hao, F.S.[Fu-Sheng],
Liu, L.[Liu],
Wu, F.X.[Fu-Xiang],
Zhang, Q.S.[Qie-Shi],
Cheng, J.[Jun],
Textual Embeddings are Good Class-Aware Visual Prompts for Adapting
Vision-Language Models,
SPLetters(32), 2025, pp. 2992-2996.
IEEE DOI
2509
Visualization, Tuning, Semantics, Harmonic analysis, Accuracy,
Optimization, Artificial intelligence, Vectors, Training, TV,
class-aware visual prompts
BibRef
Dhouib, M.[Mohamed],
Buscaldi, D.[Davide],
Vanier, S.[Sonia],
Shabou, A.[Aymen],
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual
Language Models,
CVPR25(14582-14592)
IEEE DOI
2508
Connectors, Training, Measurement, Visualization,
Computational modeling, Redundancy, Merging, Oral communication
BibRef
Xie, P.[Peng],
Bie, Y.[Yequan],
Mao, J.[Jianda],
Song, Y.Q.[Yang-Qiu],
Wang, Y.[Yang],
Chen, H.[Hao],
Chen, K.[Kani],
Chain of Attack: On the Robustness of Vision-Language Models Against
Transfer-Based Adversarial Attacks,
CVPR25(14679-14689)
IEEE DOI
2508
Correlation, Computational modeling, Semantics, Closed box,
Robustness, Natural language processing, Safety, robustness
BibRef
Yu, C.[Chong],
Chen, T.[Tao],
Gan, Z.X.[Zhong-Xue],
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple
Vision-Language Model Variants,
CVPR25(14712-14722)
IEEE DOI
2508
Training, Adaptation models, Accuracy, Tensors, Memory management,
Hardware, Model compression, Tuning, Optimization, dynamic expansion capability
BibRef
Hao, F.S.[Fu-Sheng],
He, F.X.[Feng-Xiang],
Wu, F.X.[Fu-Xiang],
Wang, T.[Tichao],
Song, C.Q.[Cheng-Qun],
Cheng, J.[Jun],
Task-Aware Clustering for Prompting Vision-Language Models,
CVPR25(14745-14755)
IEEE DOI Code:
WWW Link.
2508
Adaptation models, Visualization, Attention mechanisms, Codes,
Interference, Benchmark testing, Optimization, Overfitting
BibRef
Koleilat, T.[Taha],
Asgariandehkordi, H.[Hojat],
Rivaz, H.[Hassan],
Xiao, Y.M.[Yi-Ming],
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models,
CVPR25(14766-14776)
IEEE DOI Code:
WWW Link.
2508
Representation learning, Adaptation models, Visualization,
Accuracy, Biological system modeling, Semantics,
vision-language models
BibRef
Nath, V.[Vishwesh],
Li, W.Q.[Wen-Qi],
Yang, D.[Dong],
Myronenko, A.[Andriy],
Zheng, M.X.[Ming-Xin],
Lu, Y.[Yao],
Liu, Z.J.[Zhi-Jian],
Yin, H.X.[Hong-Xu],
Law, Y.M.[Yee Man],
Tang, Y.C.[Yu-Cheng],
Guo, P.F.[Peng-Fei],
Zhao, C.[Can],
Xu, Z.Y.[Zi-Yue],
He, Y.F.[Yu-Fan],
Harmon, S.[Stephanie],
Simon, B.[Benjamin],
Heinrich, G.[Greg],
Aylward, S.[Stephen],
Edgar, M.[Marc],
Zephyr, M.[Michael],
Molchanov, P.[Pavlo],
Turkbey, B.[Baris],
Roth, H.[Holger],
Xu, D.[Daguang],
VILA-M3: Enhancing Vision-Language Models with Medical Expert
Knowledge,
CVPR25(14788-14798)
IEEE DOI
2508
Deep learning, Computational modeling, Medical services,
Feature extraction, Data models, Reliability, Tumors, radiology
BibRef
Zhang, D.[Di],
Lei, J.[Jingdi],
Li, J.X.[Jun-Xian],
Wang, X.Z.[Xun-Zhi],
Liu, Y.J.[Yu-Jie],
Yang, Z.L.[Zong-Lin],
Li, J.T.[Jia-Tong],
Wang, W.[Weida],
Yang, S.[Suorong],
Wu, J.B.[Jian-Bo],
Ye, P.[Peng],
Ouyang, W.L.[Wan-Li],
Zhou, D.Z.[Dong-Zhan],
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning,
CVPR25(9050-9061)
IEEE DOI Code:
WWW Link.
2508
Training, Visualization, Computational modeling, Natural languages,
Benchmark testing, Cognition, Mathematical models, Reliability,
multimodal reasoning
BibRef
Du, H.[Hao],
Wu, B.[Bo],
Lu, Y.[Yan],
Mao, Z.D.[Zhen-Dong],
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic
Video Situation,
CVPR25(13798-13809)
IEEE DOI
2508
Measurement, Visualization, Filtering, Statistical analysis,
Pipelines, Benchmark testing, Videos
BibRef
Kaduri, O.[Omri],
Bagon, S.[Shai],
Dekel, T.[Tali],
What's in the Image? A Deep-Dive into the Vision of Vision Language
Models,
CVPR25(14549-14558)
IEEE DOI
2508
Visualization, Analytical models, Image coding, Focusing,
Data models, Data mining, Videos
BibRef
Xing, L.[Long],
Huang, Q.D.[Qi-Dong],
Dong, X.Y.[Xiao-Yi],
Lu, J.J.[Jia-Jie],
Zhang, P.[Pan],
Zang, Y.H.[Yu-Hang],
Cao, Y.H.[Yu-Hang],
He, C.H.[Cong-Hui],
Wang, J.Q.[Jia-Qi],
Wu, F.[Feng],
Lin, D.[Dahua],
Conical Visual Concentration for Efficient Large Vision-Language
Models,
CVPR25(14593-14603)
IEEE DOI Code:
WWW Link.
2508
Training, Visualization, Costs, Codes, Redundancy, Boosting,
large vision language model, efficient training, efficient inference
BibRef
Zhang, L.[Le],
Yang, Q.[Qian],
Agrawal, A.[Aishwarya],
Assessing and Learning Alignment of Unimodal Vision and Language
Models,
CVPR25(14604-14614)
IEEE DOI
2508
Training, Translation, Computational modeling,
Semantic segmentation, Transfer learning, Object recognition
BibRef
Sehgal, A.[Atharva],
Yuan, P.[Patrick],
Hu, Z.[Ziniu],
Yue, Y.S.[Yi-Song],
Sun, J.J.[Jennifer J.],
Chaudhuri, S.[Swarat],
Self-Evolving Visual Concept Library using Vision-Language Critics,
CVPR25(13124-13134)
IEEE DOI
2508
Visualization, Annotations, Buildings, Manuals, Libraries, Cognition, History,
Few shot learning, program synthesis, visual programming, library learning
BibRef
Wang, W.H.[Wei-Han],
Wang, L.[Lefan],
Gu, X.T.[Xiao-Tao],
Huang, S.Y.[Shi-Yu],
Dong, Y.X.[Yu-Xiao],
Tang, J.[Jie],
MotionBench: Benchmarking and Improving Fine-Grained Video Motion
Understanding for Vision Language Models,
CVPR25(8450-8460)
IEEE DOI Code:
WWW Link.
2508
Visualization, Benchmark testing, Data models, Videos,
vision language model, fine-grained video motion understanding, benchmark
BibRef
Nacson, M.S.[Mor Shpigel],
Aberdam, A.[Aviad],
Ganz, R.[Roy],
Avraham, E.B.[Elad Ben],
Golts, A.[Alona],
Kittenplon, Y.[Yair],
Mazor, S.[Shai],
Litman, R.[Ron],
DocVLM: Make Your VLM an Efficient Reader,
CVPR25(29005-29015)
IEEE DOI
2508
Visualization, Image coding, Computational modeling,
Optical character recognition, Layout, Computational efficiency,
Text processing
BibRef
Alhamoud, K.[Kumail],
Alshammari, S.[Shaden],
Tian, Y.L.[Yong-Long],
Li, G.H.[Guo-Hao],
Torr, P.H.S.[Philip H.S.],
Kim, Y.[Yoon],
Ghassemi, M.[Marzyeh],
Vision-Language Models Do Not Understand Negation,
CVPR25(29612-29622)
IEEE DOI
2508
Training, Accuracy, Computational modeling, Natural languages,
Benchmark testing, Videos, Synthetic data, Biomedical imaging, benchmarks
BibRef
Schmalfuss, J.[Jenny],
Chang, N.[Nadine],
VS, V.[Vibashan],
Shen, M.[Maying],
Bruhn, A.[Andrés],
Alvarez, J.M.[Jose M.],
PARC: A Quantitative Framework Uncovering the Symmetries within
Vision Language Models,
CVPR25(25081-25091)
IEEE DOI Code:
WWW Link.
2508
Visualization, Analytical models, Sensitivity,
Sensitivity analysis, Computational modeling, Semantics,
prompt sensitivity
BibRef
Xiao, J.Q.[Jin-Qi],
Sang, S.[Shen],
Zhi, T.C.[Tian-Cheng],
Liu, J.[Jing],
Yan, Q.[Qing],
Luo, L.J.[Lin-Jie],
Yuan, B.[Bo],
COAP: Memory-Efficient Training with Correlation-Aware Gradient
Projection,
CVPR25(30116-30126)
IEEE DOI Code:
WWW Link.
2508
Training, Degradation, Quantization (signal),
Computational modeling, Neural networks, Flora,
vision language model
BibRef
Zhu, Y.Q.[Yi-Qi],
Wang, Z.Y.[Zi-Yue],
Zhang, C.[Can],
Li, P.[Peng],
Liu, Y.[Yang],
CoSpace: Benchmarking Continuous Space Perception Ability for
Vision-Language Models,
CVPR25(29569-29579)
IEEE DOI
2508
Visualization, Analytical models, Accuracy, Computational modeling,
Benchmark testing, Cognition, Image reconstruction,
continuous space perception
BibRef
Kang, H.Q.[Hao-Qiang],
Sachdeva, E.[Enna],
Gupta, P.[Piyush],
Bae, S.J.[Sang-Jae],
Lee, K.[Kwonjoon],
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models
with Generative Flow Networks,
CVPR25(3815-3825)
IEEE DOI Code:
WWW Link.
2508
Training, Decision making, Distributed databases,
Reinforcement learning, Games, Cognition, Planning, Optimization,
gflownets
BibRef
Li, L.[Lei],
Wei, Y.C.[Yuan-Cheng],
Xie, Z.H.[Zhi-Hui],
Yang, X.[Xuqing],
Song, Y.F.[Yi-Fan],
Wang, P.[Peiyi],
An, C.X.[Chen-Xin],
Liu, T.Y.[Tian-Yu],
Li, S.[Sujian],
Lin, B.Y.C.[Bill Yu-Chen],
Kong, L.P.[Ling-Peng],
Liu, Q.[Qi],
VL-RewardBench: A Challenging Benchmark for Vision-Language
Generative Reward Models,
CVPR25(24657-24668)
IEEE DOI Code:
WWW Link.
2508
Training, Analytical models, Visualization, Accuracy, Pipelines,
Benchmark testing, Cognition, Reliability, Probes, Visual perception,
multimodal large language models
BibRef
Chen, J.H.[Jiu-Hai],
Yang, J.W.[Jian-Wei],
Wu, H.P.[Hai-Ping],
Li, D.[Dianqi],
Gao, J.F.[Jian-Feng],
Zhou, T.Y.[Tian-Yi],
Xiao, B.[Bin],
Florence-VL: Enhancing Vision-Language Models with Generative Vision
Encoder and Depth-Breadth Fusion,
CVPR25(24928-24938)
IEEE DOI Code:
WWW Link.
2508
Training, Visualization, Statistical analysis,
Computational modeling, Optical character recognition, Tuning
BibRef
Yang, C.Y.[Chen-Yu],
Dong, X.[Xuan],
Zhu, X.Z.[Xi-Zhou],
Su, W.J.[Wei-Jie],
Wang, J.H.[Jia-Hao],
Tian, H.[Hao],
Chen, Z.[Zhe],
Wang, W.H.[Wen-Hai],
Lu, L.W.[Le-Wei],
Dai, J.F.[Ji-Feng],
PVC: Progressive Visual Token Compression for Unified Image and Video
Processing in Large Vision-Language Models,
CVPR25(24939-24949)
IEEE DOI Code:
WWW Link.
2508
Visualization, Adaptation models, Image coding, Limiting, Redundancy,
Benchmark testing, Encoding, Data mining, Videos
BibRef
Zhang, K.[Kun],
Li, J.Y.[Jing-Yu],
Li, Z.[Zhe],
Zhou, S.K.[S. Kevin],
DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid
Set-Embeddings Learning,
CVPR25(24993-25003)
IEEE DOI
2508
Accuracy, Computational modeling, Semantics, Benchmark testing,
Computational efficiency, Complexity theory,
set-embeddings learning
BibRef
Guo, Y.C.[Yun-Cheng],
Gu, X.D.[Xiao-Dong],
MMRL: Multi-Modal Representation Learning for Vision-Language Models,
CVPR25(25015-25025)
IEEE DOI Code:
WWW Link.
2508
Representation learning, Training, Adaptation models, Codes,
Transfer learning, Image representation, Data models, Overfitting
BibRef
Zhu, B.[Beier],
Cui, J.[Jiequan],
Zhang, H.W.[Han-Wang],
Zhang, C.[Chi],
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness,
CVPR25(25487-25496)
IEEE DOI
2508
Training, Correlation, Foundation models, Null space, Robustness,
Probes, Faces, group robustness, vision-language models
BibRef
Li, H.Y.[Hao-Yang],
Wang, L.[Liang],
Wang, C.[Chao],
Jiang, J.[Jing],
Peng, Y.[Yan],
Long, G.D.[Guo-Dong],
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models,
CVPR25(25623-25632)
IEEE DOI Code:
WWW Link.
2508
Codes, Semantic segmentation, Collaboration, Cloning,
Object detection, Vectors, Optimization, Tuning, prompt tuning,
multi-modal learning
BibRef
Saravanan, D.[Darshana],
Gupta, V.[Varun],
Singh, D.[Darshan],
Khan, Z.[Zeeshan],
Gandhi, V.[Vineet],
Tapaswi, M.[Makarand],
VELOCITI: Benchmarking Video-Language Compositional Reasoning with
Strict Entailment,
CVPR25(18914-18924)
IEEE DOI
2508
Visualization, Accuracy, Benchmark testing, Cognition, Videos,
video language benchmark
BibRef
Pan, B.[Bikang],
Li, Q.[Qun],
Tang, X.Y.[Xiao-Ying],
Huang, W.[Wei],
Fang, Z.[Zhen],
Liu, F.[Feng],
Wang, J.Y.[Jing-Ya],
Yu, J.Y.[Jing-Yi],
Shi, Y.[Ye],
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models,
CVPR25(19963-19973)
IEEE DOI
2508
Representation learning, Accuracy, Purification, Foundation models,
Transportation, Prototypes, Robustness, Noise measurement, Signal to noise ratio
BibRef
Zhang, Y.T.[Yong-Ting],
Chen, L.[Lu],
Zheng, G.D.[Guo-Dong],
Gao, Y.F.[Yi-Feng],
Zheng, R.[Rui],
Fu, J.[Jinlan],
Yin, Z.F.[Zhen-Fei],
Jin, S.[Senjie],
Qiao, Y.[Yu],
Huang, X.J.[Xuan-Jing],
Zhao, F.[Feng],
Gui, T.[Tao],
Shao, J.[Jing],
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for
Vision Language Models,
CVPR25(19867-19878)
IEEE DOI
2508
Visualization, Computational modeling, Semantics, Data models, Safety
BibRef
Bhattacharjee, S.S.[Subhransu S.],
Campbell, D.[Dylan],
Shome, R.[Rahul],
Believing is Seeing: Unobserved Object Detection using Generative
Models,
CVPR25(19366-19377)
IEEE DOI
2508
Measurement, Training, Solid modeling, Adaptation models,
Visualization, Pipelines, Object detection, Diffusion models,
vision-language models
BibRef
Zhou, E.[Enshen],
Su, Q.[Qi],
Chi, C.[Cheng],
Zhang, Z.Z.[Zhi-Zheng],
Wang, Z.Y.[Zhong-Yuan],
Huang, T.J.[Tie-Jun],
Sheng, L.[Lu],
Wang, H.[He],
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and
Proactive Robotic Failure Detection,
CVPR25(6919-6929)
IEEE DOI Code:
WWW Link.
2508
Visualization, Codes, Accuracy, Prevention and mitigation,
Programming, Real-time systems, Closed loop systems, Monitoring,
vision-language model
BibRef
Zhou, W.J.[Wei-Jie],
Tao, M.[Manli],
Zhao, C.Y.[Chao-Yang],
Guo, H.Y.[Hai-Yun],
Dong, H.H.[Hong-Hui],
Tang, M.[Ming],
Wang, J.Q.[Jin-Qiao],
PhysVLM: Enabling Visual Language Models to Understand Robotic
Physical Reachability,
CVPR25(6940-6949)
IEEE DOI
2508
Visualization, Adaptation models, Service robots, Decision making,
Benchmark testing, Cognition, Reliability, Robots, embodied ai, ,
embodied visual reasoning
BibRef
Song, C.H.[Chan Hee],
Blukis, V.[Valts],
Tremblay, J.[Jonathan],
Tyree, S.[Stephen],
Su, Y.[Yu],
Birchfield, S.[Stan],
RoboSpatial: Teaching Spatial Understanding to 2D and 3D
Vision-Language Models for Robotics,
CVPR25(15768-15780)
IEEE DOI
2508
Training, Solid modeling, Soft sensors, Pipelines, Training data,
Predictive models, Spatial databases, Cognition, Robots,
robot perception
BibRef
Lozano, A.[Alejandro],
Sun, M.W.[Min Woo],
Burgess, J.[James],
Chen, L.[Liangyu],
Nirschl, J.J.[Jeffrey J.],
Gu, J.[Jeffrey],
Lopez, I.[Ivan],
Aklilu, J.[Josiah],
Rau, A.[Anita],
Katzer, A.W.[Austin Wolfgang],
Zhang, Y.H.[Yu-Hui],
Chiu, C.[Collin],
Wang, X.H.[Xiao-Han],
Song, A.S.[Alfred Seunghoon],
Tibshirani, R.[Robert],
Yeung-Levy, S.[Serena],
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and
Vision-Language Models Derived from Scientific Literature,
CVPR25(19724-19735)
IEEE DOI
2508
Annotations, Biological system modeling, Computational modeling,
Dermatology, Surgery, Streaming media, Radiology,
biomedical foundation models
BibRef
Xiao, R.[Rui],
Kim, S.[Sanghwan],
Georgescu, M.I.[Mariana-Iuliana],
Akata, Z.[Zeynep],
Alaniz, S.[Stephan],
FLAIR: VLM with Fine-grained Language-informed Image Representations,
CVPR25(24884-24894)
IEEE DOI Code:
WWW Link.
2508
Visualization, Codes, Semantic segmentation,
Computational modeling, Image representation, Benchmark testing,
multimodal learning
BibRef
Zhang, J.M.[Jia-Ming],
Ye, J.[Junhong],
Ma, X.[Xingjun],
Li, Y.[Yige],
Yang, Y.F.[Yun-Fan],
Chen, Y.H.[Yun-Hao],
Sang, J.[Jitao],
Yeung, D.Y.[Dit-Yan],
Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on
Vision-language Models,
CVPR25(19900-19909)
IEEE DOI
2508
Limiting, Foundation models, Scalability,
Prevention and mitigation, Vectors, Internet, Security,
self-supervised
BibRef
Wang, X.[Xin],
Chen, K.[Kai],
Zhang, J.M.[Jia-Ming],
Chen, J.J.[Jing-Jing],
Ma, X.[Xingjun],
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in
Vision-Language Models,
CVPR25(19910-19920)
IEEE DOI Code:
WWW Link.
2508
Visualization, Accuracy, Scalability, Perturbation methods,
Benchmark testing, Robustness, Entropy, Safety, Tuning,
test-time adversarial prompt tuning
BibRef
Yang, C.[Cheng],
Sui, Y.[Yang],
Xiao, J.Q.[Jin-Qi],
Huang, L.[Lingyi],
Gong, Y.[Yu],
Li, C.[Chendi],
Yan, J.H.[Jing-Hua],
Bai, Y.[Yu],
Sadayappan, P.[Ponnuswamy],
Hu, X.[Xia],
Yuan, B.[Bo],
TopV: Compatible Token Pruning with Inference Time Optimization for
Fast and Low-Memory Multimodal Vision Language Model,
CVPR25(19803-19813)
IEEE DOI
2508
Training, Visualization, Computational modeling, Memory management,
Cost function, Cache storage
BibRef
Vasu, P.K.A.[Pavan Kumar Anasosalu],
Faghri, F.[Fartash],
Li, C.L.[Chun-Liang],
Koc, C.[Cem],
True, N.[Nate],
Antony, A.[Albert],
Santhanam, G.[Gokul],
Gabriel, J.[James],
Grasch, P.[Peter],
Tuzel, O.[Oncel],
Pouransari, H.[Hadi],
FastVLM: Efficient Vision Encoding for Vision Language Models,
CVPR25(19769-19780)
IEEE DOI Code:
WWW Link.
2508
Visualization, Image resolution, Accuracy, Image coding, Codes,
Benchmark testing, Encoding, vision-language models, efficiency
BibRef
Chen, Q.Z.[Qi-Zhou],
Wang, C.[Chengyu],
Wang, D.[Dakan],
Zhang, T.[Taolin],
Li, W.[Wangyue],
He, X.F.[Xiao-Feng],
Lifelong Knowledge Editing for Vision Language Models with Low-Rank
Mixture-of-Experts,
CVPR25(9455-9466)
IEEE DOI
2508
Training, Visualization, Filtering, Large language models, Semantics,
Benchmark testing, Routing, Generators, Robustness, model editing,
mixture of expert
BibRef
Chen, T.Y.[Tian-Yu],
Fu, X.C.[Xing-Cheng],
Gao, Y.[Yisen],
Qian, H.D.[Hao-Dong],
Wei, Y.[Yuecen],
Yan, K.[Kun],
Zhou, H.Y.[Hao-Yi],
Li, J.X.[Jian-Xin],
Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding,
CVPR25(4112-4121)
IEEE DOI
2508
Space vehicles, Geometry, Training, Adaptation models,
Extraterrestrial phenomena, Estimation, Stars, Vectors,
multi-modal learning
BibRef
Liu, Z.J.[Zhi-Jian],
Zhu, L.[Ligeng],
Shi, B.[Baifeng],
Zhang, Z.Y.[Zhuo-Yang],
Lou, Y.M.[Yu-Ming],
Yang, S.[Shang],
Xi, H.C.[Hao-Cheng],
Cao, S.Y.[Shi-Yi],
Gu, Y.X.[Yu-Xian],
Li, D.C.[Da-Cheng],
Li, X.[Xiuyu],
Tang, H.T.[Hao-Tian],
Fang, Y.H.[Yun-Hao],
Chen, Y.[Yukang],
Hsieh, C.Y.[Cheng-Yu],
Huang, D.A.[De-An],
Cheng, A.C.[An-Chieh],
Hu, J.Y.[Jin-Yi],
Liu, S.[Sifei],
Krishna, R.[Ranjay],
Molchanov, P.[Pavlo],
Kautz, J.[Jan],
Yin, H.X.[Hong-Xu],
Han, S.[Song],
Lu, Y.[Yao],
NVILA: Efficient Frontier Visual Language Models,
CVPR25(4122-4134)
IEEE DOI
2508
Training, Visualization, Accuracy, Systematics, Image coding, Costs,
Decoding, Spatial resolution, Videos
BibRef
Poppi, T.[Tobia],
Kasarla, T.[Tejaswi],
Mettes, P.[Pascal],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
Hyperbolic Safety-Aware Vision-Language Models,
CVPR25(4222-4232)
IEEE DOI Code:
WWW Link.
2508
Adaptation models, Ethics, Law, Source coding, Robustness, Data models,
Safety, Standards, trustworthy, safety, nsfw, hyperbolic, vision-and-language
BibRef
Zhang, H.Y.[Hao-Yu],
Guo, Y.Y.[Yang-Yang],
Kankanhalli, M.[Mohan],
Joint Vision-Language Social Bias Removal for CLIP,
CVPR25(4246-4255)
IEEE DOI Code:
WWW Link.
2508
Measurement, Degradation, Protocols, Codes,
Prevention and mitigation, Computational modeling,
vision-language alignment
BibRef
Zhang, Y.[Yi],
Deng, Y.X.[Yi-Xuan],
Guo, M.H.[Meng-Hao],
Hu, S.M.[Shi-Min],
Adaptive Parameter Selection for Tuning Vision-Language Models,
CVPR25(4280-4290)
IEEE DOI
2508
Adaptation models, Adaptive learning, Manuals, Benchmark testing,
Performance gain, Flowering plants, Tuning, Overfitting
BibRef
Deng, A.[Ailin],
Cao, T.[Tri],
Chen, Z.[Zhirui],
Hooi, B.[Bryan],
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?,
CVPR25(3867-3876)
IEEE DOI
2508
Training, Visualization, Analytical models, Computational modeling,
Reliability theory, Robustness, Data models, Safety,
bias
BibRef
Huang, R.[Runhui],
Ding, X.P.[Xin-Peng],
Wang, C.W.[Chun-Wei],
Han, J.H.[Jian-Hua],
Liu, Y.L.[Yu-Long],
Zhao, H.S.[Heng-Shuang],
Xu, H.[Hang],
Hou, L.[Lu],
Zhang, W.[Wei],
Liang, X.D.[Xiao-Dan],
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large
Vision-Language Models,
CVPR25(29814-29824)
IEEE DOI
2508
Training, Visualization, Costs, Computational modeling,
Benchmark testing, Feature extraction, Image restoration,
visual token compression
BibRef
Wang, S.[Sudong],
Zhang, Y.J.[Yun-Jian],
Zhu, Y.[Yao],
Li, J.N.[Jia-Ning],
Wang, Z.Z.[Zi-Zhe],
Liu, Y.W.[Yan-Wei],
Ji, X.Y.[Xiang-Yang],
Towards Understanding How Knowledge Evolves in Large Vision-Language
Models,
CVPR25(29858-29868)
IEEE DOI Code:
WWW Link.
2508
Dimensionality reduction, Codes, Natural languages,
Probability distribution, Encoding, Trajectory, Model compression,
interpretation
BibRef
Deitke, M.[Matt],
Clark, C.[Christopher],
Lee, S.H.[Sang-Ho],
Tripathi, R.[Rohun],
Yang, Y.[Yue],
Park, J.S.[Jae Sung],
Salehi, M.[Mohammadreza],
Muennighoff, N.[Niklas],
Lo, K.[Kyle],
Soldaini, L.[Luca],
Lu, J.[Jiasen],
Anderson, T.[Taira],
Bransom, E.[Erin],
Ehsani, K.[Kiana],
Ngo, H.[Huong],
Chen, Y.[YenSung],
Patel, A.[Ajay],
Yatskar, M.[Mark],
Callison-Burch, C.[Chris],
Head, A.[Andrew],
Hendrix, R.[Rose],
Bastani, F.[Favyen],
VanderBilt, E.[Eli],
Lambert, N.[Nathan],
Chou, Y.[Yvonne],
Chheda, A.[Arnavi],
Sparks, J.[Jenna],
Skjonsberg, S.[Sam],
Schmitz, M.[Michael],
Sarnat, A.[Aaron],
Bischoff, B.[Byron],
Walsh, P.[Pete],
Newell, C.[Chris],
Wolters, P.[Piper],
Gupta, T.[Tanmay],
Zeng, K.H.[Kuo-Hao],
Borchardt, J.[Jon],
Groeneveld, D.[Dirk],
Nam, C.[Crystal],
Lebrecht, S.[Sophie],
Wittlif, C.[Caitlin],
Schoenick, C.[Carissa],
Michel, O.[Oscar],
Krishna, R.[Ranjay],
Weihs, L.[Luca],
Smith, N.A.[Noah A.],
Hajishirzi, H.[Hannaneh],
Girshick, R.[Ross],
Farhadi, A.[Ali],
Kembhavi, A.[Aniruddha],
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art
Vision-Language Models,
CVPR25(91-104)
IEEE DOI Code:
WWW Link.
2508
Award, CVPR, Paper HM. Training, Source coding, Computational modeling, Pipelines,
Training data, Data models, Open data, Synthetic data,
visual instruction tuning
BibRef
Zhao, W.[Wangbo],
Han, Y.Z.[Yi-Zeng],
Tang, J.S.[Jia-Sheng],
Li, Z.[Zhikai],
Song, Y.B.[Yi-Bing],
Wang, K.[Kai],
Wang, Z.Y.[Zhang-Yang],
You, Y.[Yang],
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for
Accelerating Large VLMs,
CVPR25(19814-19824)
IEEE DOI Code:
WWW Link.
2508
Visualization, Codes, Accuracy, Benchmark testing, Computational efficiency
BibRef
Lee, B.K.[Byung-Kwan],
Hachiuma, R.[Ryo],
Wang, Y.C.A.F.[Yu-Chiang Frank],
Ro, Y.M.[Yong Man],
Wu, Y.H.[Yueh-Hua],
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision
Language Models,
CVPR25(29545-29557)
IEEE DOI
2508
Training, Performance evaluation, Visualization,
Computational modeling, Natural languages, Merging, Tuning
BibRef
Sun, J.C.[Jing-Chen],
Sharma, R.[Rohan],
Lokhande, V.S.[Vishnu Suresh],
Chen, C.Y.[Chang-You],
Cross-Modal Feature Alignment and MMD Improve Robustness of Prompt
Tuning,
WACV25(4714-4724)
IEEE DOI
2505
Training, Adaptation models, Visualization, Codes, Computational modeling,
Stochastic processes, Robustness, Tuning, vision-language model
BibRef
Safaei, B.[Bardia],
Patel, V.M.[Vishal M.],
Active Learning for Vision-Language Models,
WACV25(4902-4912)
IEEE DOI
2505
Training, Bridges, Uncertainty, Computational modeling, Active learning,
Measurement uncertainty, Entropy, Reliability, Image classification
BibRef
Wang, Y.C.[Yi-Cheng],
Zhang, Z.K.[Zhi-Kang],
Wang, J.[Jue],
Fan, D.[David],
Xu, Z.L.[Zhen-Lin],
Liu, L.[Linda],
Hao, X.[Xiang],
Bhat, V.[Vimal],
Li, X.Y.[Xin-Yu],
GEXIA: Granularity Expansion and Iterative Approximation for Scalable
Multi-Grained Video-Language Learning,
WACV25(4725-4735)
IEEE DOI
2505
Computational modeling, Semantics, Benchmark testing, Data models,
Iterative methods, Videos
BibRef
Colman, R.[Roman],
Vu, M.[Minh],
Bhattarai, M.[Manish],
Ma, M.[Martin],
Viswanathan, H.[Hari],
O'Malley, D.[Daniel],
Santos, J.E.[Javier E.],
PatchFinder: Leveraging Visual Language Models for Accurate
Information Retrieval Using Model Uncertainty,
WACV25(9146-9155)
IEEE DOI
2505
Visualization, Uncertainty, Accuracy, Computational modeling,
Software algorithms, Predictive models, Information retrieval,
log likelihood
BibRef
Jawade, B.[Bhavin],
Soares, J.V.B.[João V. B.],
Thadani, K.[Kapil],
Mohan, D.D.[Deen Dayal],
Eshratifar, A.E.[Amir Erfan],
Culpepper, B.[Benjamin],
de Juan, P.[Paloma],
Setlur, S.[Srirangaraj],
Govindaraju, V.[Venu],
SCOT: Self-Supervised Contrastive Pretraining for Zero-Shot
Compositional Retrieval,
WACV25(5509-5519)
IEEE DOI Code:
WWW Link.
2505
Training, Codes, Large language models, Image retrieval,
Benchmark testing, Web search, Standards, zero-shot
BibRef
Talemi, N.A.[Niloufar Alipour],
Kashiani, H.[Hossein],
Afghah, F.[Fatemeh],
Style-Pro: Style-Guided Prompt Learning for Generalizable
Vision-Language Models,
WACV25(6207-6216)
IEEE DOI
2505
Adaptation models, Image recognition, Computational modeling,
Benchmark testing, Data models, Robustness, Overfitting,
style shift learning
BibRef
Chang, H.S.[Hung-Shuo],
Wang, C.Y.[Chien-Yao],
Wang, R.R.[Richard Robert],
Chou, G.[Gene],
Liao, H.Y.M.[Hong-Yuan Mark],
Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual
Language Models,
WACV25(6217-6227)
IEEE DOI Code:
WWW Link.
2505
YOLO, Training, Visualization, Accuracy, Source coding, Semantics,
Predictive models, Real-time systems, Decoding, multi-task
BibRef
Westfechtel, T.[Thomas],
Zhang, D.[Dexuan],
Harada, T.[Tatsuya],
Combining Inherent Knowledge of Vision-Language Models with
Unsupervised Domain Adaptation Through Strong-Weak Guidance,
WACV25(6528-6537)
IEEE DOI
2505
Adaptation models, Accuracy, Predictive models, Benchmark testing,
Prediction algorithms, Labeling
BibRef
Chen, H.N.[Han-Ning],
Ni, Y.[Yang],
Huang, W.J.[Wen-Jun],
Liu, Y.[Yezi],
Jeong, S.H.[Sung-Heon],
Wen, F.[Fei],
Bastian, N.D.[Nathaniel D.],
Latapie, H.[Hugo],
Imani, M.[Mohsen],
VLTP: Vision-Language Guided Token Pruning for Task-Oriented
Segmentation,
WACV25(9353-9363)
IEEE DOI
2505
Uniform resource locators, Image segmentation, Image recognition,
Computational modeling, Large language models, Transformers, Load modeling
BibRef
Ali, E.[Eman],
Silva, S.[Sathira],
Khan, M.H.[Muhammad Haris],
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of
Vision-Language Models,
WACV25(6083-6093)
IEEE DOI
2505
Training, Adaptation models, Visualization, Accuracy, Prototypes,
Data models, Noise measurement, Image classification
BibRef
Zhang, C.[Ce],
Stepputtis, S.[Simon],
Sycara, K.[Katia],
Xie, Y.Q.[Ya-Qi],
Enhancing Vision-Language Few-Shot Adaptation with Negative Learning,
WACV25(5905-5915)
IEEE DOI Code:
WWW Link.
2505
Adaptation models, Codes, Accuracy, Computational modeling, Noise,
Transforms, Computational efficiency, Noise measurement, Few shot learning
BibRef
Yamada, M.[Moyuru],
Dharamshi, N.[Nimish],
Kohli, A.[Ayushi],
Kasu, P.[Prasad],
Khan, A.[Ainulla],
Ghulyani, M.[Manu],
Unleashing Potentials of Vision-Language Models for Zero-Shot HOI
Detection,
WACV25(5751-5760)
IEEE DOI
2505
Head, Computational modeling, Redundancy, Object detection,
Network architecture, Predictive models, Decoding,
vision-and-language
BibRef
Imam, R.[Raza],
Gani, H.[Hanan],
Huzaifa, M.[Muhammad],
Nandakumar, K.[Karthik],
Test-Time Low Rank Adaptation via Confidence Maximization for
Zero-Shot Generalization of Vision-Language Models,
WACV25(5449-5459)
IEEE DOI Code:
WWW Link.
2505
Adaptation models, Visualization, Codes, Large language models,
Transformers, Entropy, Tuning, Optimization
BibRef
Ghoddoosian, R.[Reza],
Agarwal, N.[Nakul],
Dwivedi, I.[Isht],
Dariush, B.[Behzad],
ACE: Action Concept Enhancement of Video-Language Models in
Procedural Videos,
WACV25(9521-9531)
IEEE DOI
2505
Training, Visualization, Robustness, Assembly, Videos, Overfitting, zero-shot,
action recognition, vlm, vision language model, synonym, text augmentation
BibRef
Onoe, Y.[Yasumasa],
Rane, S.[Sunayana],
Berger, Z.[Zachary],
Bitton, Y.[Yonatan],
Cho, J.[Jaemin],
Garg, R.[Roopal],
Ku, A.[Alexander],
Parekh, Z.[Zarana],
Pont-Tuset, J.[Jordi],
Tanzer, G.[Garrett],
Wang, S.[Su],
Baldridge, J.[Jason],
DOCCI: Descriptions of Connected and Contrasting Images,
ECCV24(LX: 291-309).
Springer DOI
2412
BibRef
Li, T.[Tang],
Ma, M.M.[Meng-Meng],
Peng, X.[Xi],
DEAL: Disentangle and Localize Concept-level Explanations for VLMs,
ECCV24(XXXIX: 383-401).
Springer DOI
2412
BibRef
Park, K.Y.[Kwan-Yong],
Saito, K.[Kuniaki],
Kim, D.H.[Dong-Hyun],
Weak-to-strong Compositional Learning from Generative Models for
Language-based Object Detection,
ECCV24(XXIII: 1-19).
Springer DOI
2412
BibRef
Li, S.C.[Shi-Cheng],
Li, L.[Lei],
Liu, Y.[Yi],
Ren, S.H.[Shu-Huai],
Liu, Y.X.[Yuan-Xin],
Gao, R.D.[Run-Dong],
Sun, X.[Xu],
Hou, L.[Lu],
Vitatecs: A Diagnostic Dataset for Temporal Concept Understanding of
Video-language Models,
ECCV24(LXX: 331-348).
Springer DOI
2412
BibRef
Yang, Y.T.[Yan-Ting],
Chen, M.H.[Ming-Hao],
Qiu, Q.[Qibo],
Wu, J.H.[Jia-Hao],
Wang, W.X.[Wen-Xiao],
Lin, B.B.[Bin-Bin],
Guan, Z.Y.[Zi-Yu],
He, X.F.[Xiao-Fei],
Adapt2reward: Adapting Video-language Models to Generalizable Robotic
Rewards via Failure Prompts,
ECCV24(LVII: 163-180).
Springer DOI
2412
BibRef
Rahmanzadehgervi, P.[Pooyan],
Bolton, L.[Logan],
Taesiri, M.R.[Mohammad Reza],
Nguyen, A.T.[Anh Totti],
Vision Language Models are blind,
ACCV24(V: 293-309).
Springer DOI
2412
BibRef
Lai, C.G.[Chen-Gen],
Song, S.L.[Sheng-Li],
Yan, S.[Sitong],
Hu, G.[Guangneng],
Improving Vision and Language Concepts Understanding with Multimodal
Counterfactual Samples,
ECCV24(LXIX: 174-191).
Springer DOI
2412
BibRef
Chytas, S.P.[Sotirios Panagiotis],
Kim, H.W.J.[Hyun-Woo J.],
Singh, V.[Vikas],
Understanding Multi-compositional Learning in Vision and Language
Models via Category Theory,
ECCV24(XLVIII: 324-341).
Springer DOI
2412
BibRef
Song, Y.Z.[Yun-Zhu],
Chen, Y.S.[Yi-Syuan],
Lin, T.L.[Tzu-Ling],
Liu, B.[Bei],
Fu, J.L.[Jian-Long],
Shuai, H.H.[Hong-Han],
Capture Concept Through Comparison: Vision-and-language Representation
Learning with Intrinsic Information Mining,
ACCV24(III: 220-238).
Springer DOI
2412
BibRef
Adhikari, R.[Rabin],
Thapaliya, S.[Safal],
Dhakal, M.[Manish],
Khanal, B.[Bishesh],
Tunevlseg: Prompt Tuning Benchmark for Vision-language Segmentation
Models,
ACCV24(III: 44-62).
Springer DOI
2412
BibRef
He, H.C.[Hai-Chen],
Liu, W.B.[Wei-Bin],
Xing, W.W.[Wei-Wei],
Biefficient: Bidirectionally Prompting Vision-language Models for
Parameter-efficient Video Recognition,
ACCV24(III: 257-274).
Springer DOI
2412
BibRef
Yang, J.K.[Jing-Kang],
Dong, Y.H.[Yu-Hao],
Liu, S.[Shuai],
Li, B.[Bo],
Wang, Z.Y.[Zi-Yue],
Tan, H.R.[Hao-Ran],
Jiang, C.C.[Chen-Cheng],
Kang, J.[Jiamu],
Zhang, Y.H.[Yuan-Han],
Zhou, K.Y.[Kai-Yang],
Liu, Z.W.[Zi-Wei],
Octopus: Embodied Vision-language Programmer from Environmental
Feedback,
ECCV24(I: 20-38).
Springer DOI
2412
BibRef
Kar, O.F.[Oguzhan Fatih],
Tonioni, A.[Alessio],
Poklukar, P.[Petra],
Kulshrestha, A.[Achin],
Zamir, A.[Amir],
Tombari, F.[Federico],
Brave: Broadening the Visual Encoding of Vision-language Models,
ECCV24(XVI: 113-132).
Springer DOI
2412
BibRef
Kamath, A.[Amita],
Hsieh, C.Y.[Cheng-Yu],
Chang, K.W.[Kai-Wei],
Krishna, R.[Ranjay],
The Hard Positive Truth About Vision-language Compositionality,
ECCV24(XIV: 37-54).
Springer DOI
2412
BibRef
Jia, B.X.[Bao-Xiong],
Chen, Y.X.[Yi-Xin],
Yu, H.Y.[Huang-Yue],
Wang, Y.[Yan],
Niu, X.S.[Xue-Song],
Liu, T.Y.[Teng-Yu],
Li, Q.[Qing],
Huang, S.Y.[Si-Yuan],
Sceneverse: Scaling 3d Vision-language Learning for Grounded Scene
Understanding,
ECCV24(IX: 289-310).
Springer DOI
2412
BibRef
Zhang, Y.F.[Yi-Feng],
Jiang, M.[Ming],
Zhao, Q.[Qi],
Learning Chain of Counterfactual Thought for Bias-robust
Vision-language Reasoning,
ECCV24(VIII: 334-351).
Springer DOI
2412
BibRef
Li, J.[Junyan],
Chen, D.[Delin],
Cai, T.[Tianle],
Chen, P.H.[Pei-Hao],
Hong, Y.[Yining],
Chen, Z.F.[Zhen-Fang],
Shen, Y.K.[Yi-Kang],
Gan, C.[Chuang],
Flexattention for Efficient High-resolution Vision-language Models,
ECCV24(XXV: 286-302).
Springer DOI
2412
BibRef
Li, X.[Xiang],
Ding, J.[Jian],
Chen, Z.Y.[Zhao-Yang],
Elhoseiny, M.[Mohamed],
UNI3DL: A Unified Model for 3d Vision-language Understanding,
ECCV24(XXIII: 74-92).
Springer DOI
2412
BibRef
Hao, T.X.[Tian-Xiang],
Ding, X.H.[Xiao-Han],
Feng, J.X.[Jue-Xiao],
Yang, Y.H.[Yu-Hong],
Chen, H.[Hui],
Ding, G.[Guiguang],
Quantized Prompt for Efficient Generalization of Vision-language Models,
ECCV24(XIX: 54-73).
Springer DOI
2412
BibRef
Xu, H.B.[Huang-Biao],
Ke, X.[Xiao],
Li, Y.Z.[Yue-Zhou],
Xu, R.[Rui],
Wu, H.Q.[Huan-Qi],
Lin, X.F.[Xiao-Feng],
Guo, W.Z.[Wen-Zhong],
Vision-language Action Knowledge Learning for Semantic-aware Action
Quality Assessment,
ECCV24(XLII: 423-440).
Springer DOI
2412
BibRef
Zhu, Z.Y.[Zi-Yu],
Zhang, Z.[Zhuofan],
Ma, X.J.[Xiao-Jian],
Niu, X.S.[Xue-Song],
Chen, Y.X.[Yi-Xin],
Jia, B.X.[Bao-Xiong],
Deng, Z.D.[Zhi-Dong],
Huang, S.Y.[Si-Yuan],
Li, Q.[Qing],
Unifying 3d Vision-language Understanding via Promptable Queries,
ECCV24(XLIV: 188-206).
Springer DOI
2412
BibRef
Zhang, J.M.[Jia-Ming],
Ma, X.J.[Xing-Jun],
Wang, X.[Xin],
Qiu, L.Y.[Ling-Yu],
Wang, J.Q.[Jia-Qi],
Jiang, Y.G.[Yu-Gang],
Sang, J.[Jitao],
Adversarial Prompt Tuning for Vision-language Models,
ECCV24(XLV: 56-72).
Springer DOI
2412
BibRef
Wu, G.[Ge],
Zhang, X.[Xin],
Li, Z.[Zheng],
Chen, Z.W.[Zhao-Wei],
Liang, J.J.[Jia-Jun],
Yang, J.[Jian],
Li, X.[Xiang],
Cascade Prompt Learning for Vision-language Model Adaptation,
ECCV24(L: 304-321).
Springer DOI
2412
BibRef
Gao, S.[Sensen],
Jia, X.J.[Xiao-Jun],
Ren, X.H.[Xu-Hong],
Tsang, I.[Ivor],
Guo, Q.[Qing],
Boosting Transferability in Vision-language Attacks via Diversification
Along the Intersection Region of Adversarial Trajectory,
ECCV24(LVII: 442-460).
Springer DOI
2412
BibRef
Jiang, H.B.[Hao-Bin],
Yue, J.P.[Jun-Peng],
Luo, H.[Hao],
Ding, Z.[Ziluo],
Lu, Z.Q.[Zong-Qing],
Reinforcement Learning Friendly Vision-language Model for Minecraft,
ECCV24(LXVIII: 1-17).
Springer DOI
2412
BibRef
Nguyen, A.T.[A. Tuan],
Tai, K.S.[Kai Sheng],
Chen, B.C.[Bor-Chun],
Shukla, S.N.[Satya Narayan],
Yu, H.C.[Han-Chao],
Torr, P.H.S.[Philip H.S.],
Tian, T.P.[Tai-Peng],
Lim, S.N.[Ser-Nam],
uCAP: An Unsupervised Prompting Method for Vision-language Models,
ECCV24(LXXIV: 425-439).
Springer DOI
2412
BibRef
Zhang, Y.[Yi],
Yu, K.[Ke],
Wu, S.Q.[Si-Qi],
He, Z.H.[Zhi-Hai],
Conceptual Codebook Learning for Vision-language Models,
ECCV24(LXXVII: 235-251).
Springer DOI
2412
BibRef
Chatterjee, A.[Agneet],
Luo, Y.R.[Yi-Ran],
Gokhale, T.[Tejas],
Yang, Y.Z.[Ye-Zhou],
Baral, C.[Chitta],
Revision: Rendering Tools Enable Spatial Fidelity in Vision-language
Models,
ECCV24(XXX: 339-357).
Springer DOI
2412
BibRef
Sharma, P.[Pratyusha],
Shaham, T.R.[Tamar Rott],
Baradad, M.[Manel],
Rodríguez-Muñoz, A.[Adrián],
Duggal, S.[Shivam],
Isola, P.[Phillip],
Torralba, A.[Antonio],
Fu, S.[Stephanie],
A Vision Check-up for Language Models,
CVPR24(14410-14419)
IEEE DOI
2410
Representation learning, Visualization, Analytical models, Codes,
Image synthesis, Computational modeling
BibRef
Parodi, F.[Felipe],
Matelsky, J.K.[Jordan K.],
Regla-Vargas, A.[Alejandra],
Foglia, E.E.[Elizabeth E.],
Lim, C.[Charis],
Weinberg, D.[Danielle],
Kording, K.P.[Konrad P.],
Herrick, H.M.[Heidi M.],
Platt, M.L.[Michael L.],
Vision-language models for decoding provider attention during
neonatal resuscitation,
CVPM24(343-353)
IEEE DOI
2410
Training, Pediatrics, Accuracy, Semantics, Decision making, Transformers
BibRef
Zhang, Y.B.[Ya-Bin],
Zhu, W.J.[Wen-Jie],
Tang, H.[Hui],
Ma, Z.Y.[Zhi-Yuan],
Zhou, K.Y.[Kai-Yang],
Zhang, L.[Lei],
Dual Memory Networks: A Versatile Adaptation Approach for
Vision-Language Models,
CVPR24(28718-28728)
IEEE DOI Code:
WWW Link.
2410
Training, Knowledge engineering, Adaptation models, Codes,
Training data, Data models, Vision-language models,
versatile adaptation
BibRef
Guo, Y.C.[Yun-Cheng],
Gu, X.D.[Xiao-Dong],
JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language
Models,
CVPR24(28695-28705)
IEEE DOI
2410
Adaptation models, Adaptive systems, Noise, Manuals, Robustness,
Noise measurement,
prompt learning
BibRef
Han, J.[Jinwei],
Lin, Z.W.[Zhi-Wen],
Sun, Z.Y.[Zhong-Yisun],
Gao, Y.G.[Ying-Guo],
Yan, K.[Ke],
Ding, S.H.[Shou-Hong],
Gao, Y.[Yuan],
Xia, G.S.[Gui-Song],
Anchor-based Robust Finetuning of Vision-Language Models,
CVPR24(26909-26918)
IEEE DOI
2410
Image recognition, Zero-shot learning, Semantics,
Benchmark testing, Anchor, Robust Finetuning
BibRef
Cao, Q.L.[Qing-Long],
Xu, Z.Q.[Zheng-Qin],
Chen, Y.T.[Yun-Tian],
Ma, C.[Chao],
Yang, X.K.[Xiao-Kang],
Domain Prompt Learning with Quaternion Networks,
CVPR24(26627-26636)
IEEE DOI Code:
WWW Link.
2410
Knowledge engineering, Adaptation models, Codes, Quaternions,
Face recognition, Contrastive learning, vision-language models,
quaternion networks
BibRef
Li, L.[Lin],
Guan, H.Y.[Hao-Yan],
Qiu, J.N.[Jia-Ning],
Spratling, M.[Michael],
One Prompt Word is Enough to Boost Adversarial Robustness for
Pre-Trained Vision-Language Models,
CVPR24(24408-24419)
IEEE DOI Code:
WWW Link.
2410
Accuracy, Codes, Training data, Robustness,
Computational efficiency, vision-language models, VLMs
BibRef
Zanella, M.[Maxime],
Fuchs, C.[Clément],
de Vleeschouwer, C.[Christophe],
Ayed, I.B.[Ismail Ben],
Realistic Test-Time Adaptation of Vision-Language Models,
CVPR25(25103-25112)
IEEE DOI Code:
WWW Link.
2508
Adaptation models, Codes, Predictive models, Performance gain,
Robustness, vision-language, test-time adaptation,
regularized maximum likelihood estimation
BibRef
Zanella, M.[Maxime],
Ayed, I.B.[Ismail Ben],
On the Test-Time Zero-Shot Generalization of Vision-Language Models:
Do we Really need Prompt Learning?,
CVPR24(23783-23793)
IEEE DOI
2410
Training, Systematics, Computational modeling, Quality assessment,
Computational efficiency, vision-language,
training-free
BibRef
Yang, S.[Senqiao],
Tian, Z.[Zhuotao],
Jiang, L.[Li],
Jia, J.Y.[Jia-Ya],
Unified Language-Driven Zero-Shot Domain Adaptation,
CVPR24(23407-23415)
IEEE DOI
2410
Representation learning, Adaptation models, Visualization,
Correlation, Scalability, Computational modeling,
Vision-Language Model
BibRef
Cui, J.Q.[Jie-Quan],
Zhu, B.[Beier],
Wen, X.[Xin],
Qi, X.J.[Xiao-Juan],
Yu, B.[Bei],
Zhang, H.W.[Han-Wang],
Classes Are Not Equal: An Empirical Study on Image Recognition
Fairness,
CVPR24(23283-23292)
IEEE DOI
2410
Training, Representation learning, Image recognition, Accuracy,
Predictive models, Network architecture, Prediction algorithms,
Vision-Language Models
BibRef
Stojnic, V.[Vladan],
Kalantidis, Y.[Yannis],
Tolias, G.[Giorgos],
Label Propagation for Zero-shot Classification with Vision-Language
Models,
CVPR24(23209-23218)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Closed box, Encoding, Data models,
vision-language models, label propagation, zero-shot classification
BibRef
Yuan, T.[Tongtong],
Zhang, X.[Xuange],
Liu, K.[Kun],
Liu, B.[Bo],
Chen, C.[Chen],
Jin, J.[Jian],
Jiao, Z.Z.[Zhen-Zhen],
Towards Surveillance Video-and-Language Understanding: New Dataset,
Baselines, and Challenges,
CVPR24(22052-22061)
IEEE DOI Code:
WWW Link.
2410
Annotations, Surveillance, Semantics, Benchmark testing,
Public security, Timing, Security, Dataset Annotation
BibRef
Chen, Y.F.[Yi-Fei],
Chen, D.P.[Da-Peng],
Liu, R.J.[Rui-Jin],
Zhou, S.[Sai],
Xue, W.Y.[Wen-Yuan],
Peng, W.[Wei],
Align Before Adapt: Leveraging Entity-to-Region Alignments for
Generalizable Video Action Recognition,
CVPR24(18688-18698)
IEEE DOI
2410
Representation learning, Adaptation models, Visualization, Semantics,
Transformers, Vectors, Video action recognition, visual-language model
BibRef
Mittal, H.[Himangi],
Agarwal, N.[Nakul],
Lo, S.Y.[Shao-Yuan],
Lee, K.[Kwonjoon],
Can't make an Omelette without Breaking some Eggs: Plausible Action
Anticipation using Large Video-Language Models,
CVPR24(18580-18590)
IEEE DOI
2410
Accuracy, Computational modeling, Linear programming,
Action Anticipation, Video, Large Multimodal Models
BibRef
Kahatapitiya, K.[Kumara],
Arnab, A.[Anurag],
Nagrani, A.[Arsha],
Ryoo, M.S.[Michael S.],
VicTR: Video-conditioned Text Representations for Activity
Recognition,
CVPR24(18547-18558)
IEEE DOI
2410
Training, Visualization, Adaptation models, Semantics, Focusing,
Benchmark testing, Vision-language models, Activity Recognition,
Video-conditioned Text
BibRef
Wu, T.Y.[Tz-Ying],
Ho, C.H.[Chih-Hui],
Vasconcelos, N.M.[Nuno M.],
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification,
CVPR24(16531-16540)
IEEE DOI Code:
WWW Link.
2410
Measurement, Training, Frequency modulation, Accuracy, Taxonomy,
Semantics, Hierarchical Classification, Visual-language foundation model
BibRef
Zhao, G.[Ganlong],
Li, G.B.[Guan-Bin],
Chen, W.[Weikai],
Yu, Y.Z.[Yi-Zhou],
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with
Open-Vocabulary Detection and StructurEd Representation,
CVPR24(16296-16306)
IEEE DOI
2410
Art, Accuracy, Navigation, Annotations, Detectors,
Vision-and-Language Navigation, Open-vocabulary, Multi-Modal Learning
BibRef
Li, X.[Xin],
Wu, Y.F.[Yun-Fei],
Jiang, X.H.[Xing-Hua],
Guo, Z.H.[Zhi-Hao],
Gong, M.M.[Ming-Ming],
Cao, H.Y.[Hao-Yu],
Liu, Y.S.[Yin-Song],
Jiang, D.Q.[De-Qiang],
Sun, X.[Xing],
Enhancing Visual Document Understanding with Contrastive Learning in
Large Visual-Language Models,
CVPR24(15546-15555)
IEEE DOI
2410
Visualization, Computational modeling, Contrastive learning,
Benchmark testing, Feature extraction, Filling, Contrastive Learning
BibRef
Pham, K.[Khoi],
Huynh, C.[Chuong],
Lim, S.N.[Ser-Nam],
Shrivastava, A.[Abhinav],
Composing Object Relations and Attributes for Image-Text Matching,
CVPR24(14354-14363)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Computational modeling, Image edge detection,
Semantics, Benchmark testing, vision-language, image retrieval,
image-text matching
BibRef
Kim, G.[Gahyeon],
Kim, S.[Sohee],
Lee, S.[Seokju],
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models,
Prompting24(1572-1582)
IEEE DOI
2410
Visualization, Zero-shot learning, Semantics, Focusing,
Feature extraction, Data augmentation, Vectors, prompt learning, VLMs
BibRef
Xu, Z.L.[Zhen-Lin],
Zhu, Y.[Yi],
Deng, S.Q.[Si-Qi],
Mittal, A.[Abhay],
Chen, Y.B.[Yan-Bei],
Wang, M.[Manchen],
Favaro, P.[Paolo],
Tighe, J.[Joseph],
Modolo, D.[Davide],
Benchmarking Zero-Shot Recognition with Vision-Language Models:
Challenges on Granularity and Specificity,
WhatNext24(1827-1836)
IEEE DOI
2410
Computational modeling, Face recognition, Semantics, Training data,
Focusing, Vision and language models, Zero-shot recognition,
Benchmarking
BibRef
Luo, Z.W.[Zi-Wei],
Gustafsson, F.K.[Fredrik K.],
Zhao, Z.[Zheng],
Sjölund, J.[Jens],
Schön, T.B.[Thomas B.],
Photo-Realistic Image Restoration in the Wild with Controlled
Vision-Language Models,
NTIRE24(6641-6651)
IEEE DOI
2410
Degradation, Training, Image synthesis, Pipelines, Transform coding,
Diffusion models, Feature extraction, Image restoration, real-world
BibRef
Huang, C.Q.[Chao-Qin],
Jiang, A.[Aofan],
Feng, J.H.[Jing-Hao],
Zhang, Y.[Ya],
Wang, X.C.[Xin-Chao],
Wang, Y.F.[Yan-Feng],
Adapting Visual-Language Models for Generalizable Anomaly Detection
in Medical Images,
CVPR24(11375-11385)
IEEE DOI Code:
WWW Link.
2410
Training, Adaptation models, Image segmentation, Visualization,
Source coding, Semantics, Anomaly Detection, Medical Images
BibRef
Bang, J.[Jihwan],
Ahn, S.[Sumyeong],
Lee, J.G.[Jae-Gil],
Active Prompt Learning in Vision Language Models,
CVPR24(26994-27004)
IEEE DOI Code:
WWW Link.
2410
Learning systems, Adaptation models, Codes, Sampling methods, Labeling
BibRef
Pan, C.[Chenbin],
Yaman, B.[Burhaneddin],
Nesti, T.[Tommaso],
Mallik, A.[Abhirup],
Allievi, A.G.[Alessandro G.],
Velipasalar, S.[Senem],
Ren, L.[Liu],
VLP: Vision Language Planning for Autonomous Driving,
CVPR24(14760-14769)
IEEE DOI
2410
Training, Urban areas, Linguistics, Cognition, Robustness, Planning
BibRef
Liang, M.[Mingfu],
Su, J.C.[Jong-Chyi],
Schulter, S.[Samuel],
Garg, S.[Sparsh],
Zhao, S.Y.[Shi-Yu],
Wu, Y.[Ying],
Chandraker, M.[Manmohan],
AIDE: An Automatic Data Engine for Object Detection in Autonomous
Driving,
CVPR24(14695-14706)
IEEE DOI
2410
Training, Costs, Roads, Pipelines, Object detection, Benchmark testing,
Data models, Autonomous Driving, Vision Language Model,
Automatic Data Engine
BibRef
Li, Z.[Zheng],
Li, X.[Xiang],
Fu, X.[Xinyi],
Zhang, X.[Xin],
Wang, W.Q.[Wei-Qiang],
Chen, S.[Shuo],
Yang, J.[Jian],
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models,
CVPR24(26607-26616)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Prediction algorithms, Data models,
Vectors, Probability distribution, knowledge distillation,
zero-shot learning
BibRef
Khandelwal, A.[Anant],
PromptSync: Bridging Domain Gaps in Vision-Language Models through
Class-Aware Prototype Alignment and Discrimination,
ZeroShot24(7819-7828)
IEEE DOI
2410
Adaptation models, Computational modeling, Prototypes,
Contrastive learning, Benchmark testing, Robustness
BibRef
Hirohashi, Y.[Yuki],
Hirakawa, T.[Tsubasa],
Yamashita, T.[Takayoshi],
Fujiyoshi, H.[Hironobu],
Prompt Learning with One-Shot Setting based Feature Space Analysis in
Vision-and-Language Models,
ZeroShot24(7761-7770)
IEEE DOI
2410
Learning systems, Analytical models, Adaptation models,
Image resolution, Accuracy, Vision-and-Language Model, Prompt Learning
BibRef
Zhang, L.[Le],
Awal, R.[Rabiul],
Agrawal, A.[Aishwarya],
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to
Enhance Visio-Linguistic Compositional Understanding,
CVPR24(13774-13784)
IEEE DOI Code:
WWW Link.
2410
Annotations, Semantics, Refining, Text to image,
Contrastive learning, Benchmark testing, Cognition,
contrastive learning
BibRef
Rosasco, A.[Andrea],
Berti, S.[Stefano],
Pasquale, G.[Giulia],
Malafronte, D.[Damiano],
Sato, S.[Shogo],
Segawa, H.[Hiroyuki],
Inada, T.[Tetsugo],
Natale, L.[Lorenzo],
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized
Vision-Language Tasks,
CVPR24(22239-22248)
IEEE DOI Code:
WWW Link.
2410
Measurement, Codes, Image synthesis, Text to image,
Benchmark testing, benchmark, dataset,
compositionality
BibRef
Cheng, S.[Sijie],
Guo, Z.C.[Zhi-Cheng],
Wu, J.W.[Jia-Wen],
Fang, K.[Kechen],
Li, P.[Peng],
Liu, H.P.[Hua-Ping],
Liu, Y.[Yang],
EgoThink: Evaluating First-Person Perspective Thinking Capability of
Vision-Language Models,
CVPR24(14291-14302)
IEEE DOI
2410
Bridges, Visualization, Computational modeling, Focusing,
Benchmark testing, Planning, Egocentric, Vision-Language Models, Benchmark
BibRef
Kil, J.[Jihyung],
Song, C.H.[Chan Hee],
Zheng, B.[Boyuan],
Deng, X.[Xiang],
Su, Y.[Yu],
Chao, W.L.[Wei-Lun],
Dual-View Visual Contextualization for Web Navigation,
CVPR24(14445-14454)
IEEE DOI
2410
Visualization, Navigation, Benchmark testing,
AI Agents, Web Agents, Web Navigation, Vision-Language,
Multimodal Agents
BibRef
Guo, Y.Y.[Yang-Yang],
Wang, G.Z.[Guang-Zhi],
Kankanhalli, M.[Mohan],
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation,
CVPR24(15699-15709)
IEEE DOI
2410
Codes, Computational modeling, Perturbation methods, Loading,
Transformers, Vision-Language,
Low-rank Approximation
BibRef
Cao, J.J.[Jian-Jian],
Ye, P.[Peng],
Li, S.Z.[Sheng-Ze],
Yu, C.[Chong],
Tang, Y.S.[Yan-Song],
Lu, J.W.[Ji-Wen],
Chen, T.[Tao],
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for
Accelerating Vision-Language Transformer,
CVPR24(15710-15719)
IEEE DOI Code:
WWW Link.
2410
Degradation, Adaptation models, Visualization, Costs,
Computational modeling, Semantics, Token Pruning, Model Compress
BibRef
Farina, M.[Matteo],
Mancini, M.[Massimiliano],
Cunegatti, E.[Elia],
Iacca, G.[Giovanni],
Ricci, E.[Elisa],
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning,
CVPR24(16185-16195)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Transfer learning, Neurons,
Benchmark testing, multimodal learning,
sparse neural networks
BibRef
Mu, F.Z.[Fang-Zhou],
Mo, S.C.[Si-Cheng],
Li, Y.[Yin],
SnAG: Scalable and Accurate Video Grounding,
CVPR24(18930-18940)
IEEE DOI Code:
WWW Link.
2410
Training, Analytical models, Accuracy, Grounding, Scalability,
Computational modeling, Video understanding,
Vision-Language Learning
BibRef
Cao, Y.H.[Yun-Hao],
Ji, K.X.[Kai-Xiang],
Huang, Z.Y.[Zi-Yuan],
Zheng, C.Y.[Chuan-Yang],
Liu, J.J.[Jia-Jia],
Wang, J.[Jian],
Chen, J.D.[Jing-Dong],
Yang, M.[Ming],
Towards Better Vision-Inspired Vision-Language Models,
CVPR24(13537-13547)
IEEE DOI
2410
Training, Bridges, Visualization, Computational modeling,
Poles and towers, Benchmark testing, deep learning, deep prompt
BibRef
Shi, K.Y.[Kun-Yu],
Dong, Q.[Qi],
Goncalves, L.[Luis],
Tu, Z.W.[Zhuo-Wen],
Soatto, S.[Stefano],
Non-autoregressive Sequence-to-Sequence Vision-Language Models,
CVPR24(13603-13612)
IEEE DOI
2410
Visualization, Technological innovation, Computational modeling,
Predictive models, Drives, Encoding, Non-autoregressive, CTC,
vision language models
BibRef
Man, Y.Z.[Yun-Ze],
Gui, L.Y.[Liang-Yan],
Wang, Y.X.[Yu-Xiong],
Situational Awareness Matters in 3D Vision Language Reasoning,
CVPR24(13678-13688)
IEEE DOI
2410
Visualization, Solid modeling, Estimation, Performance gain,
Cognition, Vision-Language, Multi-modal, 3D Reasoning
BibRef
Zheng, C.H.[Chen-Hao],
Zhang, J.[Jieyu],
Kembhavi, A.[Aniruddha],
Krishna, R.[Ranjay],
Iterated Learning Improves Compositionality in Large Vision-Language
Models,
CVPR24(13785-13795)
IEEE DOI
2410
Training, Training data, Games, Contrastive learning,
Benchmark testing, Performance gain, Cognitive science
BibRef
Song, C.H.[Chull Hwan],
Hwang, T.[Taebaek],
Yoon, J.Y.[Joo-Young],
Choi, S.[Shunghyun],
Gu, Y.H.[Yeong Hyeon],
SyncMask: Synchronized Attentional Masking for Fashion-centric
Vision-Language Pretraining,
CVPR24(13948-13957)
IEEE DOI
2410
Training, Visualization, Image segmentation, Image resolution,
Refining, Contrastive learning
BibRef
Pramanick, S.[Shraman],
Han, G.X.[Guang-Xing],
Hou, R.[Rui],
Nag, S.[Sayan],
Lim, S.N.[Ser-Nam],
Ballas, N.[Nicolas],
Wang, Q.F.[Qi-Fan],
Chellappa, R.[Rama],
Almahairi, A.[Amjad],
Jack of All Tasks, Master of Many: Designing General-purpose
Coarse-to-Fine Vision-Language Model,
CVPR24(14076-14088)
IEEE DOI Code:
WWW Link.
2410
Image segmentation, Visualization, Image coding, Filters, Grounding,
Machine vision, Visual systems
BibRef
Zeng, Y.[Yunan],
Huang, Y.[Yan],
Zhang, J.J.[Jin-Jin],
Jie, Z.Q.[Ze-Qun],
Chai, Z.H.[Zhen-Hua],
Wang, L.[Liang],
Investigating Compositional Challenges in Vision-Language Models for
Visual Grounding,
CVPR24(14141-14151)
IEEE DOI
2410
Visualization, Codes, Grounding, Annotations, Pipelines, Benchmark testing
BibRef
Karmanov, A.[Adilbek],
Guan, D.[Dayan],
Lu, S.J.[Shi-Jian],
El Saddik, A.[Abdulmotaleb],
Xing, E.[Eric],
Efficient Test-Time Adaptation of Vision-Language Models,
CVPR24(14162-14171)
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Codes, Computational modeling, Noise,
Predictive models, Benchmark testing
BibRef
Sameni, S.[Sepehr],
Kafle, K.[Kushal],
Tan, H.[Hao],
Jenni, S.[Simon],
Building Vision-Language Models on Solid Foundations with Masked
Distillation,
CVPR24(14216-14226)
IEEE DOI
2410
Training, Solid modeling, Visualization, Computational modeling,
Semantic segmentation, Buildings, LLM
BibRef
Peng, W.[Wujian],
Xie, S.C.[Si-Cheng],
You, Z.[Zuyao],
Lan, S.Y.[Shi-Yi],
Wu, Z.X.[Zu-Xuan],
Synthesize, Diagnose, and Optimize: Towards Fine-Grained
Vision-Language Understanding,
CVPR24(13279-13288)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Computational modeling, Pipelines, Benchmark testing,
Linguistics, Vision language model, Fine-grained understanding
BibRef
Zhao, Y.[Yue],
Zhao, L.[Long],
Zhou, X.Y.[Xing-Yi],
Wu, J.L.[Jia-Lin],
Chu, C.T.[Chun-Te],
Miao, H.[Hui],
Schroff, F.[Florian],
Adam, H.[Hartwig],
Liu, T.[Ting],
Gong, B.Q.[Bo-Qing],
Krähenbühl, P.[Philipp],
Yuan, L.Z.[Liang-Zhe],
Distilling Vision-Language Models on Millions of Videos,
CVPR24(13106-13116)
IEEE DOI
2410
Adaptation models, Computational modeling, Benchmark testing,
Data models, Text to video
BibRef
Chen, J.N.[Jie-Neng],
Yu, Q.H.[Qi-Hang],
Shen, X.H.[Xiao-Hui],
Yuille, A.L.[Alan L.],
Chen, L.C.[Liang-Chieh],
ViTamin: Designing Scalable Vision Models in the Vision-Language Era,
CVPR24(12954-12966)
IEEE DOI
2410
Training, Image segmentation, Accuracy, Protocols, Image coding, Scalability,
Computational modeling, Vision-Language Models, Architectural Design
BibRef
Liu, S.H.[Shi-Hong],
Yu, S.[Samuel],
Lin, Z.Q.[Zhi-Qiu],
Pathak, D.[Deepak],
Ramanan, D.[Deva],
Language Models as Black-Box Optimizers for Vision-Language Models,
CVPR24(12687-12697)
IEEE DOI
2410
Computational modeling, Natural languages, Closed box,
Text to image, Human in the loop, Data models,
generative models
BibRef
Howard, P.[Phillip],
Madasu, A.[Avinash],
Le, T.[Tiep],
Moreno, G.L.[Gustavo Lujan],
Bhiwandiwalla, A.[Anahita],
Lal, V.[Vasudev],
SocialCounterfactuals: Probing and Mitigating Intersectional Social
Biases in Vision-Language Models with Counterfactual Examples,
CVPR24(11975-11985)
IEEE DOI
2410
Training, Prevention and mitigation, Text to image,
Diffusion models, Fairness, social bias,
counterfactuals
BibRef
Jiang, Y.K.[Yan-Kai],
Huang, Z.Z.[Zhong-Zhen],
Zhang, R.Z.[Rong-Zhao],
Zhang, X.F.[Xiao-Fan],
Zhang, S.T.[Shao-Ting],
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and
Self-Prompting,
CVPR24(11386-11397)
IEEE DOI
2410
Training, Visualization, Pathology, Image segmentation,
Image analysis, Computational modeling, Vision-Language Model
BibRef
Kim, Y.[Younghyun],
Mo, S.[Sangwoo],
Kim, M.[Minkyu],
Lee, K.[Kyungmin],
Lee, J.[Jaeho],
Shin, J.[Jinwoo],
Discovering and Mitigating Visual Biases Through Keyword Explanation,
CVPR24(11082-11092)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Image recognition, Computational modeling,
Training data, Flowering plants, bias and fairness, explainable AI,
vision-language model
BibRef
Li, R.[Rui],
Fischer, T.[Tobias],
Segu, M.[Mattia],
Pollefeys, M.[Marc],
Van Gool, L.J.[Luc J.],
Tombari, F.[Federico],
Know Your Neighbors: Improving Single-View Reconstruction via Spatial
Vision-Language Reasoning,
CVPR24(9848-9858)
IEEE DOI Code:
WWW Link.
2410
Geometry, Visualization, Attention mechanisms, Shape, Semantics,
radiance field, vision-language model, spatial context, spatial attention
BibRef
Zeng, Z.[Ziyao],
Wang, D.[Daniel],
Yang, F.Y.[Feng-Yu],
Park, H.[Hyoungseob],
Soatto, S.[Stefano],
Lao, D.[Dong],
Wong, A.[Alex],
WorDepth: Variational Language Prior for Monocular Depth Estimation,
CVPR24(9708-9719)
IEEE DOI Code:
WWW Link.
2410
Measurement, Codes, Estimation, Encoding,
Monocular Depth Estimation, Vision-Language Model, Variational Model
BibRef
Hu, Y.S.[Yu-Shi],
Stretcu, O.[Otilia],
Lu, C.T.[Chun-Ta],
Viswanathan, K.[Krishnamurthy],
Hata, K.[Kenji],
Luo, E.[Enming],
Krishna, R.[Ranjay],
Fuxman, A.[Ariel],
Visual Program Distillation: Distilling Tools and Programmatic
Reasoning into Vision-Language Models,
CVPR24(9590-9601)
IEEE DOI
2410
Visualization, Adaptation models, Computational modeling,
Instruments, Loading, Music, Cognition, vision-language model,
tools
BibRef
Silva-Rodríguez, J.[Julio],
Hajimiri, S.[Sina],
Ben Ayed, I.[Ismail],
Dolz, J.[Jose],
A Closer Look at the Few-Shot Adaptation of Large Vision-Language
Models,
CVPR24(23681-23690)
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Codes, Computational modeling,
Transfer learning, Probes
BibRef
Zanella, M.[Maxime],
Ben Ayed, I.[Ismail],
Low-Rank Few-Shot Adaptation of Vision-Language Models,
Prompting24(1593-1603)
IEEE DOI
2410
Training, Adaptation models, Design methodology,
Few shot learning, Vision-Language, few-shot,
adapter
BibRef
Wang, W.X.[Wen-Xuan],
He, X.J.[Xing-Jian],
Zhang, Y.[Yisi],
Guo, L.T.[Long-Teng],
Shen, J.C.[Jia-Chen],
Li, J.Y.[Jiang-Yun],
Liu, J.[Jing],
CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring
Image Segmentation,
MultMed(26), 2024, pp. 6906-6916.
IEEE DOI
2405
Image segmentation, Visualization, Task analysis, Correlation,
Feature extraction, Transformers, Semantics, vision and language
BibRef
Sahin, U.[Ugur],
Li, H.[Hang],
Khan, Q.[Qadeer],
Cremers, D.[Daniel],
Tresp, V.[Volker],
Enhancing Multimodal Compositional Reasoning of Visual Language
Models with Generative Negative Mining,
WACV24(5551-5561)
IEEE DOI Code:
HTML Version.
2404
Training, Visualization, Codes, Pipelines, Self-supervised learning,
Cognition, Algorithms, Vision + language and/or other modalities
BibRef
Yang, C.[Cheng],
Xu, R.[Rui],
Guo, Y.[Ye],
Huang, P.X.[Pei-Xiang],
Chen, Y.[Yiru],
Ding, W.[Wenkui],
Wang, Z.Y.[Zhong-Yuan],
Zhou, H.[Hong],
Improving Vision-and-Language Reasoning via Spatial Relations
Modeling,
WACV24(758-767)
IEEE DOI
2404
Visualization, Analytical models, Graphical models,
Statistical analysis, Computational modeling, Excavation,
Vision + language and/or other modalities
BibRef
Shen, S.[Sheng],
Yang, S.[Shijia],
Zhang, T.J.[Tian-Jun],
Zhai, B.[Bohan],
Gonzalez, J.E.[Joseph E.],
Keutzer, K.[Kurt],
Darrell, T.J.[Trevor J.],
Multitask Vision-Language Prompt Tuning,
WACV24(5644-5655)
IEEE DOI
2404
Learning systems, Visualization, Adaptation models,
Benchmark testing, Vectors, Task analysis, Algorithms,
Vision + language and/or other modalities
BibRef
Zhang, G.[Gengyuan],
Zhang, Y.R.[Yu-Rui],
Zhang, K.[Kerui],
Tresp, V.[Volker],
Can Vision-Language Models be a Good Guesser? Exploring VLMs for
Times and Location Reasoning,
WACV24(625-634)
IEEE DOI Code:
WWW Link.
2404
Visualization, Computational modeling, Feature extraction,
Cognition, Task analysis, Commonsense reasoning, Algorithms,
Vision + language and/or other modalities
BibRef
Ganz, R.[Roy],
Nuriel, O.[Oren],
Aberdam, A.[Aviad],
Kittenplon, Y.[Yair],
Mazor, S.[Shai],
Litman, R.[Ron],
Towards Models that Can See and Read,
ICCV23(21661-21671)
IEEE DOI
2401
BibRef
Zhang, H.[Heng],
Liu, D.[Daqing],
Lv, Z.[Zezhong],
Su, B.[Bing],
Tao, D.C.[Da-Cheng],
Exploring Temporal Concurrency for Video-Language Representation
Learning,
ICCV23(15522-15532)
IEEE DOI Code:
WWW Link.
2401
BibRef
Shukor, M.[Mustafa],
Dancette, C.[Corentin],
Cord, M.[Matthieu],
eP-ALM: Efficient Perceptual Augmentation of Language Models,
ICCV23(21999-22012)
IEEE DOI Code:
WWW Link.
2401
BibRef
Schulter, S.[Samuel],
Kumar, B.G.V.[B.G. Vijay],
Suh, Y.M.[Yu-Min],
Dafnis, K.M.[Konstantinos M.],
Zhang, Z.X.[Zhi-Xing],
Zhao, S.Y.[Shi-Yu],
Metaxas, D.N.[Dimitris N.],
OmniLabel: A Challenging Benchmark for Language-Based Object
Detection,
ICCV23(11919-11928)
IEEE DOI Code:
WWW Link.
2401
BibRef
Chen, Z.L.[Zi-Liang],
Huang, X.[Xin],
Guan, Q.L.[Quan-Long],
Lin, L.[Liang],
Luo, W.Q.[Wei-Qi],
A Retrospect to Multi-prompt Learning across Vision and Language,
ICCV23(22133-22144)
IEEE DOI
2401
BibRef
Derakhshani, M.M.[Mohammad Mahdi],
Sanchez, E.[Enrique],
Bulat, A.[Adrian],
da Costa, V.G.T.[Victor Guilherme Turrisi],
Snoek, C.G.M.[Cees G. M.],
Tzimiropoulos, G.[Georgios],
Martinez, B.[Brais],
Bayesian Prompt Learning for Image-Language Model Generalization,
ICCV23(15191-15200)
IEEE DOI Code:
WWW Link.
2401
BibRef
Cascante-Bonilla, P.[Paola],
Shehada, K.[Khaled],
Smith, J.S.[James Seale],
Doveh, S.[Sivan],
Kim, D.H.[Dong-Hyun],
Panda, R.[Rameswar],
Varol, G.[Gül],
Oliva, A.[Aude],
Ordonez, V.[Vicente],
Feris, R.S.[Rogerio S.],
Karlinsky, L.[Leonid],
Going Beyond Nouns With Vision & Language Models Using Synthetic
Data,
ICCV23(20098-20108)
IEEE DOI
2401
BibRef
Upadhyay, U.[Uddeshya],
Karthik, S.[Shyamgopal],
Mancini, M.[Massimiliano],
Akata, Z.[Zeynep],
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models,
ICCV23(1899-1910)
IEEE DOI Code:
WWW Link.
2401
BibRef
Bitton-Guetta, N.[Nitzan],
Bitton, Y.[Yonatan],
Hessel, J.[Jack],
Schmidt, L.[Ludwig],
Elovici, Y.[Yuval],
Stanovsky, G.[Gabriel],
Schwartz, R.[Roy],
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of
Synthetic and Compositional Images,
ICCV23(2616-2627)
IEEE DOI
2401
BibRef
Hu, Z.Y.[Zi-Yuan],
Li, Y.Y.[Yan-Yang],
Lyu, M.R.[Michael R.],
Wang, L.W.[Li-Wei],
VL-PET: Vision-and-Language Parameter-Efficient Tuning via
Granularity Control,
ICCV23(2998-3008)
IEEE DOI Code:
WWW Link.
2401
BibRef
Slyman, E.[Eric],
Kahng, M.[Minsuk],
Lee, S.[Stefan],
VLSlice: Interactive Vision-and-Language Slice Discovery,
ICCV23(15245-15255)
IEEE DOI
2401
BibRef
Najibi, M.[Mahyar],
Ji, J.W.[Jing-Wei],
Zhou, Y.[Yin],
Qi, C.R.[Charles R.],
Yan, X.C.[Xin-Chen],
Ettinger, S.[Scott],
Anguelov, D.[Dragomir],
Unsupervised 3D Perception with 2D Vision-Language Distillation for
Autonomous Driving,
ICCV23(8568-8578)
IEEE DOI
2401
BibRef
Xu, H.[Hu],
Xie, S.[Saining],
Huang, P.Y.[Po-Yao],
Yu, L.C.[Li-Cheng],
Howes, R.[Russell],
Ghosh, G.[Gargi],
Zettlemoyer, L.[Luke],
Feichtenhofer, C.[Christoph],
CiT: Curation in Training for Effective Vision-Language Data,
ICCV23(15134-15143)
IEEE DOI
2401
BibRef
Trager, M.[Matthew],
Perera, P.[Pramuditha],
Zancato, L.[Luca],
Achille, A.[Alessandro],
Bhatia, P.[Parminder],
Soatto, S.[Stefano],
Linear Spaces of Meanings: Compositional Structures in
Vision-Language Models,
ICCV23(15349-15358)
IEEE DOI
2401
BibRef
Chen, Y.S.[Yi-Syuan],
Song, Y.Z.[Yun-Zhu],
Yeo, C.Y.[Cheng Yu],
Liu, B.[Bei],
Fu, J.L.[Jian-Long],
Shuai, H.H.[Hong-Han],
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks,
ICCV23(15384-15396)
IEEE DOI
2401
BibRef
Wu, C.E.[Cheng-En],
Tian, Y.[Yu],
Yu, H.C.[Hai-Chao],
Wang, H.[Heng],
Morgado, P.[Pedro],
Hu, Y.H.[Yu Hen],
Yang, L.J.[Lin-Jie],
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy
Labels?,
ICCV23(15442-15451)
IEEE DOI Code:
WWW Link.
2401
BibRef
Ouali, Y.[Yassine],
Bulat, A.[Adrian],
Martinez, B.[Brais],
Tzimiropoulos, G.[Georgios],
Black Box Few-Shot Adaptation for Vision-Language models,
ICCV23(15488-15500)
IEEE DOI Code:
WWW Link.
2401
BibRef
Kan, B.[Baoshuo],
Wang, T.[Teng],
Lu, W.P.[Wen-Peng],
Zhen, X.T.[Xian-Tong],
Guan, W.[Weili],
Zheng, F.[Feng],
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language
Models,
ICCV23(15624-15634)
IEEE DOI
2401
BibRef
Zhai, J.T.[Jiang-Tian],
Zhang, Q.[Qi],
Wu, T.[Tong],
Chen, X.Y.[Xing-Yu],
Liu, J.J.[Jiang-Jiang],
Cheng, M.M.[Ming-Ming],
SLAN: Self-Locator Aided Network for Vision-Language Understanding,
ICCV23(21892-21901)
IEEE DOI Code:
WWW Link.
2401
BibRef
Long, S.[Sifan],
Zhao, Z.[Zhen],
Yuan, J.[Junkun],
Tan, Z.C.[Zi-Chang],
Liu, J.J.[Jiang-Jiang],
Zhou, L.P.[Lu-Ping],
Wang, S.S.[Sheng-Sheng],
Wang, J.D.[Jing-Dong],
Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models,
ICCV23(21902-21912)
IEEE DOI
2401
BibRef
Cho, E.[Eulrang],
Kim, J.[Jooyeon],
Kim, H.W.J.[Hyun-Woo J.],
Distribution-Aware Prompt Tuning for Vision-Language Models,
ICCV23(21947-21956)
IEEE DOI Code:
WWW Link.
2401
BibRef
Varma, M.[Maya],
Delbrouck, J.B.[Jean-Benoit],
Hooper, S.[Sarah],
Chaudhari, A.[Akshay],
Langlotz, C.[Curtis],
ViLLA: Fine-Grained Vision-Language Representation Learning from
Real-World Data,
ICCV23(22168-22178)
IEEE DOI
2401
BibRef
Zhu, H.G.[Hong-Guang],
Wei, Y.C.[Yun-Chao],
Liang, X.D.[Xiao-Dan],
Zhang, C.J.[Chun-Jie],
Zhao, Y.[Yao],
CTP: Towards Vision-Language Continual Pretraining via Compatible
Momentum Contrast and Topology Preservation,
ICCV23(22200-22210)
IEEE DOI Code:
WWW Link.
2401
BibRef
Hu, Z.Z.[Zhi-Zhang],
Zhu, X.L.[Xin-Liang],
Tran, S.[Son],
Vidal, R.[René],
Dhua, A.[Arnab],
ProVLA: Compositional Image Search with Progressive Vision-Language
Alignment and Multimodal Fusion,
CLVL23(2764-2769)
IEEE DOI
2401
BibRef
Hall, M.[Melissa],
Gustafson, L.[Laura],
Adcock, A.[Aaron],
Misra, I.[Ishan],
Ross, C.[Candace],
Vision-Language Models Performing Zero-Shot Tasks Exhibit Disparities
Between Gender Groups,
CLVL23(2770-2777)
IEEE DOI
2401
BibRef
Agnolucci, L.[Lorenzo],
Baldrati, A.[Alberto],
Todino, F.[Francesco],
Becattini, F.[Federico],
Bertini, M.[Marco],
del Bimbo, A.[Alberto],
ECO: Ensembling Context Optimization for Vision-Language Models,
CLVL23(2803-2807)
IEEE DOI
2401
BibRef
Palit, V.[Vedant],
Pandey, R.[Rohan],
Arora, A.[Aryaman],
Liang, P.P.[Paul Pu],
Towards Vision-Language Mechanistic Interpretability: A Causal
Tracing Tool for BLIP,
CLVL23(2848-2853)
IEEE DOI
2401
BibRef
Sammani, F.[Fawaz],
Deligiannis, N.[Nikos],
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language
Tasks,
VLAR23(4636-4641)
IEEE DOI
2401
BibRef
Lee, D.J.[Dong-Jun],
Song, S.[Seokwon],
Suh, J.[Jihee],
Choi, J.[Joonmyeong],
Lee, S.[Sanghyeok],
Kim, H.W.J.[Hyun-Woo J.],
Read-only Prompt Optimization for Vision-Language Few-shot Learning,
ICCV23(1401-1411)
IEEE DOI Code:
WWW Link.
2401
BibRef
Li, X.[Xuanlin],
Fang, Y.H.[Yun-Hao],
Liu, M.H.[Ming-Hua],
Ling, Z.[Zhan],
Tu, Z.W.[Zhuo-Wen],
Su, H.[Hao],
Distilling Large Vision-Language Model with Out-of-Distribution
Generalizability,
ICCV23(2492-2503)
IEEE DOI
2401
BibRef
Li, J.C.[Jun-Cheng],
Gao, M.[Minghe],
Wei, L.H.[Long-Hui],
Tang, S.L.[Si-Liang],
Zhang, W.Q.[Wen-Qiao],
Li, M.Z.[Meng-Ze],
Ji, W.[Wei],
Tian, Q.[Qi],
Chua, T.S.[Tat-Seng],
Zhuang, Y.T.[Yue-Ting],
Gradient-Regulated Meta-Prompt Learning for Generalizable
Vision-Language Models,
ICCV23(2551-2562)
IEEE DOI
2401
BibRef
Bi, J.Y.[Jun-Yu],
Cheng, D.[Daixuan],
Yao, P.[Ping],
Pang, B.[Bochen],
Zhan, Y.F.[Yue-Feng],
Yang, C.G.[Chuan-Guang],
Wang, Y.J.[Yu-Jing],
Sun, H.[Hao],
Deng, W.W.[Wei-Wei],
Zhang, Q.[Qi],
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and
Instance-Level Matching,
ICCV23(2584-2593)
IEEE DOI
2401
BibRef
Udandarao, V.[Vishaal],
Gupta, A.[Ankush],
Albanie, S.[Samuel],
SuS-X: Training-Free Name-Only Transfer of Vision-Language Models,
ICCV23(2725-2736)
IEEE DOI Code:
WWW Link.
2401
BibRef
Jiang, C.Y.[Chao-Ya],
Xu, H.Y.[Hai-Yang],
Ye, W.[Wei],
Ye, Q.H.[Qing-Hao],
Li, C.L.[Chen-Liang],
Yan, M.[Ming],
Bi, B.[Bin],
Zhang, S.K.[Shi-Kun],
Huang, F.[Fei],
Huang, S.F.[Song-Fang],
BUS: Efficient and Effective Vision-language Pre-training with
Bottom-Up Patch Summarization,
ICCV23(2888-2898)
IEEE DOI
2401
BibRef
Shi, C.[Cheng],
Yang, S.[Sibei],
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for
Vision-Language Models,
ICCV23(2920-2929)
IEEE DOI
2401
BibRef
Wang, A.J.P.[Alex Jin-Peng],
Lin, K.Q.[Kevin Qinghong],
Zhang, D.J.H.[David Jun-Hao],
Lei, S.W.X.[Stan Wei-Xian],
Shou, M.Z.[Mike Zheng],
Too Large; Data Reduction for Vision-Language Pre-Training,
ICCV23(3124-3134)
IEEE DOI
2401
BibRef
Wang, W.H.[Wei-Han],
Yang, Z.[Zhen],
Xu, B.[Bin],
Li, J.[Juanzi],
Sun, Y.K.[Yan-Kui],
ViLTA: Enhancing Vision-Language Pre-training through Textual
Augmentation,
ICCV23(3135-3146)
IEEE DOI
2401
BibRef
Boecking, B.[Benedikt],
Usuyama, N.[Naoto],
Bannur, S.[Shruthi],
Castro, D.C.[Daniel C.],
Schwaighofer, A.[Anton],
Hyland, S.[Stephanie],
Wetscherek, M.[Maria],
Naumann, T.[Tristan],
Nori, A.[Aditya],
Alvarez-Valle, J.[Javier],
Poon, H.[Hoifung],
Oktay, O.[Ozan],
Making the Most of Text Semantics to Improve Biomedical Vision-Language
Processing,
ECCV22(XXXVI:1-21).
Springer DOI
2211
BibRef
Cui, Q.[Quan],
Zhou, B.[Boyan],
Guo, Y.[Yu],
Yin, W.D.[Wei-Dong],
Wu, H.[Hao],
Yoshie, O.[Osamu],
Chen, Y.[Yubo],
Contrastive Vision-Language Pre-training with Limited Resources,
ECCV22(XXXVI:236-253).
Springer DOI
2211
BibRef
Hu, X.W.[Xiao-Wei],
Gan, Z.[Zhe],
Wang, J.F.[Jian-Feng],
Yang, Z.Y.[Zheng-Yuan],
Liu, Z.C.[Zi-Cheng],
Lu, Y.[Yumao],
Wang, L.J.[Li-Juan],
Scaling Up Vision-Language Pretraining for Image Captioning,
CVPR22(17959-17968)
IEEE DOI
2210
Training, Visualization, Computational modeling, Training data,
Benchmark testing, Transformers, Feature extraction, Vision + language
BibRef
Zhang, P.C.[Peng-Chuan],
Li, X.J.[Xiu-Jun],
Hu, X.W.[Xiao-Wei],
Yang, J.W.[Jian-Wei],
Zhang, L.[Lei],
Wang, L.J.[Li-Juan],
Choi, Y.J.[Ye-Jin],
Gao, J.F.[Jian-Feng],
VinVL: Revisiting Visual Representations in Vision-Language Models,
CVPR21(5575-5584)
IEEE DOI
2111
Training, Visualization, Computational modeling, Object detection,
Benchmark testing, Feature extraction, Transformers
BibRef
Li, Z.W.[Zhuo-Wan],
Stengel-Eskin, E.[Elias],
Zhang, Y.X.[Yi-Xiao],
Xie, C.[Cihang],
Tran, Q.[Quan],
van Durme, B.[Benjamin],
Yuille, A.L.[Alan L.],
Calibrating Concepts and Operations:
Towards Symbolic Reasoning on Real Images,
ICCV21(14890-14899)
IEEE DOI
2203
Visualization, Analytical models, Codes, Computational modeling,
Cognition, Data models, Vision + language
BibRef
Yang, X.[Xu],
Zhang, H.W.[Han-Wang],
Qi, G.J.[Guo-Jun],
Cai, J.F.[Jian-Fei],
Causal Attention for Vision-Language Tasks,
CVPR21(9842-9852)
IEEE DOI
2111
Correlation, Codes, Computational modeling,
Training data, Transformers, Data models
BibRef
Stefanini, M.[Matteo],
Cornia, M.[Marcella],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
A Novel Attention-based Aggregation Function to Combine Vision and
Language,
ICPR21(1212-1219)
IEEE DOI
2105
Deep learning, Visualization, Image retrieval,
Transforms, Knowledge discovery
BibRef
Zheng, W.B.[Wen-Bo],
Yan, L.[Lan],
Gou, C.[Chao],
Wang, F.Y.[Fei-Yue],
Webly Supervised Knowledge Embedding Model for Visual Reasoning,
CVPR20(12442-12451)
IEEE DOI
2008
Visual reasoning between visual image and natural language description.
Visualization, Cognition, Knowledge based systems, Task analysis,
Knowledge engineering, Modulation, Robustness
BibRef
Nguyen, D.K.[Duy-Kien],
Okatani, T.[Takayuki],
Multi-Task Learning of Hierarchical Vision-Language Representation,
CVPR19(10484-10493).
IEEE DOI
2002
BibRef
Gupta, T.[Tanmay],
Shih, K.J.[Kevin J.],
Singh, S.[Saurabh],
Hoiem, D.[Derek],
Aligned Image-Word Representations Improve Inductive Transfer Across
Vision-Language Tasks,
ICCV17(4223-4232)
IEEE DOI
1802
data visualisation, image recognition,
learning (artificial intelligence),
Visualization
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Large Language Models for Vision, LLM, LVLM .