Tamaazousti, Y.[Youssef],
Le Borgne, H.[Hervé],
Popescu, A.[Adrian],
Gadeski, E.[Etienne],
Ginsca, A.[Alexandru],
Hudelot, C.[Céline],
Vision-language integration using constrained local semantic features,
CVIU(163), No. 1, 2017, pp. 41-57.
Elsevier DOI
1712
Image classification
BibRef
Gouthaman, K.V.,
Nambiar, A.[Athira],
Srinivas, K.S.[Kancheti Sai],
Mittal, A.[Anurag],
Linguistically-aware attention for reducing the semantic gap in
vision-language tasks,
PR(112), 2021, pp. 107812.
Elsevier DOI
2102
Attention models, Visual question answering,
Counting in visual question answering, Image captioning
BibRef
Zhou, K.Y.[Kai-Yang],
Yang, J.K.[Jing-Kang],
Loy, C.C.[Chen Change],
Liu, Z.W.[Zi-Wei],
Learning to Prompt for Vision-Language Models,
IJCV(130), No. 9, September 2022, pp. 2337-2348.
Springer DOI
2208
BibRef
Zhou, K.Y.[Kai-Yang],
Yang, J.K.[Jing-Kang],
Loy, C.C.[Chen Change],
Liu, Z.W.[Zi-Wei],
Conditional Prompt Learning for Vision-Language Models,
CVPR22(16795-16804)
IEEE DOI
2210
Training, Representation learning, Adaptation models,
Neural networks, Manuals
BibRef
Ma, C.C.[Cheng-Cheng],
Liu, Y.[Yang],
Deng, J.K.[Jian-Kang],
Xie, L.X.[Ling-Xi],
Dong, W.M.[Wei-Ming],
Xu, C.S.[Chang-Sheng],
Understanding and Mitigating Overfitting in Prompt Tuning for
Vision-Language Models,
CirSysVideo(33), No. 9, September 2023, pp. 4616-4629.
IEEE DOI Code:
WWW Link.
2310
BibRef
Zhu, Y.Q.[Yong-Qing],
Li, X.Y.[Xiang-Yang],
Zheng, M.[Mao],
Yang, J.H.[Jia-Hao],
Wang, Z.H.[Zi-Han],
Guo, X.Q.[Xiao-Qian],
Chai, Z.F.[Zi-Feng],
Yuan, Y.C.[Yu-Chen],
Jiang, S.Q.[Shu-Qiang],
Focus and Align: Learning Tube Tokens for Video-Language Pre-Training,
MultMed(25), 2023, pp. 8036-8050.
IEEE DOI
2312
BibRef
Chen, C.Q.[Chong-Qing],
Han, D.[Dezhi],
Chang, C.C.[Chin-Chen],
MPCCT: Multimodal vision-language learning paradigm with
context-based compact Transformer,
PR(147), 2024, pp. 110084.
Elsevier DOI Code:
WWW Link.
2312
Multimodal vision-language paradigms,
High-dependency modeling, Visual question answering (VQA),
Logical relationship reasoning
BibRef
Wu, W.H.[Wen-Hao],
Sun, Z.[Zhun],
Song, Y.X.[Yu-Xin],
Wang, J.D.[Jing-Dong],
Ouyang, W.L.[Wan-Li],
Transferring Vision-Language Models for Visual Recognition:
A Classifier Perspective,
IJCV(132), No. 2, February 2024, pp. 392-409.
Springer DOI
2402
BibRef
Ming, Y.F.[Yi-Fei],
Li, Y.X.[Yi-Xuan],
How Does Fine-Tuning Impact Out-of-Distribution Detection for
Vision-Language Models?,
IJCV(132), No. 2, February 2024, pp. 596-609.
Springer DOI
2402
BibRef
Zhao, C.R.[Cai-Rong],
Wang, Y.[Yubin],
Jiang, X.Y.[Xin-Yang],
Shen, Y.F.[Yi-Fei],
Song, K.[Kaitao],
Li, D.S.[Dong-Sheng],
Miao, D.Q.[Duo-Qian],
Learning Domain Invariant Prompt for Vision-Language Models,
IP(33), 2024, pp. 1348-1360.
IEEE DOI
2402
Task analysis, Tuning, Training, Adaptation models, Visualization,
Image color analysis, Self-supervised learning, Prompt learning,
domain generalization
BibRef
Yang, X.F.[Xiao-Feng],
Liu, F.[Fayao],
Lin, G.S.[Guo-Sheng],
Neural Logic Vision Language Explainer,
MultMed(26), 2024, pp. 3331-3340.
IEEE DOI
2402
Cognition, Logic programming, Deep learning, Visualization,
Data models, Training, Markov processes,
vision language pretraining
BibRef
Wang, Y.D.[Yi-Dong],
Yu, Z.O.[Zhu-Ohao],
Wang, J.D.[Jin-Dong],
Heng, Q.[Qiang],
Chen, H.[Hao],
Ye, W.[Wei],
Xie, R.[Rui],
Xie, X.[Xing],
Zhang, S.K.[Shi-Kun],
Exploring Vision-Language Models for Imbalanced Learning,
IJCV(132), No. 1, January 2024, pp. 224-237.
Springer DOI
2402
BibRef
Yu, Z.T.[Zheng-Tao],
Zhao, J.[Jia],
Guo, C.L.[Chen-Liang],
Yang, Y.[Ying],
StableNet: Distinguishing the hard samples to overcome language
priors in visual question answering,
IET-CV(18), No. 2, 2024, pp. 315-327.
DOI Link
2403
multimedia systems
BibRef
Zeng, Y.[Yan],
Zhang, X.[Xinsong],
Li, H.[Hang],
Wang, J.W.[Jia-Wei],
Zhang, J.P.[Ji-Peng],
Zhou, W.[Wangchunshu],
X2-VLM: All-in-One Pre-Trained Model for Vision-Language Tasks,
PAMI(46), No. 5, May 2024, pp. 3156-3168.
IEEE DOI
2404
Task analysis, Visualization, Transformers, Detectors, Training,
Feature extraction, Image coding,
vision language pre-training
BibRef
Zheng, Y.Z.[Yao-Zong],
Zhong, B.[Bineng],
Liang, Q.H.[Qi-Hua],
Li, G.R.[Guo-Rong],
Ji, R.R.[Rong-Rong],
Li, X.X.[Xian-Xian],
Toward Unified Token Learning for Vision-Language Tracking,
CirSysVideo(34), No. 4, April 2024, pp. 2125-2135.
IEEE DOI
2404
Task analysis, Target tracking, Visualization, Feature extraction,
Pipelines, Linguistics, Training, Vision-language tracking,
multi-modal modeling
BibRef
Ye, P.[Ping],
Xiao, G.[Gang],
Liu, J.[Jun],
Multimodal Features Alignment for Vision-Language Object Tracking,
RS(16), No. 7, 2024, pp. 1168.
DOI Link
2404
BibRef
Bazi, Y.[Yakoub],
Bashmal, L.[Laila],
Rahhal, M.M.A.[Mohamad Mahmoud Al],
Ricci, R.[Riccardo],
Melgani, F.[Farid],
RS-LLaVA: A Large Vision-Language Model for Joint Captioning and
Question Answering in Remote Sensing Imagery,
RS(16), No. 9, 2024, pp. 1477.
DOI Link
2405
BibRef
Kong, D.[Daehyeon],
Kong, K.[Kyeongbo],
Kang, S.J.[Suk-Ju],
Image clustering using generated text centroids,
SP:IC(125), 2024, pp. 117128.
Elsevier DOI
2405
Deep neural network, Image clustering, Multimodal task, Vision-language model
BibRef
Chen, X.Y.[Xian-Yu],
Yang, J.H.[Jin-Hui],
Chen, S.[Shi],
Wang, L.[Louis],
Jiang, M.[Ming],
Zhao, Q.[Qi],
Every Problem, Every Step, All in Focus: Learning to Solve
Vision-Language Problems With Integrated Attention,
PAMI(46), No. 7, July 2024, pp. 4720-4735.
IEEE DOI
2406
Problem-solving, Task analysis, Visualization, Measurement,
Graph neural networks, Cognition, Videos, Graph attention,
vision-language problem solving
BibRef
Menon, S.[Sachit],
Chandratreya, I.P.[Ishaan Preetam],
Vondrick, C.[Carl],
Task Bias in Contrastive Vision-Language Models,
IJCV(132), No. 6, June 2024, pp. 2026-2040.
Springer DOI
2406
BibRef
Zhang, J.Y.[Jing-Yi],
Huang, J.X.[Jia-Xing],
Jin, S.[Sheng],
Lu, S.J.[Shi-Jian],
Vision-Language Models for Vision Tasks: A Survey,
PAMI(46), No. 8, August 2024, pp. 5625-5644.
IEEE DOI
2407
Task analysis, Visualization, Training, Deep learning, Surveys,
Data models, Predictive models, Big Data, big model, deep learning,
image classification
BibRef
Dong, M.P.[Meng-Ping],
Li, F.[Fei],
Li, Z.B.[Zhen-Bo],
Liu, X.[Xue],
Cluster prototype earth mover's distance adapters and
alignment-guided prompt learning for vision-language models,
PR(156), 2024, pp. 110861.
Elsevier DOI
2408
Cluster prototype, Earth mover's distance, Adapter,
Prompt learning, Vision-language models
BibRef
Liu, Y.[Ye],
Pan, Y.[Yan],
Yin, J.[Jian],
Enhancing Multi-Label Deep Hashing for Image and Audio With Joint
Internal Global Loss Constraints and Large Vision-Language Model,
SPLetters(31), 2024, pp. 2550-2554.
IEEE DOI
2410
Codes, Transformers, Adaptation models, Training,
Convolutional neural networks, Feature extraction,
vision transformer
BibRef
Zhan, C.[Chenlu],
Zhang, Y.F.[Yu-Fei],
Lin, Y.[Yu],
Wang, G.[Gaoang],
Wang, H.W.[Hong-Wei],
UniDCP: Unifying Multiple Medical Vision-Language Tasks via Dynamic
Cross-Modal Learnable Prompts,
MultMed(26), 2024, pp. 9736-9748.
IEEE DOI
2410
Task analysis, Adaptation models, Visualization,
Medical diagnostic imaging, Tuning, Multitasking, Plastics,
cross-modal shareable space
BibRef
Su, K.[Ke],
Zhang, X.X.[Xing-Xing],
Zhang, S.Y.[Si-Yang],
Zhu, J.[Jun],
Zhang, B.[Bo],
To Boost Zero-Shot Generalization for Embodied Reasoning With
Vision-Language Pre-Training,
IP(33), 2024, pp. 5370-5381.
IEEE DOI
2410
Cognition, Visualization, Artificial intelligence, Training,
Image reconstruction, Navigation, vision-language pre-training
BibRef
Xuan, S.Y.[Shi-Yu],
Yang, M.[Ming],
Zhang, S.L.[Shi-Liang],
Adapting Vision-Language Models via Learning to Inject Knowledge,
IP(33), 2024, pp. 5798-5809.
IEEE DOI
2410
Feature extraction, Visualization, Adaptation models, Tuning,
Training, Transformers, Dogs, Accuracy, Robustness, Few shot learning,
knowledge injection
BibRef
Zhou, W.[Wenlve],
Zhou, Z.H.[Zhi-Heng],
Unsupervised Domain Adaption Harnessing Vision-Language Pre-Training,
CirSysVideo(34), No. 9, September 2024, pp. 8201-8214.
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Task analysis, Training, Computational modeling, Tuning,
Data models, Visualization, Unsupervised domain adaptation, model deployment
BibRef
Guo, M.H.[Meng-Hao],
Zhang, Y.[Yi],
Mu, T.J.[Tai-Jiang],
Huang, S.X.[Sharon X.],
Hu, S.M.[Shi-Min],
Tuning Vision-Language Models With Multiple Prototypes Clustering,
PAMI(46), No. 12, December 2024, pp. 11186-11199.
IEEE DOI
2411
Prototypes, Adaptation models, Tuning, Visualization,
Benchmark testing, Computational modeling, Data models, clustering
BibRef
Sun, B.[Bo],
Wu, Z.C.[Zhi-Chao],
Zhang, H.[Hao],
He, J.[Jun],
VTPL: Visual and text prompt learning for visual-language models,
JVCIR(104), 2024, pp. 104280.
Elsevier DOI
2411
V-L models, Prompt learning, Visual and text prompts,
Poly-1 information NCE loss, Center loss
BibRef
Liu, L.C.[Liang-Chen],
Wang, N.N.[Nan-Nan],
Liu, D.[Decheng],
Yang, X.[Xi],
Gao, X.B.[Xin-Bo],
Liu, T.L.[Tong-Liang],
Towards Specific Domain Prompt Learning via Improved Text Label
Optimization,
MultMed(26), 2024, pp. 10805-10815.
IEEE DOI
2411
Visualization, Optimization, Semantics, Task analysis, Terminology,
Learning systems, Adaptation models, vision-language model
BibRef
Liu, X.[Xin],
Wu, J.[Jiamin],
Yang, W.F.[Wen-Fei],
Zhou, X.[Xu],
Zhang, T.Z.[Tian-Zhu],
Multi-Modal Attribute Prompting for Vision-Language Models,
CirSysVideo(34), No. 11, November 2024, pp. 11579-11591.
IEEE DOI
2412
Visualization, Task analysis, Semantics, Adaptation models,
Integrated circuit modeling, Vectors,
attribute
BibRef
Jiang, H.J.[Hao-Jun],
Zhang, J.K.[Jian-Ke],
Huang, R.[Rui],
Ge, C.J.[Chun-Jiang],
Ni, Z.[Zanlin],
Song, S.[Shiji],
Huang, G.[Gao],
Cross-modal adapter for vision-language retrieval,
PR(159), 2025, pp. 111144.
Elsevier DOI
2412
Adapter, Cross-modal interaction, Cross-modal retrieval,
Parameter-efficient training, Multi-modal learning
BibRef
Tan, Y.T.[Ying-Tao],
Chen, Y.Y.[Ying-Ying],
Wang, J.Q.[Jin-Qiao],
DSTA: Reinforcing Vision-Language Understanding for Scene-Text VQA
With Dual-Stream Training Approach,
SPLetters(32), 2025, pp. 6-10.
IEEE DOI
2501
Optical character recognition, Training, Visualization,
Feature extraction, Transformers, Text recognition,
scene-text understanding
BibRef
Yellinek, N.[Nir],
Karlinsky, L.[Leonid],
Giryes, R.[Raja],
3VL: Using Trees to Improve Vision-Language Models' Interpretability,
IP(34), 2025, pp. 495-509.
IEEE DOI
2501
aligning image and text representations.
Random forests, Visualization, Training, Cognition, Feature extraction,
Transformers, Forestry, Animals, compositional reasoning
BibRef
Yang, L.F.[Ling-Feng],
Li, X.[Xiang],
Wang, Y.[Yueze],
Wang, X.L.[Xin-Long],
Yang, J.[Jian],
Fine-Grained Visual Text Prompting,
PAMI(47), No. 3, March 2025, pp. 1594-1609.
IEEE DOI
2502
What kind of visual prompts to add.
Visualization, Semantics, Image segmentation, Crops, Tuning, Detectors,
Proposals, Location awareness, Grounding, Gray-scale, zero-shot
BibRef
Wang, F.[Fan],
Han, Z.Y.[Zhong-Yi],
Liu, X.[Xingbo],
Yin, Y.L.[Yi-Long],
Gao, X.[Xin],
CTPT: Continual Test-time Prompt Tuning for vision-language models,
PR(161), 2025, pp. 111300.
Elsevier DOI
2502
Test-time adaptation,
Contrastive Language-Image Pretraining (CLIP),
Stable self-learning
BibRef
Liang, N.[Nanhao],
Liu, Y.[Yong],
DPO: Discrete Prompt Optimization for Vision-Language Models,
SPLetters(32), 2025, pp. 671-675.
IEEE DOI
2502
Training, Optimization, Adaptation models, Visualization,
Overfitting, Vectors, Vocabulary, Signal processing algorithms,
vision-language model
BibRef
Ondeng, O.[Oscar],
Ouma, H.[Heywood],
Akuon, P.[Peter],
Enriching visual feature representations for vision-language tasks
using spectral transforms,
IVC(154), 2025, pp. 105390.
Elsevier DOI
2502
Visual feature enrichment, Transformers, Image captioning,
Discrete Fourier Transform, MS COCO, Kylberg dataset, Diversity
BibRef
Xu, C.[Chen],
Zhu, Y.H.[Yu-Han],
Shen, H.C.[Hao-Cheng],
Chen, B.H.[Bo-Heng],
Liao, Y.X.[Yi-Xuan],
Chen, X.X.[Xiao-Xin],
Wang, L.M.[Li-Min],
Progressive Visual Prompt Learning with Contrastive Feature
Re-formation,
IJCV(133), No. 2, February 2025, pp. 511-526.
Springer DOI
2502
Adapting the pre-trained Vision-Language Models.
BibRef
Long, S.[Sifan],
Zhao, Z.[Zhen],
Yuan, J.K.[Jun-Kun],
Tan, Z.C.[Zi-Chang],
Liu, J.J.[Jiang-Jiang],
Feng, J.Y.[Jing-Yuan],
Wang, S.S.[Sheng-Sheng],
Wang, J.D.[Jing-Dong],
Mutual Prompt Learning for Vision Language Models,
IJCV(133), No. 3, March 2025, pp. 1258-1276.
Springer DOI
2502
BibRef
Alsabbagh, A.R.[Abdel Rahman],
Mansour, T.[Tariq],
Al-Kharabsheh, M.[Mohammad],
Ebdah, A.S.[Abdel Salam],
Al-Emaryeen, R.[Roa'a],
Al-Nahhas, S.[Sara],
Mahafza, W.[Waleed],
Al-Kadi, O.[Omar],
MiniMedGPT: Efficient Large Vision-Language Model for medical Visual
Question Answering,
PRL(189), 2025, pp. 8-16.
Elsevier DOI Code:
WWW Link.
2503
Medical VQA, Large Vision-Language Model, MedGPT,
Generative pre-trained transformers, Natural language processing
BibRef
Yin, J.H.[Jun-Hui],
Zhang, X.Y.[Xin-Yu],
Wu, L.[Lin],
Wang, X.J.[Xiao-Jie],
Context-aware prompt learning for test-time vision recognition with
frozen vision-language model,
PR(162), 2025, pp. 111359.
Elsevier DOI Code:
WWW Link.
2503
In-context learning, Prompt learning, Vision-language model,
Vision recognition, Test-time adaptation
BibRef
Wang, X.[Xiao],
Wu, J.L.[Jian-Long],
Lin, Z.[Zijia],
Zhang, F.Z.[Fu-Zheng],
Zhang, D.[Di],
Nie, L.Q.[Li-Qiang],
Video DataFlywheel: Resolving the Impossible Data Trinity in
Video-Language Understanding,
PAMI(47), No. 4, April 2025, pp. 2912-2923.
IEEE DOI
2503
Noise, Annotations, Iterative methods, Scalability, Data models,
Question answering (information retrieval), Foundation models,
text-video retrieval
BibRef
Chen, Y.[Yeming],
Zhang, S.[Siyu],
Sun, Y.[Yaoru],
Yang, J.[Jun],
Liang, W.J.[Wei-Jian],
Wang, H.R.[Hao-Ran],
Artificial-Spiking Hierarchical Networks for Vision-Language
Representation Learning,
CirSysVideo(35), No. 3, March 2025, pp. 2768-2781.
IEEE DOI Code:
WWW Link.
2503
Visualization, Semantics, Computational modeling, Transformers,
Feature extraction, Object detection,
multimodal alignment
BibRef
Li, B.Z.[Bin-Zhe],
Wang, S.[Shurun],
Wang, S.Q.[Shi-Qi],
Ye, Y.[Yan],
High Efficiency Image Compression for Large Visual-Language Models,
CirSysVideo(35), No. 3, March 2025, pp. 2870-2880.
IEEE DOI
2503
Image coding, Visualization, Machine vision, Codecs, Semantics,
Standards, Image reconstruction, Bit rate, pre-editing process
BibRef
Liu, L.C.[Liang-Chen],
Wang, N.N.[Nan-Nan],
Zhou, D.W.[Da-Wei],
Liu, D.C.[De-Cheng],
Yang, X.[Xi],
Gao, X.B.[Xin-Bo],
Liu, T.L.[Tong-Liang],
Generalizable Prompt Learning via Gradient Constrained
Sharpness-Aware Minimization,
MultMed(27), 2025, pp. 1100-1113.
IEEE DOI
2503
Improving the performance on unseen classes while maintaining the
performance on seen classes.
Optimization, Minimization, Visualization, Training, Degradation,
Vectors, Telecommunications, Intserv networks, Geometry,
sharpness-aware minimization
BibRef
Lu, Z.[Zhihe],
Bai, J.[Jiawang],
Li, X.[Xin],
Xiao, Z.[Zeyu],
Wang, X.C.[Xin-Chao],
Task-to-Instance Prompt Learning for Vision-Language Models at Test
Time,
IP(34), 2025, pp. 1908-1920.
IEEE DOI Code:
WWW Link.
2504
Training, Training data, Visualization, Adaptation models, Learning systems,
Image recognition, Dogs, Vectors, Entropy, task-to-instance
BibRef
Kuang, J.Y.[Jia-Yi],
Shen, Y.[Ying],
Xie, J.[Jingyou],
Luo, H.[Haohao],
Xu, Z.[Zhe],
Li, R.H.[Rong-Hao],
Li, Y.H.[Ying-Hui],
Cheng, X.F.[Xian-Feng],
Lin, X.[Xika],
Han, Y.[Yu],
Natural Language Understanding and Inference with MLLM in Visual
Question Answering: A Survey,
Surveys(57), No. 8, March 2025, pp. xx-yy.
DOI Link
2504
Survey, Large Language Models, Visual question answering,
multimodal representation and reasoning, multimodal large language models
BibRef
Fang, Z.Q.[Zheng-Qing],
Yuan, Z.H.[Zhou-Hang],
Li, Z.Y.[Zi-Yu],
Chen, J.Y.[Jing-Yuan],
Kuang, K.[Kun],
Yao, Y.F.[Yu-Feng],
Wu, F.[Fei],
Cross-Modality Image Interpretation via Concept Decomposition Vector
of Visual-Language Models,
CirSysVideo(35), No. 4, April 2025, pp. 3024-3038.
IEEE DOI
2504
Visualization, Vectors, Semantics, Training, Image representation,
Task analysis, visual-language models
BibRef
Ramzi, E.[Elias],
Audebert, N.[Nicolas],
Rambour, C.[Clément],
Araujo, A.[André],
Bitot, X.[Xavier],
Thome, N.[Nicolas],
Optimization of Rank Losses for Image Retrieval,
PAMI(47), No. 6, June 2025, pp. 4317-4329.
IEEE DOI
2505
Training, Image retrieval, Measurement, Standards, Data mining,
Artificial intelligence, Loss measurement, non-decomposable
BibRef
Lafon, M.[Marc],
Ramzi, E.[Elias],
Rambour, C.[Clément],
Audebert, N.[Nicolas],
Thome, N.[Nicolas],
Gallop: Learning Global and Local Prompts for Vision-language Models,
ECCV24(LXI: 264-282).
Springer DOI
2412
BibRef
Liu, K.C.[Kang-Cheng],
Wang, C.Q.[Chao-Qun],
Han, X.D.[Xiao-Dong],
Liu, Y.J.[Yong-Jin],
Chen, B.Q.[Bao-Quan],
Generalized Robot Vision-Language Model via Linguistic Foreground-Aware
Contrast,
IJCV(133), No. 6, June 2025, pp. 3481-3518.
Springer DOI
2505
BibRef
Yang, L.X.[Ling-Xiao],
Zhang, R.Y.[Ru-Yuan],
Chen, Q.[Qi],
Xie, X.H.[Xiao-Hua],
Learning with Enriched Inductive Biases for Vision-Language Models,
IJCV(133), No. 6, June 2025, pp. 3746-3761.
Springer DOI
2505
BibRef
Safaei, B.[Bardia],
Patel, V.M.[Vishal M.],
Active Learning for Vision-Language Models,
WACV25(4902-4912)
IEEE DOI
2505
Training, Bridges, Uncertainty, Computational modeling, Active learning,
Measurement uncertainty, Entropy, Reliability, Image classification
BibRef
Wang, Y.C.[Yi-Cheng],
Zhang, Z.K.[Zhi-Kang],
Wang, J.[Jue],
Fan, D.[David],
Xu, Z.[Zhenlin],
Liu, L.[Linda],
Hao, X.[Xiang],
Bhat, V.[Vimal],
Li, X.Y.[Xin-Yu],
GEXIA: Granularity Expansion and Iterative Approximation for Scalable
Multi-Grained Video-Language Learning,
WACV25(4725-4735)
IEEE DOI
2505
Computational modeling, Semantics, Benchmark testing, Data models,
Iterative methods, Videos
BibRef
Colman, R.[Roman],
Vu, M.[Minh],
Bhattarai, M.[Manish],
Ma, M.[Martin],
Viswanathan, H.[Hari],
O'Malley, D.[Daniel],
Santos, J.E.[Javier E.],
PatchFinder: Leveraging Visual Language Models for Accurate
Information Retrieval Using Model Uncertainty,
WACV25(9146-9155)
IEEE DOI
2505
Visualization, Uncertainty, Accuracy, Computational modeling,
Software algorithms, Predictive models, Information retrieval,
log likelihood
BibRef
Basak, D.[Debolena],
Bhatt, S.[Soham],
Kanduri, S.[Sahith],
Desarkar, M.S.[Maunendra Sankar],
Aerial Mirage: Unmasking Hallucinations in Large Vision Language
Models,
WACV25(5500-5508)
IEEE DOI
2505
Training, Reviews, Annotations, Surveillance, Computational modeling,
Decision making, Data models, Reliability, Drones
BibRef
Jawade, B.[Bhavin],
Soares, J.V.B.[João V. B.],
Thadani, K.[Kapil],
Mohan, D.D.[Deen Dayal],
Eshratifar, A.E.[Amir Erfan],
Culpepper, B.[Benjamin],
de Juan, P.[Paloma],
Setlur, S.[Srirangaraj],
Govindaraju, V.[Venu],
SCOT: Self-Supervised Contrastive Pretraining for Zero-Shot
Compositional Retrieval,
WACV25(5509-5519)
IEEE DOI Code:
WWW Link.
2505
Training, Codes, Large language models, Image retrieval,
Benchmark testing, Web search, Standards, zero-shot
BibRef
Huang, P.H.[Po-Hsuan],
Li, J.L.[Jeng-Lin],
Chen, C.P.[Chin-Po],
Chang, M.C.[Ming-Ching],
Chen, W.C.[Wei-Chao],
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large
Vision-Language Model via Causality Analysis,
WACV25(6125-6135)
IEEE DOI
2505
Training, Visualization, Prevention and mitigation,
Computational modeling, Semantics, Natural languages,
causal analysis
BibRef
Talemi, N.A.[Niloufar Alipour],
Kashiani, H.[Hossein],
Afghah, F.[Fatemeh],
Style-Pro: Style-Guided Prompt Learning for Generalizable
Vision-Language Models,
WACV25(6207-6216)
IEEE DOI
2505
Adaptation models, Image recognition, Computational modeling,
Benchmark testing, Data models, Robustness, Overfitting,
style shift learning
BibRef
Chang, H.S.[Hung-Shuo],
Wang, C.Y.[Chien-Yao],
Wang, R.R.[Richard Robert],
Chou, G.[Gene],
Liao, H.Y.M.[Hong-Yuan Mark],
Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual
Language Models,
WACV25(6217-6227)
IEEE DOI Code:
WWW Link.
2505
YOLO, Training, Visualization, Accuracy, Source coding, Semantics,
Predictive models, Real-time systems, Decoding, multi-task
BibRef
Westfechtel, T.[Thomas],
Zhang, D.[Dexuan],
Harada, T.[Tatsuya],
Combining Inherent Knowledge of Vision-Language Models with
Unsupervised Domain Adaptation Through Strong-Weak Guidance,
WACV25(6528-6537)
IEEE DOI
2505
Adaptation models, Accuracy, Predictive models, Benchmark testing,
Prediction algorithms, Labeling
BibRef
Chen, H.N.[Han-Ning],
Ni, Y.[Yang],
Huang, W.J.[Wen-Jun],
Liu, Y.[Yezi],
Jeong, S.[Sung-Heon],
Wen, F.[Fei],
Bastian, N.D.[Nathaniel D.],
Latapie, H.[Hugo],
Imani, M.[Mohsen],
VLTP: Vision-Language Guided Token Pruning for Task-Oriented
Segmentation,
WACV25(9353-9363)
IEEE DOI
2505
Uniform resource locators, Image segmentation, Image recognition,
Computational modeling, Large language models, Transformers, Load modeling
BibRef
Ali, E.[Eman],
Silva, S.[Sathira],
Khan, M.H.[Muhammad Haris],
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of
Vision-Language Models,
WACV25(6083-6093)
IEEE DOI
2505
Training, Adaptation models, Visualization, Accuracy, Prototypes,
Data models, Noise measurement, Image classification
BibRef
Zhang, C.[Ce],
Stepputtis, S.[Simon],
Sycara, K.[Katia],
Xie, Y.Q.[Ya-Qi],
Enhancing Vision-Language Few-Shot Adaptation with Negative Learning,
WACV25(5905-5915)
IEEE DOI Code:
WWW Link.
2505
Adaptation models, Codes, Accuracy, Computational modeling, Noise,
Transforms, Computational efficiency, Noise measurement, Few shot learning
BibRef
Yamada, M.[Moyuru],
Dharamshi, N.[Nimish],
Kohli, A.[Ayushi],
Kasu, P.[Prasad],
Khan, A.[Ainulla],
Ghulyani, M.[Manu],
Unleashing Potentials of Vision-Language Models for Zero-Shot HOI
Detection,
WACV25(5751-5760)
IEEE DOI
2505
Head, Computational modeling, Redundancy, Object detection,
Network architecture, Predictive models, Decoding,
vision-and-language
BibRef
Imam, R.[Raza],
Gani, H.[Hanan],
Huzaifa, M.[Muhammad],
Nandakumar, K.[Karthik],
Test-Time Low Rank Adaptation via Confidence Maximization for
Zero-Shot Generalization of Vision-Language Models,
WACV25(5449-5459)
IEEE DOI Code:
WWW Link.
2505
Adaptation models, Visualization, Codes, Large language models,
Transformers, Entropy, Tuning, Optimization
BibRef
Ghoddoosian, R.[Reza],
Agarwal, N.[Nakul],
Dwivedi, I.[Isht],
Dariush, B.[Behzad],
ACE: Action Concept Enhancement of Video-Language Models in
Procedural Videos,
WACV25(9521-9531)
IEEE DOI
2505
Training, Visualization, Robustness, Assembly, Videos, Overfitting, zero-shot,
action recognition, vlm, vision language model, synonym, text augmentation
BibRef
Das, D.[Deepayan],
Talon, D.[Davide],
Mancini, M.[Massimiliano],
Wang, Y.M.[Yi-Ming],
Ricci, E.[Elisa],
One VLM to Keep it Learning: Generation and Balancing for Data-free
Continual Visual Question Answering,
WACV25(5635-5645)
IEEE DOI Code:
WWW Link.
2505
Visualization, Adaptation models, Prevention and mitigation,
Training data, Quality control, Benchmark testing, Data models,
catastrophic forgetting
BibRef
Ishmam, M.F.[Md Farhan],
Tashdeed, I.[Ishmam],
Saadat, T.A.[Talukder Asir],
Ashmafee, M.H.[Md Hamjajul],
Kamal, A.R.M.[Abu Raihan Mostofa],
Hossain, M.A.[Md. Azam],
Visual Robustness Benchmark for Visual Question Answering (VQA),
WACV25(6623-6633)
IEEE DOI
2505
Measurement, Visualization, Computational modeling,
Large language models, Benchmark testing, Linguistics, Robustness, multimodal
BibRef
Jiang, X.[Xin],
Zheng, J.W.[Jun-Wei],
Liu, R.P.[Rui-Ping],
Li, J.H.[Jia-Hang],
Zhang, J.[Jia-Ming],
Matthiesen, S.[Sven],
Stiefelhagen, R.[Rainer],
@BENCH: Benchmarking Vision-Language Models for Human-centered
Assistive Technology,
WACV25(3934-3943)
IEEE DOI
2505
Image segmentation, Visualization, Depth measurement,
Optical character recognition, Visual impairment, VQA
BibRef
Onoe, Y.[Yasumasa],
Rane, S.[Sunayana],
Berger, Z.[Zachary],
Bitton, Y.[Yonatan],
Cho, J.[Jaemin],
Garg, R.[Roopal],
Ku, A.[Alexander],
Parekh, Z.[Zarana],
Pont-Tuset, J.[Jordi],
Tanzer, G.[Garrett],
Wang, S.[Su],
Baldridge, J.[Jason],
DOCCI: Descriptions of Connected and Contrasting Images,
ECCV24(LX: 291-309).
Springer DOI
2412
BibRef
Li, T.[Tang],
Ma, M.M.[Meng-Meng],
Peng, X.[Xi],
DEAL: Disentangle and Localize Concept-level Explanations for VLMs,
ECCV24(XXXIX: 383-401).
Springer DOI
2412
BibRef
Park, K.Y.[Kwan-Yong],
Saito, K.[Kuniaki],
Kim, D.H.[Dong-Hyun],
Weak-to-strong Compositional Learning from Generative Models for
Language-based Object Detection,
ECCV24(XXIII: 1-19).
Springer DOI
2412
BibRef
Li, S.C.[Shi-Cheng],
Li, L.[Lei],
Liu, Y.[Yi],
Ren, S.[Shuhuai],
Liu, Y.X.[Yuan-Xin],
Gao, R.D.[Run-Dong],
Sun, X.[Xu],
Hou, L.[Lu],
Vitatecs: A Diagnostic Dataset for Temporal Concept Understanding of
Video-language Models,
ECCV24(LXX: 331-348).
Springer DOI
2412
BibRef
Yang, Y.T.[Yan-Ting],
Chen, M.H.[Ming-Hao],
Qiu, Q.[Qibo],
Wu, J.H.[Jia-Hao],
Wang, W.X.[Wen-Xiao],
Lin, B.B.[Bin-Bin],
Guan, Z.Y.[Zi-Yu],
He, X.F.[Xiao-Fei],
Adapt2reward: Adapting Video-language Models to Generalizable Robotic
Rewards via Failure Prompts,
ECCV24(LVII: 163-180).
Springer DOI
2412
BibRef
Rahmanzadehgervi, P.[Pooyan],
Bolton, L.[Logan],
Taesiri, M.R.[Mohammad Reza],
Nguyen, A.T.[Anh Totti],
Vision Language Models are blind,
ACCV24(V: 293-309).
Springer DOI
2412
BibRef
Lai, C.G.[Chen-Gen],
Song, S.L.[Sheng-Li],
Yan, S.[Sitong],
Hu, G.[Guangneng],
Improving Vision and Language Concepts Understanding with Multimodal
Counterfactual Samples,
ECCV24(LXIX: 174-191).
Springer DOI
2412
BibRef
Chytas, S.P.[Sotirios Panagiotis],
Kim, H.W.J.[Hyun-Woo J.],
Singh, V.[Vikas],
Understanding Multi-compositional Learning in Vision and Language
Models via Category Theory,
ECCV24(XLVIII: 324-341).
Springer DOI
2412
BibRef
Song, Y.Z.[Yun-Zhu],
Chen, Y.S.[Yi-Syuan],
Lin, T.L.[Tzu-Ling],
Liu, B.[Bei],
Fu, J.L.[Jian-Long],
Shuai, H.H.[Hong-Han],
Capture Concept Through Comparison: Vision-and-language Representation
Learning with Intrinsic Information Mining,
ACCV24(III: 220-238).
Springer DOI
2412
BibRef
Adhikari, R.[Rabin],
Thapaliya, S.[Safal],
Dhakal, M.[Manish],
Khanal, B.[Bishesh],
Tunevlseg: Prompt Tuning Benchmark for Vision-language Segmentation
Models,
ACCV24(III: 44-62).
Springer DOI
2412
BibRef
He, H.C.[Hai-Chen],
Liu, W.B.[Wei-Bin],
Xing, W.W.[Wei-Wei],
Biefficient: Bidirectionally Prompting Vision-language Models for
Parameter-efficient Video Recognition,
ACCV24(III: 257-274).
Springer DOI
2412
BibRef
Yang, J.K.[Jing-Kang],
Dong, Y.H.[Yu-Hao],
Liu, S.[Shuai],
Li, B.[Bo],
Wang, Z.Y.[Zi-Yue],
Tan, H.R.[Hao-Ran],
Jiang, C.C.[Chen-Cheng],
Kang, J.[Jiamu],
Zhang, Y.[Yuanhan],
Zhou, K.Y.[Kai-Yang],
Liu, Z.W.[Zi-Wei],
Octopus: Embodied Vision-language Programmer from Environmental
Feedback,
ECCV24(I: 20-38).
Springer DOI
2412
BibRef
Kar, O.F.[Oguzhan Fatih],
Tonioni, A.[Alessio],
Poklukar, P.[Petra],
Kulshrestha, A.[Achin],
Zamir, A.[Amir],
Tombari, F.[Federico],
Brave: Broadening the Visual Encoding of Vision-language Models,
ECCV24(XVI: 113-132).
Springer DOI
2412
BibRef
Kamath, A.[Amita],
Hsieh, C.Y.[Cheng-Yu],
Chang, K.W.[Kai-Wei],
Krishna, R.[Ranjay],
The Hard Positive Truth About Vision-language Compositionality,
ECCV24(XIV: 37-54).
Springer DOI
2412
BibRef
Ye-Bin, M.[Moon],
Hyeon-Woo, N.[Nam],
Choi, W.[Wonseok],
Oh, T.H.[Tae-Hyun],
Beaf: Observing Before-after Changes to Evaluate Hallucination in
Vision-language Models,
ECCV24(XI: 232-248).
Springer DOI
2412
BibRef
Jia, B.X.[Bao-Xiong],
Chen, Y.X.[Yi-Xin],
Yu, H.[Huangyue],
Wang, Y.[Yan],
Niu, X.S.[Xue-Song],
Liu, T.[Tengyu],
Li, Q.[Qing],
Huang, S.Y.[Si-Yuan],
Sceneverse: Scaling 3d Vision-language Learning for Grounded Scene
Understanding,
ECCV24(IX: 289-310).
Springer DOI
2412
BibRef
Zhang, Y.F.[Yi-Feng],
Jiang, M.[Ming],
Zhao, Q.[Qi],
Learning Chain of Counterfactual Thought for Bias-robust
Vision-language Reasoning,
ECCV24(VIII: 334-351).
Springer DOI
2412
BibRef
Li, J.[Junyan],
Chen, D.[Delin],
Cai, T.[Tianle],
Chen, P.H.[Pei-Hao],
Hong, Y.[Yining],
Chen, Z.F.[Zhen-Fang],
Shen, Y.[Yikang],
Gan, C.[Chuang],
Flexattention for Efficient High-resolution Vision-language Models,
ECCV24(XXV: 286-302).
Springer DOI
2412
BibRef
Li, X.[Xiang],
Ding, J.[Jian],
Chen, Z.Y.[Zhao-Yang],
Elhoseiny, M.[Mohamed],
UNI3DL: A Unified Model for 3d Vision-language Understanding,
ECCV24(XXIII: 74-92).
Springer DOI
2412
BibRef
Hao, T.X.[Tian-Xiang],
Ding, X.H.[Xiao-Han],
Feng, J.X.[Jue-Xiao],
Yang, Y.H.[Yu-Hong],
Chen, H.[Hui],
Ding, G.[Guiguang],
Quantized Prompt for Efficient Generalization of Vision-language Models,
ECCV24(XIX: 54-73).
Springer DOI
2412
BibRef
Xu, H.B.[Huang-Biao],
Ke, X.[Xiao],
Li, Y.Z.[Yue-Zhou],
Xu, R.[Rui],
Wu, H.Q.[Huan-Qi],
Lin, X.F.[Xiao-Feng],
Guo, W.Z.[Wen-Zhong],
Vision-language Action Knowledge Learning for Semantic-aware Action
Quality Assessment,
ECCV24(XLII: 423-440).
Springer DOI
2412
BibRef
Zhu, Z.Y.[Zi-Yu],
Zhang, Z.[Zhuofan],
Ma, X.J.[Xiao-Jian],
Niu, X.S.[Xue-Song],
Chen, Y.X.[Yi-Xin],
Jia, B.X.[Bao-Xiong],
Deng, Z.D.[Zhi-Dong],
Huang, S.Y.[Si-Yuan],
Li, Q.[Qing],
Unifying 3d Vision-language Understanding via Promptable Queries,
ECCV24(XLIV: 188-206).
Springer DOI
2412
BibRef
Zhang, J.M.[Jia-Ming],
Ma, X.J.[Xing-Jun],
Wang, X.[Xin],
Qiu, L.Y.[Ling-Yu],
Wang, J.Q.[Jia-Qi],
Jiang, Y.G.[Yu-Gang],
Sang, J.[Jitao],
Adversarial Prompt Tuning for Vision-language Models,
ECCV24(XLV: 56-72).
Springer DOI
2412
BibRef
Wu, G.[Ge],
Zhang, X.[Xin],
Li, Z.[Zheng],
Chen, Z.W.[Zhao-Wei],
Liang, J.J.[Jia-Jun],
Yang, J.[Jian],
Li, X.[Xiang],
Cascade Prompt Learning for Vision-language Model Adaptation,
ECCV24(L: 304-321).
Springer DOI
2412
BibRef
Gao, S.[Sensen],
Jia, X.J.[Xiao-Jun],
Ren, X.H.[Xu-Hong],
Tsang, I.[Ivor],
Guo, Q.[Qing],
Boosting Transferability in Vision-language Attacks via Diversification
Along the Intersection Region of Adversarial Trajectory,
ECCV24(LVII: 442-460).
Springer DOI
2412
BibRef
Jiang, H.B.[Hao-Bin],
Yue, J.P.[Jun-Peng],
Luo, H.[Hao],
Ding, Z.[Ziluo],
Lu, Z.Q.[Zong-Qing],
Reinforcement Learning Friendly Vision-language Model for Minecraft,
ECCV24(LXVIII: 1-17).
Springer DOI
2412
BibRef
Nguyen, A.T.[A. Tuan],
Tai, K.S.[Kai Sheng],
Chen, B.C.[Bor-Chun],
Shukla, S.N.[Satya Narayan],
Yu, H.C.[Han-Chao],
Torr, P.H.S.[Philip H.S.],
Tian, T.P.[Tai-Peng],
Lim, S.N.[Ser-Nam],
ucap: An Unsupervised Prompting Method for Vision-language Models,
ECCV24(LXXIV: 425-439).
Springer DOI
2412
BibRef
Zhang, Y.[Yi],
Yu, K.[Ke],
Wu, S.Q.[Si-Qi],
He, Z.H.[Zhi-Hai],
Conceptual Codebook Learning for Vision-language Models,
ECCV24(LXXVII: 235-251).
Springer DOI
2412
BibRef
Kim, M.[Minchan],
Kim, M.[Minyeong],
Bae, J.[Junik],
Choi, S.[Suhwan],
Kim, S.[Sungkyung],
Chang, B.[Buru],
Exploiting Semantic Reconstruction to Mitigate Hallucinations in
Vision-language Models,
ECCV24(LXXXVI: 236-252).
Springer DOI
2412
BibRef
Chatterjee, A.[Agneet],
Luo, Y.[Yiran],
Gokhale, T.[Tejas],
Yang, Y.Z.[Ye-Zhou],
Baral, C.[Chitta],
Revision: Rendering Tools Enable Spatial Fidelity in Vision-language
Models,
ECCV24(XXX: 339-357).
Springer DOI
2412
BibRef
Ataallah, K.[Kirolos],
Shen, X.Q.[Xiao-Qian],
Abdelrahman, E.[Eslam],
Sleiman, E.[Essam],
Zhuge, M.C.[Ming-Chen],
Ding, J.[Jian],
Zhu, D.[Deyao],
Schmidhuber, J.[Jürgen],
Elhoseiny, M.[Mohamed],
Goldfish: Vision-language Understanding of Arbitrarily Long Videos,
ECCV24(XXIX: 251-267).
Springer DOI
2412
BibRef
Shen, R.[Ruoyue],
Inoue, N.[Nakamasa],
Shinoda, K.[Koichi],
Pyramid Coder: Hierarchical Code Generator for Compositional Visual
Question Answering,
ICIP24(430-436)
IEEE DOI
2411
Training, Visualization, Codes, Accuracy, Large language models,
Natural languages, Visual question answering, Prompting methods
BibRef
Sharma, P.[Pratyusha],
Shaham, T.R.[Tamar Rott],
Baradad, M.[Manel],
Rodríguez-Muñoz, A.[Adrián],
Duggal, S.[Shivam],
Isola, P.[Phillip],
Torralba, A.[Antonio],
Fu, S.[Stephanie],
A Vision Check-up for Language Models,
CVPR24(14410-14419)
IEEE DOI
2410
Representation learning, Visualization, Analytical models, Codes,
Image synthesis, Computational modeling
BibRef
Chen, X.[Xi],
Djolonga, J.[Josip],
Padlewski, P.[Piotr],
Mustafa, B.[Basil],
Changpinyo, S.[Soravit],
Wu, J.L.[Jia-Lin],
Ruiz, C.R.[Carlos Riquelme],
Goodman, S.[Sebastian],
Wang, X.[Xiao],
Tay, Y.[Yi],
Shakeri, S.[Siamak],
Dehghani, M.[Mostafa],
Salz, D.[Daniel],
Lucic, M.[Mario],
Tschannen, M.[Michael],
Nagrani, A.[Arsha],
Hu, H.[Hexiang],
Joshi, M.[Mandar],
Pang, B.[Bo],
Montgomery, C.[Ceslee],
Pietrzyk, P.[Paulina],
Ritter, M.[Marvin],
Piergiovanni, A.[AJ],
Minderer, M.[Matthias],
Pavetic, F.[Filip],
Waters, A.[Austin],
Li, G.[Gang],
Alabdulmohsin, I.[Ibrahim],
Beyer, L.[Lucas],
Amelot, J.[Julien],
Lee, K.[Kenton],
Steiner, A.P.[Andreas Peter],
Li, Y.[Yang],
Keysers, D.[Daniel],
Arnab, A.[Anurag],
Xu, Y.Z.[Yuan-Zhong],
Rong, K.[Keran],
Kolesnikov, A.[Alexander],
Seyedhosseini, M.[Mojtaba],
Angelova, A.[Anelia],
Zhai, X.H.[Xiao-Hua],
Houlsby, N.[Neil],
Soricut, R.[Radu],
On Scaling Up a Multilingual Vision and Language Model,
CVPR24(14432-14444)
IEEE DOI
2410
Training, Visualization, Computational modeling, Object detection,
Benchmark testing, Question answering (information retrieval),
pretraining
BibRef
Parodi, F.[Felipe],
Matelsky, J.K.[Jordan K.],
Regla-Vargas, A.[Alejandra],
Foglia, E.E.[Elizabeth E.],
Lim, C.[Charis],
Weinberg, D.[Danielle],
Kording, K.P.[Konrad P.],
Herrick, H.M.[Heidi M.],
Platt, M.L.[Michael L.],
Vision-language models for decoding provider attention during
neonatal resuscitation,
CVPM24(343-353)
IEEE DOI
2410
Training, Pediatrics, Accuracy, Semantics, Decision making, Transformers
BibRef
Zhang, Y.[Yabin],
Zhu, W.J.[Wen-Jie],
Tang, H.[Hui],
Ma, Z.Y.[Zhi-Yuan],
Zhou, K.Y.[Kai-Yang],
Zhang, L.[Lei],
Dual Memory Networks: A Versatile Adaptation Approach for
Vision-Language Models,
CVPR24(28718-28728)
IEEE DOI Code:
WWW Link.
2410
Training, Knowledge engineering, Adaptation models, Codes,
Training data, Data models, Vision-language models,
versatile adaptation
BibRef
Guo, Y.[Yuncheng],
Gu, X.D.[Xiao-Dong],
JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language
Models,
CVPR24(28695-28705)
IEEE DOI
2410
Adaptation models, Adaptive systems, Noise, Manuals, Robustness,
Noise measurement,
prompt learning
BibRef
Han, J.[Jinwei],
Lin, Z.W.[Zhi-Wen],
Sun, Z.Y.[Zhong-Yisun],
Gao, Y.G.[Ying-Guo],
Yan, K.[Ke],
Ding, S.H.[Shou-Hong],
Gao, Y.[Yuan],
Xia, G.S.[Gui-Song],
Anchor-based Robust Finetuning of Vision-Language Models,
CVPR24(26909-26918)
IEEE DOI
2410
Image recognition, Zero-shot learning, Semantics,
Benchmark testing, Anchor, Robust Finetuning
BibRef
Cao, Q.L.[Qing-Long],
Xu, Z.Q.[Zheng-Qin],
Chen, Y.[Yuntian],
Ma, C.[Chao],
Yang, X.K.[Xiao-Kang],
Domain Prompt Learning with Quaternion Networks,
CVPR24(26627-26636)
IEEE DOI Code:
WWW Link.
2410
Knowledge engineering, Adaptation models, Codes, Quaternions,
Face recognition, Contrastive learning, vision-language models,
quaternion networks
BibRef
Li, L.[Lin],
Guan, H.Y.[Hao-Yan],
Qiu, J.N.[Jia-Ning],
Spratling, M.[Michael],
One Prompt Word is Enough to Boost Adversarial Robustness for
Pre-Trained Vision-Language Models,
CVPR24(24408-24419)
IEEE DOI Code:
WWW Link.
2410
Accuracy, Codes, Training data, Robustness,
Computational efficiency, vision-language models, VLMs
BibRef
Zanella, M.[Maxime],
Ayed, I.B.[Ismail Ben],
On the Test-Time Zero-Shot Generalization of Vision-Language Models:
Do we Really need Prompt Learning?,
CVPR24(23783-23793)
IEEE DOI
2410
Training, Systematics, Computational modeling, Quality assessment,
Computational efficiency, vision-language,
training-free
BibRef
Yao, H.T.[Han-Tao],
Zhang, R.[Rui],
Xu, C.S.[Chang-Sheng],
TCP: Textual-Based Class-Aware Prompt Tuning for Visual-Language
Model,
CVPR24(23438-23448)
IEEE DOI Code:
WWW Link.
2410
Training, Adaptation models, Benchmark testing,
Tuning
BibRef
Yang, S.[Senqiao],
Tian, Z.[Zhuotao],
Jiang, L.[Li],
Jia, J.Y.[Jia-Ya],
Unified Language-Driven Zero-Shot Domain Adaptation,
CVPR24(23407-23415)
IEEE DOI
2410
Representation learning, Adaptation models, Visualization,
Correlation, Scalability, Computational modeling,
Vision-Language Model
BibRef
Cui, J.Q.[Jie-Quan],
Zhu, B.[Beier],
Wen, X.[Xin],
Qi, X.J.[Xiao-Juan],
Yu, B.[Bei],
Zhang, H.W.[Han-Wang],
Classes Are Not Equal: An Empirical Study on Image Recognition
Fairness,
CVPR24(23283-23292)
IEEE DOI
2410
Training, Representation learning, Image recognition, Accuracy,
Predictive models, Network architecture, Prediction algorithms,
Vision-Language Models
BibRef
Stojnic, V.[Vladan],
Kalantidis, Y.[Yannis],
Tolias, G.[Giorgos],
Label Propagation for Zero-shot Classification with Vision-Language
Models,
CVPR24(23209-23218)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Closed box, Encoding, Data models,
vision-language models, label propagation, zero-shot classification
BibRef
Yuan, T.[Tongtong],
Zhang, X.[Xuange],
Liu, K.[Kun],
Liu, B.[Bo],
Chen, C.[Chen],
Jin, J.[Jian],
Jiao, Z.Z.[Zhen-Zhen],
Towards Surveillance Video-and-Language Understanding: New Dataset,
Baselines, and Challenges,
CVPR24(22052-22061)
IEEE DOI Code:
WWW Link.
2410
Annotations, Surveillance, Semantics, Benchmark testing,
Public security, Timing, Security, Dataset Annotation
BibRef
Chen, Y.F.[Yi-Fei],
Chen, D.P.[Da-Peng],
Liu, R.J.[Rui-Jin],
Zhou, S.[Sai],
Xue, W.Y.[Wen-Yuan],
Peng, W.[Wei],
Align Before Adapt: Leveraging Entity-to-Region Alignments for
Generalizable Video Action Recognition,
CVPR24(18688-18698)
IEEE DOI
2410
Representation learning, Adaptation models, Visualization, Semantics,
Transformers, Vectors, Video action recognition, visual-language model
BibRef
Mittal, H.[Himangi],
Agarwal, N.[Nakul],
Lo, S.Y.[Shao-Yuan],
Lee, K.[Kwonjoon],
Can't make an Omelette without Breaking some Eggs: Plausible Action
Anticipation using Large Video-Language Models,
CVPR24(18580-18590)
IEEE DOI
2410
Accuracy, Computational modeling, Linear programming,
Action Anticipation, Video, Large Multimodal Models
BibRef
Kahatapitiya, K.[Kumara],
Arnab, A.[Anurag],
Nagrani, A.[Arsha],
Ryoo, M.S.[Michael S.],
VicTR: Video-conditioned Text Representations for Activity
Recognition,
CVPR24(18547-18558)
IEEE DOI
2410
Training, Visualization, Adaptation models, Semantics, Focusing,
Benchmark testing, Vision-language models, Activity Recognition,
Video-conditioned Text
BibRef
Wu, T.Y.[Tz-Ying],
Ho, C.H.[Chih-Hui],
Vasconcelos, N.M.[Nuno M.],
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification,
CVPR24(16531-16540)
IEEE DOI Code:
WWW Link.
2410
Measurement, Training, Frequency modulation, Accuracy, Taxonomy,
Semantics, Hierarchical Classification, Visual-language foundation model
BibRef
Zhao, G.[Ganlong],
Li, G.B.[Guan-Bin],
Chen, W.[Weikai],
Yu, Y.Z.[Yi-Zhou],
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with
Open-Vocabulary Detection and StructurEd Representation,
CVPR24(16296-16306)
IEEE DOI
2410
Art, Accuracy, Navigation, Annotations, Detectors,
Vision-and-Language Navigation, Open-vocabulary, Multi-Modal Learning
BibRef
Li, X.[Xin],
Wu, Y.F.[Yun-Fei],
Jiang, X.H.[Xing-Hua],
Guo, Z.H.[Zhi-Hao],
Gong, M.M.[Ming-Ming],
Cao, H.Y.[Hao-Yu],
Liu, Y.S.[Yin-Song],
Jiang, D.Q.[De-Qiang],
Sun, X.[Xing],
Enhancing Visual Document Understanding with Contrastive Learning in
Large Visual-Language Models,
CVPR24(15546-15555)
IEEE DOI
2410
Visualization, Computational modeling, Contrastive learning,
Benchmark testing, Feature extraction, Filling, Contrastive Learning
BibRef
Pham, K.[Khoi],
Huynh, C.[Chuong],
Lim, S.N.[Ser-Nam],
Shrivastava, A.[Abhinav],
Composing Object Relations and Attributes for Image-Text Matching,
CVPR24(14354-14363)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Computational modeling, Image edge detection,
Semantics, Benchmark testing, vision-language, image retrieval,
image-text matching
BibRef
Kim, G.[Gahyeon],
Kim, S.[Sohee],
Lee, S.[Seokju],
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models,
Prompting24(1572-1582)
IEEE DOI
2410
Visualization, Zero-shot learning, Semantics, Focusing,
Feature extraction, Data augmentation, Vectors, prompt learning, VLMs
BibRef
Xu, Z.[Zhenlin],
Zhu, Y.[Yi],
Deng, S.Q.[Si-Qi],
Mittal, A.[Abhay],
Chen, Y.B.[Yan-Bei],
Wang, M.[Manchen],
Favaro, P.[Paolo],
Tighe, J.[Joseph],
Modolo, D.[Davide],
Benchmarking Zero-Shot Recognition with Vision-Language Models:
Challenges on Granularity and Specificity,
WhatNext24(1827-1836)
IEEE DOI
2410
Computational modeling, Face recognition, Semantics, Training data,
Focusing, Vision and language models, Zero-shot recognition,
Benchmarking
BibRef
Luo, Z.W.[Zi-Wei],
Gustafsson, F.K.[Fredrik K.],
Zhao, Z.[Zheng],
Sjölund, J.[Jens],
Schön, T.B.[Thomas B.],
Photo-Realistic Image Restoration in the Wild with Controlled
Vision-Language Models,
NTIRE24(6641-6651)
IEEE DOI
2410
Degradation, Training, Image synthesis, Pipelines, Transform coding,
Diffusion models, Feature extraction, Image restoration, real-world
BibRef
Huang, C.Q.[Chao-Qin],
Jiang, A.[Aofan],
Feng, J.H.[Jing-Hao],
Zhang, Y.[Ya],
Wang, X.C.[Xin-Chao],
Wang, Y.F.[Yan-Feng],
Adapting Visual-Language Models for Generalizable Anomaly Detection
in Medical Images,
CVPR24(11375-11385)
IEEE DOI Code:
WWW Link.
2410
Training, Adaptation models, Image segmentation, Visualization,
Source coding, Semantics, Anomaly Detection, Medical Images
BibRef
Bang, J.[Jihwan],
Ahn, S.[Sumyeong],
Lee, J.G.[Jae-Gil],
Active Prompt Learning in Vision Language Models,
CVPR24(26994-27004)
IEEE DOI Code:
WWW Link.
2410
Learning systems, Adaptation models, Codes, Sampling methods, Labeling
BibRef
Pan, C.[Chenbin],
Yaman, B.[Burhaneddin],
Nesti, T.[Tommaso],
Mallik, A.[Abhirup],
Allievi, A.G.[Alessandro G.],
Velipasalar, S.[Senem],
Ren, L.[Liu],
VLP: Vision Language Planning for Autonomous Driving,
CVPR24(14760-14769)
IEEE DOI
2410
Training, Urban areas, Linguistics, Cognition, Robustness, Planning
BibRef
Liang, M.[Mingfu],
Su, J.C.[Jong-Chyi],
Schulter, S.[Samuel],
Garg, S.[Sparsh],
Zhao, S.Y.[Shi-Yu],
Wu, Y.[Ying],
Chandraker, M.[Manmohan],
AIDE: An Automatic Data Engine for Object Detection in Autonomous
Driving,
CVPR24(14695-14706)
IEEE DOI
2410
Training, Costs, Roads, Pipelines, Object detection, Benchmark testing,
Data models, Autonomous Driving, Vision Language Model,
Automatic Data Engine
BibRef
Li, Z.[Zheng],
Li, X.[Xiang],
Fu, X.[Xinyi],
Zhang, X.[Xin],
Wang, W.Q.[Wei-Qiang],
Chen, S.[Shuo],
Yang, J.[Jian],
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models,
CVPR24(26607-26616)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Prediction algorithms, Data models,
Vectors, Probability distribution, knowledge distillation,
zero-shot learning
BibRef
Khandelwal, A.[Anant],
PromptSync: Bridging Domain Gaps in Vision-Language Models through
Class-Aware Prototype Alignment and Discrimination,
ZeroShot24(7819-7828)
IEEE DOI
2410
Adaptation models, Computational modeling, Prototypes,
Contrastive learning, Benchmark testing, Robustness
BibRef
Hirohashi, Y.[Yuki],
Hirakawa, T.[Tsubasa],
Yamashita, T.[Takayoshi],
Fujiyoshi, H.[Hironobu],
Prompt Learning with One-Shot Setting based Feature Space Analysis in
Vision-and-Language Models,
ZeroShot24(7761-7770)
IEEE DOI
2410
Learning systems, Analytical models, Adaptation models,
Image resolution, Accuracy, Vision-and-Language Model, Prompt Learning
BibRef
Zhang, L.[Le],
Awal, R.[Rabiul],
Agrawal, A.[Aishwarya],
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to
Enhance Visio-Linguistic Compositional Understanding,
CVPR24(13774-13784)
IEEE DOI Code:
WWW Link.
2410
Annotations, Semantics, Refining, Text to image,
Contrastive learning, Benchmark testing, Cognition,
contrastive learning
BibRef
Rosasco, A.[Andrea],
Berti, S.[Stefano],
Pasquale, G.[Giulia],
Malafronte, D.[Damiano],
Sato, S.[Shogo],
Segawa, H.[Hiroyuki],
Inada, T.[Tetsugo],
Natale, L.[Lorenzo],
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized
Vision-Language Tasks,
CVPR24(22239-22248)
IEEE DOI Code:
WWW Link.
2410
Measurement, Codes, Image synthesis, Text to image,
Benchmark testing, benchmark, dataset,
compositionality
BibRef
Cheng, S.[Sijie],
Guo, Z.C.[Zhi-Cheng],
Wu, J.W.[Jing-Wen],
Fang, K.[Kechen],
Li, P.[Peng],
Liu, H.P.[Hua-Ping],
Liu, Y.[Yang],
EgoThink: Evaluating First-Person Perspective Thinking Capability of
Vision-Language Models,
CVPR24(14291-14302)
IEEE DOI
2410
Bridges, Visualization, Computational modeling, Focusing,
Benchmark testing, Planning, Egocentric, Vision-Language Models, Benchmark
BibRef
Guan, T.R.[Tian-Rui],
Liu, F.[Fuxiao],
Wu, X.[Xiyang],
Xian, R.Q.[Rui-Qi],
Li, Z.X.[Zong-Xia],
Liu, X.Y.[Xiao-Yu],
Wang, X.[Xijun],
Chen, L.[Lichang],
Huang, F.[Furong],
Yacoob, Y.[Yaser],
Manocha, D.[Dinesh],
Zhou, T.Y.[Tian-Yi],
Hallusionbench: An Advanced Diagnostic Suite for Entangled Language
Hallucination and Visual Illusion in Large Vision-Language Models,
CVPR24(14375-14385)
IEEE DOI Code:
WWW Link.
2410
Visualization, Analytical models, Accuracy, Statistical analysis,
Computational modeling, Benchmark testing, Vision language model,
VLM Evaluation
BibRef
Kil, J.[Jihyung],
Song, C.H.[Chan Hee],
Zheng, B.[Boyuan],
Deng, X.[Xiang],
Su, Y.[Yu],
Chao, W.L.[Wei-Lun],
Dual-View Visual Contextualization for Web Navigation,
CVPR24(14445-14454)
IEEE DOI
2410
Visualization, Navigation, Benchmark testing,
AI Agents, Web Agents, Web Navigation, Vision-Language,
Multimodal Agents
BibRef
Guo, Y.Y.[Yang-Yang],
Wang, G.Z.[Guang-Zhi],
Kankanhalli, M.[Mohan],
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation,
CVPR24(15699-15709)
IEEE DOI
2410
Codes, Computational modeling, Perturbation methods, Loading,
Transformers, Vision-Language,
Low-rank Approximation
BibRef
Cao, J.J.[Jian-Jian],
Ye, P.[Peng],
Li, S.Z.[Sheng-Ze],
Yu, C.[Chong],
Tang, Y.S.[Yan-Song],
Lu, J.W.[Ji-Wen],
Chen, T.[Tao],
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for
Accelerating Vision-Language Transformer,
CVPR24(15710-15719)
IEEE DOI Code:
WWW Link.
2410
Degradation, Adaptation models, Visualization, Costs,
Computational modeling, Semantics, Token Pruning, Model Compress
BibRef
Farina, M.[Matteo],
Mancini, M.[Massimiliano],
Cunegatti, E.[Elia],
Iacca, G.[Giovanni],
Ricci, E.[Elisa],
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning,
CVPR24(16185-16195)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Transfer learning, Neurons,
Benchmark testing, multimodal learning,
sparse neural networks
BibRef
Mu, F.Z.[Fang-Zhou],
Mo, S.C.[Si-Cheng],
Li, Y.[Yin],
SnAG: Scalable and Accurate Video Grounding,
CVPR24(18930-18940)
IEEE DOI Code:
WWW Link.
2410
Training, Analytical models, Accuracy, Grounding, Scalability,
Computational modeling, Video understanding,
Vision-Language Learning
BibRef
Cao, Y.H.[Yun-Hao],
Ji, K.X.[Kai-Xiang],
Huang, Z.Y.[Zi-Yuan],
Zheng, C.Y.[Chuan-Yang],
Liu, J.J.[Jia-Jia],
Wang, J.[Jian],
Chen, J.D.[Jing-Dong],
Yang, M.[Ming],
Towards Better Vision-Inspired Vision-Language Models,
CVPR24(13537-13547)
IEEE DOI
2410
Training, Bridges, Visualization, Computational modeling,
Poles and towers, Benchmark testing, deep learning, deep prompt
BibRef
Shi, K.Y.[Kun-Yu],
Dong, Q.[Qi],
Goncalves, L.[Luis],
Tu, Z.W.[Zhuo-Wen],
Soatto, S.[Stefano],
Non-autoregressive Sequence-to-Sequence Vision-Language Models,
CVPR24(13603-13612)
IEEE DOI
2410
Visualization, Technological innovation, Computational modeling,
Predictive models, Drives, Encoding, Non-autoregressive, CTC,
vision language models
BibRef
Man, Y.Z.[Yun-Ze],
Gui, L.Y.[Liang-Yan],
Wang, Y.X.[Yu-Xiong],
Situational Awareness Matters in 3D Vision Language Reasoning,
CVPR24(13678-13688)
IEEE DOI
2410
Visualization, Solid modeling, Estimation, Performance gain,
Cognition, Vision-Language, Multi-modal, 3D Reasoning
BibRef
Zheng, C.H.[Chen-Hao],
Zhang, J.[Jieyu],
Kembhavi, A.[Aniruddha],
Krishna, R.[Ranjay],
Iterated Learning Improves Compositionality in Large Vision-Language
Models,
CVPR24(13785-13795)
IEEE DOI
2410
Training, Training data, Games, Contrastive learning,
Benchmark testing, Performance gain, Cognitive science
BibRef
Leng, S.[Sicong],
Zhang, H.[Hang],
Chen, G.Z.[Guan-Zheng],
Li, X.[Xin],
Lu, S.J.[Shi-Jian],
Miao, C.Y.[Chun-Yan],
Bing, L.[Lidong],
Mitigating Object Hallucinations in Large Vision-Language Models
through Visual Contrastive Decoding,
CVPR24(13872-13882)
IEEE DOI
2410
Training, Visualization, Accuracy, Computational modeling,
Benchmark testing, Decoding, Multimodality,
Vision and Language
BibRef
Song, C.H.[Chull Hwan],
Hwang, T.[Taebaek],
Yoon, J.Y.[Joo-Young],
Choi, S.[Shunghyun],
Gu, Y.H.[Yeong Hyeon],
SyncMask: Synchronized Attentional Masking for Fashion-centric
Vision-Language Pretraining,
CVPR24(13948-13957)
IEEE DOI
2410
Training, Visualization, Image segmentation, Image resolution,
Refining, Contrastive learning
BibRef
Pramanick, S.[Shraman],
Han, G.X.[Guang-Xing],
Hou, R.[Rui],
Nag, S.[Sayan],
Lim, S.N.[Ser-Nam],
Ballas, N.[Nicolas],
Wang, Q.F.[Qi-Fan],
Chellappa, R.[Rama],
Almahairi, A.[Amjad],
Jack of All Tasks, Master of Many: Designing General-purpose
Coarse-to-Fine Vision-Language Model,
CVPR24(14076-14088)
IEEE DOI Code:
WWW Link.
2410
Image segmentation, Visualization, Image coding, Filters, Grounding,
Machine vision, Visual systems
BibRef
Zeng, Y.[Yunan],
Huang, Y.[Yan],
Zhang, J.J.[Jin-Jin],
Jie, Z.Q.[Ze-Qun],
Chai, Z.H.[Zhen-Hua],
Wang, L.[Liang],
Investigating Compositional Challenges in Vision-Language Models for
Visual Grounding,
CVPR24(14141-14151)
IEEE DOI
2410
Visualization, Codes, Grounding, Annotations, Pipelines, Benchmark testing
BibRef
Karmanov, A.[Adilbek],
Guan, D.[Dayan],
Lu, S.J.[Shi-Jian],
El Saddik, A.[Abdulmotaleb],
Xing, E.[Eric],
Efficient Test-Time Adaptation of Vision-Language Models,
CVPR24(14162-14171)
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Codes, Computational modeling, Noise,
Predictive models, Benchmark testing
BibRef
Sameni, S.[Sepehr],
Kafle, K.[Kushal],
Tan, H.[Hao],
Jenni, S.[Simon],
Building Vision-Language Models on Solid Foundations with Masked
Distillation,
CVPR24(14216-14226)
IEEE DOI
2410
Training, Solid modeling, Visualization, Computational modeling,
Semantic segmentation, Buildings, LLM
BibRef
Li, R.J.[Rong-Jie],
Wu, Y.[Yu],
He, X.M.[Xu-Ming],
Learning by Correction: Efficient Tuning Task for Zero-Shot
Generative Vision-Language Reasoning,
CVPR24(13428-13437)
IEEE DOI
2410
Training, Visualization, Costs, Computational modeling, Cognition,
Question answering (information retrieval),
Vision-Language
BibRef
Peng, W.[Wujian],
Xie, S.C.[Si-Cheng],
You, Z.[Zuyao],
Lan, S.Y.[Shi-Yi],
Wu, Z.X.[Zu-Xuan],
Synthesize, Diagnose, and Optimize: Towards Fine-Grained
Vision-Language Understanding,
CVPR24(13279-13288)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Computational modeling, Pipelines, Benchmark testing,
Linguistics, Vision language model, Fine-grained understanding
BibRef
Zhao, Y.[Yue],
Zhao, L.[Long],
Zhou, X.Y.[Xing-Yi],
Wu, J.L.[Jia-Lin],
Chu, C.T.[Chun-Te],
Miao, H.[Hui],
Schroff, F.[Florian],
Adam, H.[Hartwig],
Liu, T.[Ting],
Gong, B.Q.[Bo-Qing],
Krähenbühl, P.[Philipp],
Yuan, L.Z.[Liang-Zhe],
Distilling Vision-Language Models on Millions of Videos,
CVPR24(13106-13116)
IEEE DOI
2410
Adaptation models, Computational modeling, Benchmark testing,
Data models, Text to video
BibRef
Chen, J.[Jieneng],
Yu, Q.H.[Qi-Hang],
Shen, X.H.[Xiao-Hui],
Yuille, A.L.[Alan L.],
Chen, L.C.[Liang-Chieh],
ViTamin: Designing Scalable Vision Models in the Vision-Language Era,
CVPR24(12954-12966)
IEEE DOI
2410
Training, Image segmentation, Accuracy, Protocols, Image coding, Scalability,
Computational modeling, Vision-Language Models, Architectural Design
BibRef
Liu, S.H.[Shi-Hong],
Yu, S.[Samuel],
Lin, Z.Q.[Zhi-Qiu],
Pathak, D.[Deepak],
Ramanan, D.[Deva],
Language Models as Black-Box Optimizers for Vision-Language Models,
CVPR24(12687-12697)
IEEE DOI
2410
Computational modeling, Natural languages, Closed box,
Text to image, Human in the loop, Data models,
generative models
BibRef
Howard, P.[Phillip],
Madasu, A.[Avinash],
Le, T.[Tiep],
Moreno, G.L.[Gustavo Lujan],
Bhiwandiwalla, A.[Anahita],
Lal, V.[Vasudev],
SocialCounterfactuals: Probing and Mitigating Intersectional Social
Biases in Vision-Language Models with Counterfactual Examples,
CVPR24(11975-11985)
IEEE DOI
2410
Training, Prevention and mitigation, Text to image,
Diffusion models, Fairness, social bias,
counterfactuals
BibRef
Jiang, Y.[Yankai],
Huang, Z.Z.[Zhong-Zhen],
Zhang, R.Z.[Rong-Zhao],
Zhang, X.F.[Xiao-Fan],
Zhang, S.T.[Shao-Ting],
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and
Self-Prompting,
CVPR24(11386-11397)
IEEE DOI
2410
Training, Visualization, Pathology, Image segmentation,
Image analysis, Computational modeling, Vision-Language Model
BibRef
Kim, Y.[Younghyun],
Mo, S.[Sangwoo],
Kim, M.[Minkyu],
Lee, K.[Kyungmin],
Lee, J.[Jaeho],
Shin, J.[Jinwoo],
Discovering and Mitigating Visual Biases Through Keyword Explanation,
CVPR24(11082-11092)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Image recognition, Computational modeling,
Training data, Flowering plants, bias and fairness, explainable AI,
vision-language model
BibRef
Li, R.[Rui],
Fischer, T.[Tobias],
Segu, M.[Mattia],
Pollefeys, M.[Marc],
Van Gool, L.J.[Luc J.],
Tombari, F.[Federico],
Know Your Neighbors: Improving Single-View Reconstruction via Spatial
Vision-Language Reasoning,
CVPR24(9848-9858)
IEEE DOI Code:
WWW Link.
2410
Geometry, Visualization, Attention mechanisms, Shape, Semantics,
radiance field, vision-language model, spatial context, spatial attention
BibRef
Zeng, Z.[Ziyao],
Wang, D.[Daniel],
Yang, F.Y.[Feng-Yu],
Park, H.[Hyoungseob],
Soatto, S.[Stefano],
Lao, D.[Dong],
Wong, A.[Alex],
WorDepth: Variational Language Prior for Monocular Depth Estimation,
CVPR24(9708-9719)
IEEE DOI Code:
WWW Link.
2410
Measurement, Codes, Estimation, Encoding,
Monocular Depth Estimation, Vision-Language Model, Variational Model
BibRef
Hu, Y.S.[Yu-Shi],
Stretcu, O.[Otilia],
Lu, C.T.[Chun-Ta],
Viswanathan, K.[Krishnamurthy],
Hata, K.[Kenji],
Luo, E.[Enming],
Krishna, R.[Ranjay],
Fuxman, A.[Ariel],
Visual Program Distillation: Distilling Tools and Programmatic
Reasoning into Vision-Language Models,
CVPR24(9590-9601)
IEEE DOI
2410
Visualization, Adaptation models, Computational modeling,
Instruments, Loading, Music, Cognition, vision-language model,
tools
BibRef
Khan, Z.[Zaid],
Fu, Y.[Yun],
Consistency and Uncertainty: Identifying Unreliable Responses From
Black-Box Vision-Language Models for Selective Visual Question
Answering,
CVPR24(10854-10863)
IEEE DOI
2410
Visualization, Uncertainty, Computational modeling, Closed box,
Predictive models, Question answering (information retrieval),
trustworthy ml
BibRef
Gu, T.C.[Tian-Cheng],
Yang, K.C.[Kai-Cheng],
Liu, D.[Dongnan],
Cai, W.D.[Wei-Dong],
LaPA: Latent Prompt Assist Model for Medical Visual Question
Answering,
DEF-AI-MIA24(4971-4980)
IEEE DOI Code:
WWW Link.
2410
Visualization, Accuracy, Medical services, Predictive models,
Feature extraction, Question answering (information retrieval), Data mining
BibRef
Silva-Rodríguez, J.[Julio],
Hajimiri, S.[Sina],
Ben Ayed, I.[Ismail],
Dolz, J.[Jose],
A Closer Look at the Few-Shot Adaptation of Large Vision-Language
Models,
CVPR24(23681-23690)
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Codes, Computational modeling,
Transfer learning, Probes
BibRef
Zanella, M.[Maxime],
Ben Ayed, I.[Ismail],
Low-Rank Few-Shot Adaptation of Vision-Language Models,
Prompting24(1593-1603)
IEEE DOI
2410
Training, Adaptation models, Design methodology,
Few shot learning, Vision-Language, few-shot,
adapter
BibRef
Wang, W.X.[Wen-Xuan],
He, X.J.[Xing-Jian],
Zhang, Y.[Yisi],
Guo, L.T.[Long-Teng],
Shen, J.C.[Jia-Chen],
Li, J.Y.[Jiang-Yun],
Liu, J.[Jing],
CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring
Image Segmentation,
MultMed(26), 2024, pp. 6906-6916.
IEEE DOI
2405
Image segmentation, Visualization, Task analysis, Correlation,
Feature extraction, Transformers, Semantics, vision and language
BibRef
Sahin, U.[Ugur],
Li, H.[Hang],
Khan, Q.[Qadeer],
Cremers, D.[Daniel],
Tresp, V.[Volker],
Enhancing Multimodal Compositional Reasoning of Visual Language
Models with Generative Negative Mining,
WACV24(5551-5561)
IEEE DOI Code:
HTML Version.
2404
Training, Visualization, Codes, Pipelines, Self-supervised learning,
Cognition, Algorithms, Vision + language and/or other modalities
BibRef
Yang, C.[Cheng],
Xu, R.[Rui],
Guo, Y.[Ye],
Huang, P.X.[Pei-Xiang],
Chen, Y.[Yiru],
Ding, W.[Wenkui],
Wang, Z.Y.[Zhong-Yuan],
Zhou, H.[Hong],
Improving Vision-and-Language Reasoning via Spatial Relations
Modeling,
WACV24(758-767)
IEEE DOI
2404
Visualization, Analytical models, Graphical models,
Statistical analysis, Computational modeling, Excavation,
Vision + language and/or other modalities
BibRef
Shen, S.[Sheng],
Yang, S.[Shijia],
Zhang, T.J.[Tian-Jun],
Zhai, B.[Bohan],
Gonzalez, J.E.[Joseph E.],
Keutzer, K.[Kurt],
Darrell, T.J.[Trevor J.],
Multitask Vision-Language Prompt Tuning,
WACV24(5644-5655)
IEEE DOI
2404
Learning systems, Visualization, Adaptation models,
Benchmark testing, Vectors, Task analysis, Algorithms,
Vision + language and/or other modalities
BibRef
Zhang, G.[Gengyuan],
Zhang, Y.R.[Yu-Rui],
Zhang, K.[Kerui],
Tresp, V.[Volker],
Can Vision-Language Models be a Good Guesser? Exploring VLMs for
Times and Location Reasoning,
WACV24(625-634)
IEEE DOI Code:
WWW Link.
2404
Visualization, Computational modeling, Feature extraction,
Cognition, Task analysis, Commonsense reasoning, Algorithms,
Vision + language and/or other modalities
BibRef
Feinglass, J.[Joshua],
Yang, Y.Z.[Ye-Zhou],
Towards Addressing the Misalignment of Object Proposal Evaluation for
Vision-Language Tasks via Semantic Grounding,
WACV24(4385-4395)
IEEE DOI
2404
Measurement, Visualization, Protocols, Annotations, Grounding,
Semantics, Question answering (information retrieval),
Image recognition and understanding
BibRef
Nadeem, A.[Asmar],
Hilton, A.[Adrian],
Dawes, R.[Robert],
Thomas, G.[Graham],
Mustafa, A.[Armin],
CAD: Contextual Multi-modal Alignment for Dynamic AVQA,
WACV24(7236-7248)
IEEE DOI
2404
Visualization, Semantics, Decision making, Robustness,
Question answering (information retrieval), Complexity theory,
Smartphones / end user devices
BibRef
Wu, W.[Wenyi],
Li, Q.[Qi],
Zhong, W.L.[Wen-Liang],
Huang, J.Z.[Jun-Zhou],
MIVC: Multiple Instance Visual Component for Visual-Language Models,
WACV24(8102-8111)
IEEE DOI
2404
Visualization, Computational modeling, Neural networks,
Question answering (information retrieval),
Image recognition and understanding
BibRef
Ganz, R.[Roy],
Nuriel, O.[Oren],
Aberdam, A.[Aviad],
Kittenplon, Y.[Yair],
Mazor, S.[Shai],
Litman, R.[Ron],
Towards Models that Can See and Read,
ICCV23(21661-21671)
IEEE DOI
2401
BibRef
Zhang, H.[Heng],
Liu, D.[Daqing],
Lv, Z.[Zezhong],
Su, B.[Bing],
Tao, D.C.[Da-Cheng],
Exploring Temporal Concurrency for Video-Language Representation
Learning,
ICCV23(15522-15532)
IEEE DOI Code:
WWW Link.
2401
BibRef
Shukor, M.[Mustafa],
Dancette, C.[Corentin],
Cord, M.[Matthieu],
eP-ALM: Efficient Perceptual Augmentation of Language Models,
ICCV23(21999-22012)
IEEE DOI Code:
WWW Link.
2401
BibRef
Schulter, S.[Samuel],
Kumar, B.G.V.[B.G. Vijay],
Suh, Y.M.[Yu-Min],
Dafnis, K.M.[Konstantinos M.],
Zhang, Z.X.[Zhi-Xing],
Zhao, S.Y.[Shi-Yu],
Metaxas, D.N.[Dimitris N.],
OmniLabel: A Challenging Benchmark for Language-Based Object
Detection,
ICCV23(11919-11928)
IEEE DOI Code:
WWW Link.
2401
BibRef
Chen, Z.L.[Zi-Liang],
Huang, X.[Xin],
Guan, Q.L.[Quan-Long],
Lin, L.[Liang],
Luo, W.Q.[Wei-Qi],
A Retrospect to Multi-prompt Learning across Vision and Language,
ICCV23(22133-22144)
IEEE DOI
2401
BibRef
Derakhshani, M.M.[Mohammad Mahdi],
Sanchez, E.[Enrique],
Bulat, A.[Adrian],
da Costa, V.G.T.[Victor Guilherme Turrisi],
Snoek, C.G.M.[Cees G. M.],
Tzimiropoulos, G.[Georgios],
Martinez, B.[Brais],
Bayesian Prompt Learning for Image-Language Model Generalization,
ICCV23(15191-15200)
IEEE DOI Code:
WWW Link.
2401
BibRef
Cascante-Bonilla, P.[Paola],
Shehada, K.[Khaled],
Smith, J.S.[James Seale],
Doveh, S.[Sivan],
Kim, D.H.[Dong-Hyun],
Panda, R.[Rameswar],
Varol, G.[Gül],
Oliva, A.[Aude],
Ordonez, V.[Vicente],
Feris, R.S.[Rogerio S.],
Karlinsky, L.[Leonid],
Going Beyond Nouns With Vision & Language Models Using Synthetic
Data,
ICCV23(20098-20108)
IEEE DOI
2401
BibRef
Upadhyay, U.[Uddeshya],
Karthik, S.[Shyamgopal],
Mancini, M.[Massimiliano],
Akata, Z.[Zeynep],
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models,
ICCV23(1899-1910)
IEEE DOI Code:
WWW Link.
2401
BibRef
Bitton-Guetta, N.[Nitzan],
Bitton, Y.[Yonatan],
Hessel, J.[Jack],
Schmidt, L.[Ludwig],
Elovici, Y.[Yuval],
Stanovsky, G.[Gabriel],
Schwartz, R.[Roy],
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of
Synthetic and Compositional Images,
ICCV23(2616-2627)
IEEE DOI
2401
BibRef
Hu, Z.Y.[Zi-Yuan],
Li, Y.[Yanyang],
Lyu, M.R.[Michael R.],
Wang, L.W.[Li-Wei],
VL-PET: Vision-and-Language Parameter-Efficient Tuning via
Granularity Control,
ICCV23(2998-3008)
IEEE DOI Code:
WWW Link.
2401
BibRef
Slyman, E.[Eric],
Kahng, M.[Minsuk],
Lee, S.[Stefan],
VLSlice: Interactive Vision-and-Language Slice Discovery,
ICCV23(15245-15255)
IEEE DOI
2401
BibRef
Najibi, M.[Mahyar],
Ji, J.W.[Jing-Wei],
Zhou, Y.[Yin],
Qi, C.R.[Charles R.],
Yan, X.C.[Xin-Chen],
Ettinger, S.[Scott],
Anguelov, D.[Dragomir],
Unsupervised 3D Perception with 2D Vision-Language Distillation for
Autonomous Driving,
ICCV23(8568-8578)
IEEE DOI
2401
BibRef
Xu, H.[Hu],
Xie, S.[Saining],
Huang, P.Y.[Po-Yao],
Yu, L.C.[Li-Cheng],
Howes, R.[Russell],
Ghosh, G.[Gargi],
Zettlemoyer, L.[Luke],
Feichtenhofer, C.[Christoph],
CiT: Curation in Training for Effective Vision-Language Data,
ICCV23(15134-15143)
IEEE DOI
2401
BibRef
Trager, M.[Matthew],
Perera, P.[Pramuditha],
Zancato, L.[Luca],
Achille, A.[Alessandro],
Bhatia, P.[Parminder],
Soatto, S.[Stefano],
Linear Spaces of Meanings: Compositional Structures in
Vision-Language Models,
ICCV23(15349-15358)
IEEE DOI
2401
BibRef
Chen, Y.S.[Yi-Syuan],
Song, Y.Z.[Yun-Zhu],
Yeo, C.Y.[Cheng Yu],
Liu, B.[Bei],
Fu, J.L.[Jian-Long],
Shuai, H.H.[Hong-Han],
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks,
ICCV23(15384-15396)
IEEE DOI
2401
BibRef
Wu, C.E.[Cheng-En],
Tian, Y.[Yu],
Yu, H.C.[Hai-Chao],
Wang, H.[Heng],
Morgado, P.[Pedro],
Hu, Y.H.[Yu Hen],
Yang, L.J.[Lin-Jie],
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy
Labels?,
ICCV23(15442-15451)
IEEE DOI Code:
WWW Link.
2401
BibRef
Ouali, Y.[Yassine],
Bulat, A.[Adrian],
Martinez, B.[Brais],
Tzimiropoulos, G.[Georgios],
Black Box Few-Shot Adaptation for Vision-Language models,
ICCV23(15488-15500)
IEEE DOI Code:
WWW Link.
2401
BibRef
Kan, B.[Baoshuo],
Wang, T.[Teng],
Lu, W.P.[Wen-Peng],
Zhen, X.T.[Xian-Tong],
Guan, W.[Weili],
Zheng, F.[Feng],
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language
Models,
ICCV23(15624-15634)
IEEE DOI
2401
BibRef
Zhai, J.T.[Jiang-Tian],
Zhang, Q.[Qi],
Wu, T.[Tong],
Chen, X.Y.[Xing-Yu],
Liu, J.J.[Jiang-Jiang],
Cheng, M.M.[Ming-Ming],
SLAN: Self-Locator Aided Network for Vision-Language Understanding,
ICCV23(21892-21901)
IEEE DOI Code:
WWW Link.
2401
BibRef
Long, S.[Sifan],
Zhao, Z.[Zhen],
Yuan, J.[Junkun],
Tan, Z.C.[Zi-Chang],
Liu, J.J.[Jiang-Jiang],
Zhou, L.P.[Lu-Ping],
Wang, S.S.[Sheng-Sheng],
Wang, J.D.[Jing-Dong],
Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models,
ICCV23(21902-21912)
IEEE DOI
2401
BibRef
Cho, E.[Eulrang],
Kim, J.[Jooyeon],
Kim, H.W.J.[Hyun-Woo J.],
Distribution-Aware Prompt Tuning for Vision-Language Models,
ICCV23(21947-21956)
IEEE DOI Code:
WWW Link.
2401
BibRef
Varma, M.[Maya],
Delbrouck, J.B.[Jean-Benoit],
Hooper, S.[Sarah],
Chaudhari, A.[Akshay],
Langlotz, C.[Curtis],
ViLLA: Fine-Grained Vision-Language Representation Learning from
Real-World Data,
ICCV23(22168-22178)
IEEE DOI
2401
BibRef
Zhu, H.G.[Hong-Guang],
Wei, Y.C.[Yun-Chao],
Liang, X.D.[Xiao-Dan],
Zhang, C.J.[Chun-Jie],
Zhao, Y.[Yao],
CTP: Towards Vision-Language Continual Pretraining via Compatible
Momentum Contrast and Topology Preservation,
ICCV23(22200-22210)
IEEE DOI Code:
WWW Link.
2401
BibRef
Hu, Z.Z.[Zhi-Zhang],
Zhu, X.L.[Xin-Liang],
Tran, S.[Son],
Vidal, R.[René],
Dhua, A.[Arnab],
ProVLA: Compositional Image Search with Progressive Vision-Language
Alignment and Multimodal Fusion,
CLVL23(2764-2769)
IEEE DOI
2401
BibRef
Hall, M.[Melissa],
Gustafson, L.[Laura],
Adcock, A.[Aaron],
Misra, I.[Ishan],
Ross, C.[Candace],
Vision-Language Models Performing Zero-Shot Tasks Exhibit Disparities
Between Gender Groups,
CLVL23(2770-2777)
IEEE DOI
2401
BibRef
Agnolucci, L.[Lorenzo],
Baldrati, A.[Alberto],
Todino, F.[Francesco],
Becattini, F.[Federico],
Bertini, M.[Marco],
del Bimbo, A.[Alberto],
ECO: Ensembling Context Optimization for Vision-Language Models,
CLVL23(2803-2807)
IEEE DOI
2401
BibRef
Palit, V.[Vedant],
Pandey, R.[Rohan],
Arora, A.[Aryaman],
Liang, P.P.[Paul Pu],
Towards Vision-Language Mechanistic Interpretability: A Causal
Tracing Tool for BLIP,
CLVL23(2848-2853)
IEEE DOI
2401
BibRef
Sammani, F.[Fawaz],
Deligiannis, N.[Nikos],
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language
Tasks,
VLAR23(4636-4641)
IEEE DOI
2401
BibRef
Lee, D.J.[Dong-Jun],
Song, S.[Seokwon],
Suh, J.[Jihee],
Choi, J.[Joonmyeong],
Lee, S.[Sanghyeok],
Kim, H.W.J.[Hyun-Woo J.],
Read-only Prompt Optimization for Vision-Language Few-shot Learning,
ICCV23(1401-1411)
IEEE DOI Code:
WWW Link.
2401
BibRef
Li, X.[Xuanlin],
Fang, Y.H.[Yun-Hao],
Liu, M.H.[Ming-Hua],
Ling, Z.[Zhan],
Tu, Z.W.[Zhuo-Wen],
Su, H.[Hao],
Distilling Large Vision-Language Model with Out-of-Distribution
Generalizability,
ICCV23(2492-2503)
IEEE DOI
2401
BibRef
Li, J.C.[Jun-Cheng],
Gao, M.[Minghe],
Wei, L.H.[Long-Hui],
Tang, S.L.[Si-Liang],
Zhang, W.Q.[Wen-Qiao],
Li, M.Z.[Meng-Ze],
Ji, W.[Wei],
Tian, Q.[Qi],
Chua, T.S.[Tat-Seng],
Zhuang, Y.T.[Yue-Ting],
Gradient-Regulated Meta-Prompt Learning for Generalizable
Vision-Language Models,
ICCV23(2551-2562)
IEEE DOI
2401
BibRef
Bi, J.Y.[Jun-Yu],
Cheng, D.[Daixuan],
Yao, P.[Ping],
Pang, B.[Bochen],
Zhan, Y.F.[Yue-Feng],
Yang, C.G.[Chuan-Guang],
Wang, Y.J.[Yu-Jing],
Sun, H.[Hao],
Deng, W.W.[Wei-Wei],
Zhang, Q.[Qi],
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and
Instance-Level Matching,
ICCV23(2584-2593)
IEEE DOI
2401
BibRef
Udandarao, V.[Vishaal],
Gupta, A.[Ankush],
Albanie, S.[Samuel],
SuS-X: Training-Free Name-Only Transfer of Vision-Language Models,
ICCV23(2725-2736)
IEEE DOI Code:
WWW Link.
2401
BibRef
Jiang, C.Y.[Chao-Ya],
Xu, H.Y.[Hai-Yang],
Ye, W.[Wei],
Ye, Q.H.[Qing-Hao],
Li, C.L.[Chen-Liang],
Yan, M.[Ming],
Bi, B.[Bin],
Zhang, S.K.[Shi-Kun],
Huang, F.[Fei],
Huang, S.F.[Song-Fang],
BUS: Efficient and Effective Vision-language Pre-training with
Bottom-Up Patch Summarization,
ICCV23(2888-2898)
IEEE DOI
2401
BibRef
Shi, C.[Cheng],
Yang, S.[Sibei],
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for
Vision-Language Models,
ICCV23(2920-2929)
IEEE DOI
2401
BibRef
Wang, A.J.P.[Alex Jin-Peng],
Lin, K.Q.[Kevin Qinghong],
Zhang, D.J.H.[David Jun-Hao],
Lei, S.W.X.[Stan Wei-Xian],
Shou, M.Z.[Mike Zheng],
Too Large; Data Reduction for Vision-Language Pre-Training,
ICCV23(3124-3134)
IEEE DOI
2401
BibRef
Wang, W.H.[Wei-Han],
Yang, Z.[Zhen],
Xu, B.[Bin],
Li, J.[Juanzi],
Sun, Y.[Yankui],
ViLTA: Enhancing Vision-Language Pre-training through Textual
Augmentation,
ICCV23(3135-3146)
IEEE DOI
2401
BibRef
Wang, T.J.J.[Tzu-Jui Julius],
Laaksonen, J.[Jorma],
Langer, T.[Tomas],
Arponen, H.[Heikki],
Bishop, T.E.[Tom E.],
Learning by Hallucinating:
Vision-Language Pre-training with Weak Supervision,
WACV23(1073-1083)
IEEE DOI
2302
Visualization, Vocabulary, Computational modeling, Detectors,
Benchmark testing, Transformers, unsupervised learning
BibRef
Boecking, B.[Benedikt],
Usuyama, N.[Naoto],
Bannur, S.[Shruthi],
Castro, D.C.[Daniel C.],
Schwaighofer, A.[Anton],
Hyland, S.[Stephanie],
Wetscherek, M.[Maria],
Naumann, T.[Tristan],
Nori, A.[Aditya],
Alvarez-Valle, J.[Javier],
Poon, H.[Hoifung],
Oktay, O.[Ozan],
Making the Most of Text Semantics to Improve Biomedical Vision-Language
Processing,
ECCV22(XXXVI:1-21).
Springer DOI
2211
BibRef
Cui, Q.[Quan],
Zhou, B.[Boyan],
Guo, Y.[Yu],
Yin, W.D.[Wei-Dong],
Wu, H.[Hao],
Yoshie, O.[Osamu],
Chen, Y.[Yubo],
Contrastive Vision-Language Pre-training with Limited Resources,
ECCV22(XXXVI:236-253).
Springer DOI
2211
BibRef
Walmer, M.[Matthew],
Sikka, K.[Karan],
Sur, I.[Indranil],
Shrivastava, A.[Abhinav],
Jha, S.[Susmit],
Dual-Key Multimodal Backdoors for Visual Question Answering,
CVPR22(15354-15364)
IEEE DOI
2210
Visualization, Training data, Detectors, Feature extraction,
Question answering (information retrieval),
Vision + language
BibRef
Ding, Y.[Yang],
Yu, J.[Jing],
Liu, B.[Bang],
Hu, Y.[Yue],
Cui, M.X.[Ming-Xin],
Wu, Q.[Qi],
MuKEA: Multimodal Knowledge Extraction and Accumulation for
Knowledge-based Visual Question Answering,
CVPR22(5079-5088)
IEEE DOI
2210
Bridges, Visualization, Codes, Computational modeling,
Knowledge based systems, Semantics, Vision + language
BibRef
Gao, F.[Feng],
Ping, Q.[Qing],
Thattai, G.[Govind],
Reganti, A.[Aishwarya],
Wu, Y.N.[Ying Nian],
Natarajan, P.[Prem],
Transform-Retrieve-Generate: Natural Language-Centric
Outside-Knowledge Visual Question Answering,
CVPR22(5057-5067)
IEEE DOI
2210
Knowledge engineering, Visualization, Solid modeling,
Knowledge based systems, Natural languages, Transforms,
Visual reasoning
BibRef
Aflalo, E.[Estelle],
Du, M.[Meng],
Tseng, S.Y.[Shao-Yen],
Liu, Y.F.[Yong-Fei],
Wu, C.[Chenfei],
Duan, N.[Nan],
Lal, V.[Vasudev],
VL-InterpreT: An Interactive Visualization Tool for Interpreting
Vision-Language Transformers,
CVPR22(21374-21383)
IEEE DOI
2210
Heating systems, Visualization, Machine vision,
Computational modeling, Transformers, Question answering (information retrieval)
BibRef
Hu, X.W.[Xiao-Wei],
Gan, Z.[Zhe],
Wang, J.F.[Jian-Feng],
Yang, Z.Y.[Zheng-Yuan],
Liu, Z.C.[Zi-Cheng],
Lu, Y.[Yumao],
Wang, L.J.[Li-Juan],
Scaling Up Vision-Language Pretraining for Image Captioning,
CVPR22(17959-17968)
IEEE DOI
2210
Training, Visualization, Computational modeling, Training data,
Benchmark testing, Transformers, Feature extraction, Vision + language
BibRef
Zhang, P.C.[Peng-Chuan],
Li, X.J.[Xiu-Jun],
Hu, X.W.[Xiao-Wei],
Yang, J.W.[Jian-Wei],
Zhang, L.[Lei],
Wang, L.J.[Li-Juan],
Choi, Y.J.[Ye-Jin],
Gao, J.F.[Jian-Feng],
VinVL: Revisiting Visual Representations in Vision-Language Models,
CVPR21(5575-5584)
IEEE DOI
2111
Training, Visualization, Computational modeling, Object detection,
Benchmark testing, Feature extraction, Transformers
BibRef
Li, Z.W.[Zhuo-Wan],
Stengel-Eskin, E.[Elias],
Zhang, Y.X.[Yi-Xiao],
Xie, C.[Cihang],
Tran, Q.[Quan],
van Durme, B.[Benjamin],
Yuille, A.L.[Alan L.],
Calibrating Concepts and Operations:
Towards Symbolic Reasoning on Real Images,
ICCV21(14890-14899)
IEEE DOI
2203
Visualization, Analytical models, Codes, Computational modeling,
Cognition, Data models, Vision + language
BibRef
Yang, X.[Xu],
Zhang, H.W.[Han-Wang],
Qi, G.J.[Guo-Jun],
Cai, J.F.[Jian-Fei],
Causal Attention for Vision-Language Tasks,
CVPR21(9842-9852)
IEEE DOI
2111
Correlation, Codes, Computational modeling,
Training data, Transformers, Data models
BibRef
Stefanini, M.[Matteo],
Cornia, M.[Marcella],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
A Novel Attention-based Aggregation Function to Combine Vision and
Language,
ICPR21(1212-1219)
IEEE DOI
2105
Deep learning, Visualization, Image retrieval,
Transforms, Knowledge discovery
BibRef
Jain, V.,
Lodhavia, J.,
Automatic Question Tagging using k-Nearest Neighbors and Random
Forest,
ISCV20(1-4)
IEEE DOI
2011
learning (artificial intelligence),
question answering (information retrieval),
Natural Language Processing
BibRef
Zheng, W.B.[Wen-Bo],
Yan, L.[Lan],
Gou, C.[Chao],
Wang, F.Y.[Fei-Yue],
Webly Supervised Knowledge Embedding Model for Visual Reasoning,
CVPR20(12442-12451)
IEEE DOI
2008
Visual reasoning between visual image and natural language description.
Visualization, Cognition, Knowledge based systems, Task analysis,
Knowledge engineering, Modulation, Robustness
BibRef
Nguyen, D.K.[Duy-Kien],
Okatani, T.[Takayuki],
Multi-Task Learning of Hierarchical Vision-Language Representation,
CVPR19(10484-10493).
IEEE DOI
2002
BibRef
Gupta, T.[Tanmay],
Shih, K.J.[Kevin J.],
Singh, S.[Saurabh],
Hoiem, D.[Derek],
Aligned Image-Word Representations Improve Inductive Transfer Across
Vision-Language Tasks,
ICCV17(4223-4232)
IEEE DOI
1802
data visualisation, image recognition,
learning (artificial intelligence),
Visualization
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Video Question Answering, Movies, Spatio-Temporal, Query, VQA .