Tamaazousti, Y.[Youssef],
Le Borgne, H.[Hervé],
Popescu, A.[Adrian],
Gadeski, E.[Etienne],
Ginsca, A.[Alexandru],
Hudelot, C.[Céline],
Vision-language integration using constrained local semantic features,
CVIU(163), No. 1, 2017, pp. 41-57.
Elsevier DOI
1712
Image classification
BibRef
Gouthaman, K.V.,
Nambiar, A.[Athira],
Srinivas, K.S.[Kancheti Sai],
Mittal, A.[Anurag],
Linguistically-aware attention for reducing the semantic gap in
vision-language tasks,
PR(112), 2021, pp. 107812.
Elsevier DOI
2102
Attention models, Visual question answering,
Counting in visual question answering, Image captioning
BibRef
Zhou, K.Y.[Kai-Yang],
Yang, J.K.[Jing-Kang],
Loy, C.C.[Chen Change],
Liu, Z.W.[Zi-Wei],
Learning to Prompt for Vision-Language Models,
IJCV(130), No. 9, September 2022, pp. 2337-2348.
Springer DOI
2208
BibRef
Zhou, K.Y.[Kai-Yang],
Yang, J.K.[Jing-Kang],
Loy, C.C.[Chen Change],
Liu, Z.W.[Zi-Wei],
Conditional Prompt Learning for Vision-Language Models,
CVPR22(16795-16804)
IEEE DOI
2210
Training, Representation learning, Adaptation models,
Neural networks, Manuals
BibRef
Ma, C.C.[Cheng-Cheng],
Liu, Y.[Yang],
Deng, J.K.[Jian-Kang],
Xie, L.X.[Ling-Xi],
Dong, W.M.[Wei-Ming],
Xu, C.S.[Chang-Sheng],
Understanding and Mitigating Overfitting in Prompt Tuning for
Vision-Language Models,
CirSysVideo(33), No. 9, September 2023, pp. 4616-4629.
IEEE DOI Code:
WWW Link.
2310
BibRef
Zhu, Y.Q.[Yong-Qing],
Li, X.Y.[Xiang-Yang],
Zheng, M.[Mao],
Yang, J.H.[Jia-Hao],
Wang, Z.H.[Zi-Han],
Guo, X.Q.[Xiao-Qian],
Chai, Z.F.[Zi-Feng],
Yuan, Y.C.[Yu-Chen],
Jiang, S.Q.[Shu-Qiang],
Focus and Align: Learning Tube Tokens for Video-Language Pre-Training,
MultMed(25), 2023, pp. 8036-8050.
IEEE DOI
2312
BibRef
Chen, C.Q.[Chong-Qing],
Han, D.[Dezhi],
Chang, C.C.[Chin-Chen],
MPCCT: Multimodal vision-language learning paradigm with
context-based compact Transformer,
PR(147), 2024, pp. 110084.
Elsevier DOI Code:
WWW Link.
2312
Multimodal vision-language paradigms,
High-dependency modeling, Visual question answering (VQA),
Logical relationship reasoning
BibRef
Wu, W.H.[Wen-Hao],
Sun, Z.[Zhun],
Song, Y.X.[Yu-Xin],
Wang, J.D.[Jing-Dong],
Ouyang, W.L.[Wan-Li],
Transferring Vision-Language Models for Visual Recognition:
A Classifier Perspective,
IJCV(132), No. 2, February 2024, pp. 392-409.
Springer DOI
2402
BibRef
Ming, Y.F.[Yi-Fei],
Li, Y.X.[Yi-Xuan],
How Does Fine-Tuning Impact Out-of-Distribution Detection for
Vision-Language Models?,
IJCV(132), No. 2, February 2024, pp. 596-609.
Springer DOI
2402
BibRef
Zhao, C.R.[Cai-Rong],
Wang, Y.[Yubin],
Jiang, X.Y.[Xin-Yang],
Shen, Y.F.[Yi-Fei],
Song, K.[Kaitao],
Li, D.S.[Dong-Sheng],
Miao, D.Q.[Duo-Qian],
Learning Domain Invariant Prompt for Vision-Language Models,
IP(33), 2024, pp. 1348-1360.
IEEE DOI
2402
Task analysis, Tuning, Training, Adaptation models, Visualization,
Image color analysis, Self-supervised learning, Prompt learning,
domain generalization
BibRef
Yang, X.F.[Xiao-Feng],
Liu, F.[Fayao],
Lin, G.S.[Guo-Sheng],
Neural Logic Vision Language Explainer,
MultMed(26), 2024, pp. 3331-3340.
IEEE DOI
2402
Cognition, Logic programming, Deep learning, Visualization,
Data models, Training, Markov processes,
vision language pretraining
BibRef
Wang, Y.D.[Yi-Dong],
Yu, Z.H.[Zhuo-Hao],
Wang, J.D.[Jin-Dong],
Heng, Q.[Qiang],
Chen, H.[Hao],
Ye, W.[Wei],
Xie, R.[Rui],
Xie, X.[Xing],
Zhang, S.K.[Shi-Kun],
Exploring Vision-Language Models for Imbalanced Learning,
IJCV(132), No. 1, January 2024, pp. 224-237.
Springer DOI
2402
BibRef
Yu, Z.T.[Zheng-Tao],
Zhao, J.[Jia],
Guo, C.L.[Chen-Liang],
Yang, Y.[Ying],
StableNet: Distinguishing the hard samples to overcome language
priors in visual question answering,
IET-CV(18), No. 2, 2024, pp. 315-327.
DOI Link
2403
multimedia systems
BibRef
Zeng, Y.[Yan],
Zhang, X.[Xinsong],
Li, H.[Hang],
Wang, J.W.[Jia-Wei],
Zhang, J.P.[Ji-Peng],
Zhou, W.[Wangchunshu],
X2-VLM: All-in-One Pre-Trained Model for Vision-Language Tasks,
PAMI(46), No. 5, May 2024, pp. 3156-3168.
IEEE DOI
2404
Task analysis, Visualization, Transformers, Detectors, Training,
Feature extraction, Image coding,
vision language pre-training
BibRef
Zheng, Y.Z.[Yao-Zong],
Zhong, B.[Bineng],
Liang, Q.H.[Qi-Hua],
Li, G.R.[Guo-Rong],
Ji, R.R.[Rong-Rong],
Li, X.X.[Xian-Xian],
Toward Unified Token Learning for Vision-Language Tracking,
CirSysVideo(34), No. 4, April 2024, pp. 2125-2135.
IEEE DOI
2404
Task analysis, Target tracking, Visualization, Feature extraction,
Pipelines, Linguistics, Training, Vision-language tracking,
multi-modal modeling
BibRef
Ye, P.[Ping],
Xiao, G.[Gang],
Liu, J.[Jun],
Multimodal Features Alignment for Vision-Language Object Tracking,
RS(16), No. 7, 2024, pp. 1168.
DOI Link
2404
BibRef
Bazi, Y.[Yakoub],
Bashmal, L.[Laila],
Rahhal, M.M.A.[Mohamad Mahmoud Al],
Ricci, R.[Riccardo],
Melgani, F.[Farid],
RS-LLaVA: A Large Vision-Language Model for Joint Captioning and
Question Answering in Remote Sensing Imagery,
RS(16), No. 9, 2024, pp. 1477.
DOI Link
2405
BibRef
Kong, D.[Daehyeon],
Kong, K.[Kyeongbo],
Kang, S.J.[Suk-Ju],
Image clustering using generated text centroids,
SP:IC(125), 2024, pp. 117128.
Elsevier DOI
2405
Deep neural network, Image clustering, Multimodal task, Vision-language model
BibRef
Chen, X.Y.[Xian-Yu],
Yang, J.H.[Jin-Hui],
Chen, S.[Shi],
Wang, L.[Louis],
Jiang, M.[Ming],
Zhao, Q.[Qi],
Every Problem, Every Step, All in Focus: Learning to Solve
Vision-Language Problems With Integrated Attention,
PAMI(46), No. 7, July 2024, pp. 4720-4735.
IEEE DOI
2406
Problem-solving, Task analysis, Visualization, Measurement,
Graph neural networks, Cognition, Videos, Graph attention,
vision-language problem solving
BibRef
Menon, S.[Sachit],
Chandratreya, I.P.[Ishaan Preetam],
Vondrick, C.[Carl],
Task Bias in Contrastive Vision-Language Models,
IJCV(132), No. 6, June 2024, pp. 2026-2040.
Springer DOI
2406
BibRef
Zhang, J.Y.[Jing-Yi],
Huang, J.X.[Jia-Xing],
Jin, S.[Sheng],
Lu, S.J.[Shi-Jian],
Vision-Language Models for Vision Tasks: A Survey,
PAMI(46), No. 8, August 2024, pp. 5625-5644.
IEEE DOI
2407
Task analysis, Visualization, Training, Deep learning, Surveys,
Data models, Predictive models, Big Data, big model, deep learning,
image classification
BibRef
Dong, M.P.[Meng-Ping],
Li, F.[Fei],
Li, Z.B.[Zhen-Bo],
Liu, X.[Xue],
Cluster prototype earth mover's distance adapters and
alignment-guided prompt learning for vision-language models,
PR(156), 2024, pp. 110861.
Elsevier DOI
2408
Cluster prototype, Earth mover's distance, Adapter,
Prompt learning, Vision-language models
BibRef
Liu, Y.[Ye],
Pan, Y.[Yan],
Yin, J.[Jian],
Enhancing Multi-Label Deep Hashing for Image and Audio With Joint
Internal Global Loss Constraints and Large Vision-Language Model,
SPLetters(31), 2024, pp. 2550-2554.
IEEE DOI
2410
Codes, Transformers, Adaptation models, Training,
Convolutional neural networks, Feature extraction,
vision transformer
BibRef
Zhan, C.[Chenlu],
Zhang, Y.F.[Yu-Fei],
Lin, Y.[Yu],
Wang, G.[Gaoang],
Wang, H.W.[Hong-Wei],
UniDCP: Unifying Multiple Medical Vision-Language Tasks via Dynamic
Cross-Modal Learnable Prompts,
MultMed(26), 2024, pp. 9736-9748.
IEEE DOI
2410
Task analysis, Adaptation models, Visualization,
Medical diagnostic imaging, Tuning, Multitasking, Plastics,
cross-modal shareable space
BibRef
Su, K.[Ke],
Zhang, X.X.[Xing-Xing],
Zhang, S.Y.[Si-Yang],
Zhu, J.[Jun],
Zhang, B.[Bo],
To Boost Zero-Shot Generalization for Embodied Reasoning With
Vision-Language Pre-Training,
IP(33), 2024, pp. 5370-5381.
IEEE DOI
2410
Cognition, Visualization, Artificial intelligence, Training,
Image reconstruction, Navigation, vision-language pre-training
BibRef
Xuan, S.Y.[Shi-Yu],
Yang, M.[Ming],
Zhang, S.L.[Shi-Liang],
Adapting Vision-Language Models via Learning to Inject Knowledge,
IP(33), 2024, pp. 5798-5809.
IEEE DOI
2410
Feature extraction, Visualization, Adaptation models, Tuning,
Training, Transformers, Dogs, Accuracy, Robustness, Few shot learning,
knowledge injection
BibRef
Zhou, W.[Wenlve],
Zhou, Z.H.[Zhi-Heng],
Unsupervised Domain Adaption Harnessing Vision-Language Pre-Training,
CirSysVideo(34), No. 9, September 2024, pp. 8201-8214.
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Task analysis, Training, Computational modeling, Tuning,
Data models, Visualization, Unsupervised domain adaptation, model deployment
BibRef
Guo, M.H.[Meng-Hao],
Zhang, Y.[Yi],
Mu, T.J.[Tai-Jiang],
Huang, S.X.[Sharon X.],
Hu, S.M.[Shi-Min],
Tuning Vision-Language Models With Multiple Prototypes Clustering,
PAMI(46), No. 12, December 2024, pp. 11186-11199.
IEEE DOI
2411
Prototypes, Adaptation models, Tuning, Visualization,
Benchmark testing, Computational modeling, Data models, clustering
BibRef
Sun, B.[Bo],
Wu, Z.C.[Zhi-Chao],
Zhang, H.[Hao],
He, J.[Jun],
VTPL: Visual and text prompt learning for visual-language models,
JVCIR(104), 2024, pp. 104280.
Elsevier DOI
2411
V-L models, Prompt learning, Visual and text prompts,
Poly-1 information NCE loss, Center loss
BibRef
Liu, L.C.[Liang-Chen],
Wang, N.N.[Nan-Nan],
Liu, D.[Decheng],
Yang, X.[Xi],
Gao, X.B.[Xin-Bo],
Liu, T.L.[Tong-Liang],
Towards Specific Domain Prompt Learning via Improved Text Label
Optimization,
MultMed(26), 2024, pp. 10805-10815.
IEEE DOI
2411
Visualization, Optimization, Semantics, Task analysis, Terminology,
Learning systems, Adaptation models, vision-language model
BibRef
Liu, X.[Xin],
Wu, J.[Jiamin],
Yang, W.F.[Wen-Fei],
Zhou, X.[Xu],
Zhang, T.Z.[Tian-Zhu],
Multi-Modal Attribute Prompting for Vision-Language Models,
CirSysVideo(34), No. 11, November 2024, pp. 11579-11591.
IEEE DOI
2412
Visualization, Task analysis, Semantics, Adaptation models,
Integrated circuit modeling, Vectors,
attribute
BibRef
Jiang, H.J.[Hao-Jun],
Zhang, J.K.[Jian-Ke],
Huang, R.[Rui],
Ge, C.J.[Chun-Jiang],
Ni, Z.[Zanlin],
Song, S.[Shiji],
Huang, G.[Gao],
Cross-modal adapter for vision-language retrieval,
PR(159), 2025, pp. 111144.
Elsevier DOI
2412
Adapter, Cross-modal interaction, Cross-modal retrieval,
Parameter-efficient training, Multi-modal learning
BibRef
Tan, Y.T.[Ying-Tao],
Chen, Y.Y.[Ying-Ying],
Wang, J.Q.[Jin-Qiao],
DSTA: Reinforcing Vision-Language Understanding for Scene-Text VQA
With Dual-Stream Training Approach,
SPLetters(32), 2025, pp. 6-10.
IEEE DOI
2501
Optical character recognition, Training, Visualization,
Feature extraction, Transformers, Text recognition,
scene-text understanding
BibRef
Li, T.[Tang],
Ma, M.M.[Meng-Meng],
Peng, X.[Xi],
DEAL: Disentangle and Localize Concept-level Explanations for VLMs,
ECCV24(XXXIX: 383-401).
Springer DOI
2412
BibRef
Park, K.Y.[Kwan-Yong],
Saito, K.[Kuniaki],
Kim, D.H.[Dong-Hyun],
Weak-to-strong Compositional Learning from Generative Models for
Language-based Object Detection,
ECCV24(XXIII: 1-19).
Springer DOI
2412
BibRef
Li, S.C.[Shi-Cheng],
Li, L.[Lei],
Liu, Y.[Yi],
Ren, S.[Shuhuai],
Liu, Y.X.[Yuan-Xin],
Gao, R.D.[Run-Dong],
Sun, X.[Xu],
Hou, L.[Lu],
Vitatecs: A Diagnostic Dataset for Temporal Concept Understanding of
Video-language Models,
ECCV24(LXX: 331-348).
Springer DOI
2412
BibRef
Yang, Y.T.[Yan-Ting],
Chen, M.H.[Ming-Hao],
Qiu, Q.[Qibo],
Wu, J.H.[Jia-Hao],
Wang, W.X.[Wen-Xiao],
Lin, B.B.[Bin-Bin],
Guan, Z.Y.[Zi-Yu],
He, X.F.[Xiao-Fei],
Adapt2reward: Adapting Video-language Models to Generalizable Robotic
Rewards via Failure Prompts,
ECCV24(LVII: 163-180).
Springer DOI
2412
BibRef
Rahmanzadehgervi, P.[Pooyan],
Bolton, L.[Logan],
Taesiri, M.R.[Mohammad Reza],
Nguyen, A.T.[Anh Totti],
Vision Language Models are blind,
ACCV24(V: 293-309).
Springer DOI
2412
BibRef
Lai, C.G.[Chen-Gen],
Song, S.L.[Sheng-Li],
Yan, S.[Sitong],
Hu, G.[Guangneng],
Improving Vision and Language Concepts Understanding with Multimodal
Counterfactual Samples,
ECCV24(LXIX: 174-191).
Springer DOI
2412
BibRef
Chytas, S.P.[Sotirios Panagiotis],
Kim, H.W.J.[Hyun-Woo J.],
Singh, V.[Vikas],
Understanding Multi-compositional Learning in Vision and Language
Models via Category Theory,
ECCV24(XLVIII: 324-341).
Springer DOI
2412
BibRef
Song, Y.Z.[Yun-Zhu],
Chen, Y.S.[Yi-Syuan],
Lin, T.L.[Tzu-Ling],
Liu, B.[Bei],
Fu, J.L.[Jian-Long],
Shuai, H.H.[Hong-Han],
Capture Concept Through Comparison: Vision-and-language Representation
Learning with Intrinsic Information Mining,
ACCV24(III: 220-238).
Springer DOI
2412
BibRef
Adhikari, R.[Rabin],
Thapaliya, S.[Safal],
Dhakal, M.[Manish],
Khanal, B.[Bishesh],
Tunevlseg: Prompt Tuning Benchmark for Vision-language Segmentation
Models,
ACCV24(III: 44-62).
Springer DOI
2412
BibRef
He, H.C.[Hai-Chen],
Liu, W.B.[Wei-Bin],
Xing, W.W.[Wei-Wei],
Biefficient: Bidirectionally Prompting Vision-language Models for
Parameter-efficient Video Recognition,
ACCV24(III: 257-274).
Springer DOI
2412
BibRef
Yang, J.K.[Jing-Kang],
Dong, Y.H.[Yu-Hao],
Liu, S.[Shuai],
Li, B.[Bo],
Wang, Z.Y.[Zi-Yue],
Tan, H.R.[Hao-Ran],
Jiang, C.C.[Chen-Cheng],
Kang, J.[Jiamu],
Zhang, Y.[Yuanhan],
Zhou, K.Y.[Kai-Yang],
Liu, Z.W.[Zi-Wei],
Octopus: Embodied Vision-language Programmer from Environmental
Feedback,
ECCV24(I: 20-38).
Springer DOI
2412
BibRef
Kar, O.F.[Oguzhan Fatih],
Tonioni, A.[Alessio],
Poklukar, P.[Petra],
Kulshrestha, A.[Achin],
Zamir, A.[Amir],
Tombari, F.[Federico],
Brave: Broadening the Visual Encoding of Vision-language Models,
ECCV24(XVI: 113-132).
Springer DOI
2412
BibRef
Kamath, A.[Amita],
Hsieh, C.Y.[Cheng-Yu],
Chang, K.W.[Kai-Wei],
Krishna, R.[Ranjay],
The Hard Positive Truth About Vision-language Compositionality,
ECCV24(XIV: 37-54).
Springer DOI
2412
BibRef
Ye-Bin, M.[Moon],
Hyeon-Woo, N.[Nam],
Choi, W.[Wonseok],
Oh, T.H.[Tae-Hyun],
Beaf: Observing Before-after Changes to Evaluate Hallucination in
Vision-language Models,
ECCV24(XI: 232-248).
Springer DOI
2412
BibRef
Jia, B.X.[Bao-Xiong],
Chen, Y.X.[Yi-Xin],
Yu, H.[Huangyue],
Wang, Y.[Yan],
Niu, X.S.[Xue-Song],
Liu, T.[Tengyu],
Li, Q.[Qing],
Huang, S.Y.[Si-Yuan],
Sceneverse: Scaling 3d Vision-language Learning for Grounded Scene
Understanding,
ECCV24(IX: 289-310).
Springer DOI
2412
BibRef
Zhang, Y.F.[Yi-Feng],
Jiang, M.[Ming],
Zhao, Q.[Qi],
Learning Chain of Counterfactual Thought for Bias-robust
Vision-language Reasoning,
ECCV24(VIII: 334-351).
Springer DOI
2412
BibRef
Ruan, S.[Shouwei],
Dong, Y.P.[Yin-Peng],
Liu, H.Q.[Han-Qing],
Huang, Y.[Yao],
Su, H.[Hang],
Wei, X.X.[Xing-Xing],
Omniview-tuning: Boosting Viewpoint Invariance of Vision-language
Pre-training Models,
ECCV24(XXVI: 309-327).
Springer DOI
2412
BibRef
Li, J.[Junyan],
Chen, D.[Delin],
Cai, T.[Tianle],
Chen, P.H.[Pei-Hao],
Hong, Y.[Yining],
Chen, Z.F.[Zhen-Fang],
Shen, Y.[Yikang],
Gan, C.[Chuang],
Flexattention for Efficient High-resolution Vision-language Models,
ECCV24(XXV: 286-302).
Springer DOI
2412
BibRef
Li, X.[Xiang],
Ding, J.[Jian],
Chen, Z.Y.[Zhao-Yang],
Elhoseiny, M.[Mohamed],
UNI3DL: A Unified Model for 3d Vision-language Understanding,
ECCV24(XXIII: 74-92).
Springer DOI
2412
BibRef
Hao, T.X.[Tian-Xiang],
Ding, X.H.[Xiao-Han],
Feng, J.X.[Jue-Xiao],
Yang, Y.H.[Yu-Hong],
Chen, H.[Hui],
Ding, G.[Guiguang],
Quantized Prompt for Efficient Generalization of Vision-language Models,
ECCV24(XIX: 54-73).
Springer DOI
2412
BibRef
Xu, H.B.[Huang-Biao],
Ke, X.[Xiao],
Li, Y.Z.[Yue-Zhou],
Xu, R.[Rui],
Wu, H.Q.[Huan-Qi],
Lin, X.F.[Xiao-Feng],
Guo, W.Z.[Wen-Zhong],
Vision-language Action Knowledge Learning for Semantic-aware Action
Quality Assessment,
ECCV24(XLII: 423-440).
Springer DOI
2412
BibRef
Zhu, Z.Y.[Zi-Yu],
Zhang, Z.[Zhuofan],
Ma, X.J.[Xiao-Jian],
Niu, X.S.[Xue-Song],
Chen, Y.X.[Yi-Xin],
Jia, B.X.[Bao-Xiong],
Deng, Z.D.[Zhi-Dong],
Huang, S.Y.[Si-Yuan],
Li, Q.[Qing],
Unifying 3d Vision-language Understanding via Promptable Queries,
ECCV24(XLIV: 188-206).
Springer DOI
2412
BibRef
Zhang, J.M.[Jia-Ming],
Ma, X.J.[Xing-Jun],
Wang, X.[Xin],
Qiu, L.Y.[Ling-Yu],
Wang, J.Q.[Jia-Qi],
Jiang, Y.G.[Yu-Gang],
Sang, J.[Jitao],
Adversarial Prompt Tuning for Vision-language Models,
ECCV24(XLV: 56-72).
Springer DOI
2412
BibRef
Wu, G.[Ge],
Zhang, X.[Xin],
Li, Z.[Zheng],
Chen, Z.W.[Zhao-Wei],
Liang, J.J.[Jia-Jun],
Yang, J.[Jian],
Li, X.[Xiang],
Cascade Prompt Learning for Vision-language Model Adaptation,
ECCV24(L: 304-321).
Springer DOI
2412
BibRef
Gao, S.[Sensen],
Jia, X.J.[Xiao-Jun],
Ren, X.H.[Xu-Hong],
Tsang, I.[Ivor],
Guo, Q.[Qing],
Boosting Transferability in Vision-language Attacks via Diversification
Along the Intersection Region of Adversarial Trajectory,
ECCV24(LVII: 442-460).
Springer DOI
2412
BibRef
Lafon, M.[Marc],
Ramzi, E.[Elias],
Rambour, C.[Clément],
Audebert, N.[Nicolas],
Thome, N.[Nicolas],
Gallop: Learning Global and Local Prompts for Vision-language Models,
ECCV24(LXI: 264-282).
Springer DOI
2412
BibRef
Jiang, H.B.[Hao-Bin],
Yue, J.P.[Jun-Peng],
Luo, H.[Hao],
Ding, Z.[Ziluo],
Lu, Z.Q.[Zong-Qing],
Reinforcement Learning Friendly Vision-language Model for Minecraft,
ECCV24(LXVIII: 1-17).
Springer DOI
2412
BibRef
Nguyen, A.T.[A. Tuan],
Tai, K.S.[Kai Sheng],
Chen, B.C.[Bor-Chun],
Shukla, S.N.[Satya Narayan],
Yu, H.[Hanchao],
Torr, P.H.S.[Philip H.S.],
Tian, T.P.[Tai-Peng],
Lim, S.N.[Ser-Nam],
ucap: An Unsupervised Prompting Method for Vision-language Models,
ECCV24(LXXIV: 425-439).
Springer DOI
2412
BibRef
Zhang, Y.[Yi],
Yu, K.[Ke],
Wu, S.Q.[Si-Qi],
He, Z.H.[Zhi-Hai],
Conceptual Codebook Learning for Vision-language Models,
ECCV24(LXXVII: 235-251).
Springer DOI
2412
BibRef
Kim, M.[Minchan],
Kim, M.[Minyeong],
Bae, J.[Junik],
Choi, S.[Suhwan],
Kim, S.[Sungkyung],
Chang, B.[Buru],
Exploiting Semantic Reconstruction to Mitigate Hallucinations in
Vision-language Models,
ECCV24(LXXXVI: 236-252).
Springer DOI
2412
BibRef
Chatterjee, A.[Agneet],
Luo, Y.[Yiran],
Gokhale, T.[Tejas],
Yang, Y.Z.[Ye-Zhou],
Baral, C.[Chitta],
Revision: Rendering Tools Enable Spatial Fidelity in Vision-language
Models,
ECCV24(XXX: 339-357).
Springer DOI
2412
BibRef
Ataallah, K.[Kirolos],
Shen, X.Q.[Xiao-Qian],
Abdelrahman, E.[Eslam],
Sleiman, E.[Essam],
Zhuge, M.C.[Ming-Chen],
Ding, J.[Jian],
Zhu, D.[Deyao],
Schmidhuber, J.[Jürgen],
Elhoseiny, M.[Mohamed],
Goldfish: Vision-language Understanding of Arbitrarily Long Videos,
ECCV24(XXIX: 251-267).
Springer DOI
2412
BibRef
Shen, R.[Ruoyue],
Inoue, N.[Nakamasa],
Shinoda, K.[Koichi],
Pyramid Coder: Hierarchical Code Generator for Compositional Visual
Question Answering,
ICIP24(430-436)
IEEE DOI
2411
Training, Visualization, Codes, Accuracy, Large language models,
Natural languages, Visual question answering, Prompting methods
BibRef
Sharma, P.[Pratyusha],
Shaham, T.R.[Tamar Rott],
Baradad, M.[Manel],
Rodríguez-Muñoz, A.[Adrián],
Duggal, S.[Shivam],
Isola, P.[Phillip],
Torralba, A.[Antonio],
Fu, S.[Stephanie],
A Vision Check-up for Language Models,
CVPR24(14410-14419)
IEEE DOI
2410
Representation learning, Visualization, Analytical models, Codes,
Image synthesis, Computational modeling
BibRef
Chen, X.[Xi],
Djolonga, J.[Josip],
Padlewski, P.[Piotr],
Mustafa, B.[Basil],
Changpinyo, S.[Soravit],
Wu, J.L.[Jia-Lin],
Ruiz, C.R.[Carlos Riquelme],
Goodman, S.[Sebastian],
Wang, X.[Xiao],
Tay, Y.[Yi],
Shakeri, S.[Siamak],
Dehghani, M.[Mostafa],
Salz, D.[Daniel],
Lucic, M.[Mario],
Tschannen, M.[Michael],
Nagrani, A.[Arsha],
Hu, H.[Hexiang],
Joshi, M.[Mandar],
Pang, B.[Bo],
Montgomery, C.[Ceslee],
Pietrzyk, P.[Paulina],
Ritter, M.[Marvin],
Piergiovanni, A.[AJ],
Minderer, M.[Matthias],
Pavetic, F.[Filip],
Waters, A.[Austin],
Li, G.[Gang],
Alabdulmohsin, I.[Ibrahim],
Beyer, L.[Lucas],
Amelot, J.[Julien],
Lee, K.[Kenton],
Steiner, A.P.[Andreas Peter],
Li, Y.[Yang],
Keysers, D.[Daniel],
Arnab, A.[Anurag],
Xu, Y.Z.[Yuan-Zhong],
Rong, K.[Keran],
Kolesnikov, A.[Alexander],
Seyedhosseini, M.[Mojtaba],
Angelova, A.[Anelia],
Zhai, X.H.[Xiao-Hua],
Houlsby, N.[Neil],
Soricut, R.[Radu],
On Scaling Up a Multilingual Vision and Language Model,
CVPR24(14432-14444)
IEEE DOI
2410
Training, Visualization, Computational modeling, Object detection,
Benchmark testing, Question answering (information retrieval),
pretraining
BibRef
Parodi, F.[Felipe],
Matelsky, J.K.[Jordan K.],
Regla-Vargas, A.[Alejandra],
Foglia, E.E.[Elizabeth E.],
Lim, C.[Charis],
Weinberg, D.[Danielle],
Kording, K.P.[Konrad P.],
Herrick, H.M.[Heidi M.],
Platt, M.L.[Michael L.],
Vision-language models for decoding provider attention during
neonatal resuscitation,
CVPM24(343-353)
IEEE DOI
2410
Training, Pediatrics, Accuracy, Semantics, Decision making, Transformers
BibRef
Zhang, Y.[Yabin],
Zhu, W.J.[Wen-Jie],
Tang, H.[Hui],
Ma, Z.Y.[Zhi-Yuan],
Zhou, K.Y.[Kai-Yang],
Zhang, L.[Lei],
Dual Memory Networks: A Versatile Adaptation Approach for
Vision-Language Models,
CVPR24(28718-28728)
IEEE DOI Code:
WWW Link.
2410
Training, Knowledge engineering, Adaptation models, Codes,
Training data, Data models, Vision-language models,
versatile adaptation
BibRef
Guo, Y.[Yuncheng],
Gu, X.D.[Xiao-Dong],
JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language
Models,
CVPR24(28695-28705)
IEEE DOI
2410
Adaptation models, Adaptive systems, Noise, Manuals, Robustness,
Noise measurement,
prompt learning
BibRef
Byun, J.[Jaeseok],
Kim, D.[Dohoon],
Moon, T.[Taesup],
MAFA: Managing False Negatives for Vision-Language Pre-Training,
CVPR24(27304-27314)
IEEE DOI Code:
WWW Link.
2410
Smoothing methods, Codes, Computational modeling, Buildings
BibRef
Han, J.[Jinwei],
Lin, Z.W.[Zhi-Wen],
Sun, Z.Y.[Zhong-Yisun],
Gao, Y.G.[Ying-Guo],
Yan, K.[Ke],
Ding, S.H.[Shou-Hong],
Gao, Y.[Yuan],
Xia, G.S.[Gui-Song],
Anchor-based Robust Finetuning of Vision-Language Models,
CVPR24(26909-26918)
IEEE DOI
2410
Image recognition, Zero-shot learning, Semantics,
Benchmark testing, Anchor, Robust Finetuning
BibRef
Wei, Z.[Zihao],
Pan, Z.X.[Zi-Xuan],
Owens, A.[Andrew],
Efficient Vision-Language Pre-Training by Cluster Masking,
CVPR24(26805-26815)
IEEE DOI
2410
Training, Visualization, Semantics, Contrastive learning, Writing,
Predictive models
BibRef
Cao, Q.L.[Qing-Long],
Xu, Z.Q.[Zheng-Qin],
Chen, Y.[Yuntian],
Ma, C.[Chao],
Yang, X.K.[Xiao-Kang],
Domain Prompt Learning with Quaternion Networks,
CVPR24(26627-26636)
IEEE DOI Code:
WWW Link.
2410
Knowledge engineering, Adaptation models, Codes, Quaternions,
Face recognition, Contrastive learning, vision-language models,
quaternion networks
BibRef
Wang, S.[Sibo],
Zhang, J.[Jie],
Yuan, Z.[Zheng],
Shan, S.G.[Shi-Guang],
Pre-Trained Model Guided Fine-Tuning for Zero-Shot Adversarial
Robustness,
CVPR24(24502-24511)
IEEE DOI
2410
Training, Accuracy, Codes, Minimization, Robustness,
Zero-Shot, Adversarial Robustness, Large-scale vision-language models
BibRef
Li, L.[Lin],
Guan, H.Y.[Hao-Yan],
Qiu, J.N.[Jia-Ning],
Spratling, M.[Michael],
One Prompt Word is Enough to Boost Adversarial Robustness for
Pre-Trained Vision-Language Models,
CVPR24(24408-24419)
IEEE DOI Code:
WWW Link.
2410
Accuracy, Codes, Training data, Robustness,
Computational efficiency, vision-language models,
VLMs
BibRef
Zanella, M.[Maxime],
Ayed, I.B.[Ismail Ben],
On the Test-Time Zero-Shot Generalization of Vision-Language Models:
Do we Really need Prompt Learning?,
CVPR24(23783-23793)
IEEE DOI
2410
Training, Systematics, Computational modeling, Quality assessment,
Computational efficiency, vision-language,
training-free
BibRef
Yao, H.T.[Han-Tao],
Zhang, R.[Rui],
Xu, C.S.[Chang-Sheng],
TCP: Textual-Based Class-Aware Prompt Tuning for Visual-Language
Model,
CVPR24(23438-23448)
IEEE DOI Code:
WWW Link.
2410
Training, Adaptation models, Benchmark testing,
Tuning
BibRef
Yang, S.[Senqiao],
Tian, Z.[Zhuotao],
Jiang, L.[Li],
Jia, J.Y.[Jia-Ya],
Unified Language-Driven Zero-Shot Domain Adaptation,
CVPR24(23407-23415)
IEEE DOI
2410
Representation learning, Adaptation models, Visualization,
Correlation, Scalability, Computational modeling,
Vision-Language Model
BibRef
Cui, J.Q.[Jie-Quan],
Zhu, B.[Beier],
Wen, X.[Xin],
Qi, X.J.[Xiao-Juan],
Yu, B.[Bei],
Zhang, H.W.[Han-Wang],
Classes Are Not Equal: An Empirical Study on Image Recognition
Fairness,
CVPR24(23283-23292)
IEEE DOI
2410
Training, Representation learning, Image recognition, Accuracy,
Predictive models, Network architecture, Prediction algorithms,
Vision-Language Models
BibRef
Stojnic, V.[Vladan],
Kalantidis, Y.[Yannis],
Tolias, G.[Giorgos],
Label Propagation for Zero-shot Classification with Vision-Language
Models,
CVPR24(23209-23218)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Closed box, Encoding, Data models,
vision-language models, label propagation, zero-shot classification
BibRef
Yuan, T.[Tongtong],
Zhang, X.[Xuange],
Liu, K.[Kun],
Liu, B.[Bo],
Chen, C.[Chen],
Jin, J.[Jian],
Jiao, Z.Z.[Zhen-Zhen],
Towards Surveillance Video-and-Language Understanding: New Dataset,
Baselines, and Challenges,
CVPR24(22052-22061)
IEEE DOI Code:
WWW Link.
2410
Annotations, Surveillance, Semantics, Benchmark testing,
Public security, Timing, Security, Dataset Annotation
BibRef
Chen, Y.F.[Yi-Fei],
Chen, D.P.[Da-Peng],
Liu, R.J.[Rui-Jin],
Zhou, S.[Sai],
Xue, W.Y.[Wen-Yuan],
Peng, W.[Wei],
Align Before Adapt: Leveraging Entity-to-Region Alignments for
Generalizable Video Action Recognition,
CVPR24(18688-18698)
IEEE DOI
2410
Representation learning, Adaptation models, Visualization, Semantics,
Transformers, Vectors, Video action recognition, visual-language model
BibRef
Mittal, H.[Himangi],
Agarwal, N.[Nakul],
Lo, S.Y.[Shao-Yuan],
Lee, K.[Kwonjoon],
Can't make an Omelette without Breaking some Eggs: Plausible Action
Anticipation using Large Video-Language Models,
CVPR24(18580-18590)
IEEE DOI
2410
Accuracy, Computational modeling, Linear programming,
Action Anticipation, Video, Large Multimodal Models
BibRef
Kahatapitiya, K.[Kumara],
Arnab, A.[Anurag],
Nagrani, A.[Arsha],
Ryoo, M.S.[Michael S.],
VicTR: Video-conditioned Text Representations for Activity
Recognition,
CVPR24(18547-18558)
IEEE DOI
2410
Training, Visualization, Adaptation models, Semantics, Focusing,
Benchmark testing, Vision-language models, Activity Recognition,
Video-conditioned Text
BibRef
Wu, T.Y.[Tz-Ying],
Ho, C.H.[Chih-Hui],
Vasconcelos, N.M.[Nuno M.],
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification,
CVPR24(16531-16540)
IEEE DOI Code:
WWW Link.
2410
Measurement, Training, Frequency modulation, Accuracy, Taxonomy,
Semantics, Hierarchical Classification, Visual-language foundation model
BibRef
Zhao, G.[Ganlong],
Li, G.B.[Guan-Bin],
Chen, W.[Weikai],
Yu, Y.Z.[Yi-Zhou],
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with
Open-Vocabulary Detection and StructurEd Representation,
CVPR24(16296-16306)
IEEE DOI
2410
Art, Accuracy, Navigation, Annotations, Detectors,
Vision-and-Language Navigation, Open-vocabulary, Multi-Modal Learning
BibRef
Li, X.[Xin],
Wu, Y.F.[Yun-Fei],
Jiang, X.H.[Xing-Hua],
Guo, Z.H.[Zhi-Hao],
Gong, M.M.[Ming-Ming],
Cao, H.Y.[Hao-Yu],
Liu, Y.S.[Yin-Song],
Jiang, D.Q.[De-Qiang],
Sun, X.[Xing],
Enhancing Visual Document Understanding with Contrastive Learning in
Large Visual-Language Models,
CVPR24(15546-15555)
IEEE DOI
2410
Visualization, Computational modeling, Contrastive learning,
Benchmark testing, Feature extraction, Filling, Contrastive Learning
BibRef
Pham, K.[Khoi],
Huynh, C.[Chuong],
Lim, S.N.[Ser-Nam],
Shrivastava, A.[Abhinav],
Composing Object Relations and Attributes for Image-Text Matching,
CVPR24(14354-14363)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Computational modeling, Image edge detection,
Semantics, Benchmark testing, vision-language, image retrieval,
image-text matching
BibRef
Lee, J.H.[Ju-Hee],
Kang, J.W.[Je-Won],
SRTube: Video-Language Pre-Training with Action-Centric Video Tube
Features and Semantic Role Labeling,
CVPR24(13689-13699)
IEEE DOI
2410
Attention mechanisms, Computational modeling, Semantics,
Electron tubes, Trajectory, video-language pre-training
BibRef
Kim, G.[Gahyeon],
Kim, S.[Sohee],
Lee, S.[Seokju],
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models,
Prompting24(1572-1582)
IEEE DOI
2410
Visualization, Zero-shot learning, Semantics, Focusing,
Feature extraction, Data augmentation, Vectors, prompt learning, VLMs
BibRef
Xu, Z.[Zhenlin],
Zhu, Y.[Yi],
Deng, S.Q.[Si-Qi],
Mittal, A.[Abhay],
Chen, Y.B.[Yan-Bei],
Wang, M.[Manchen],
Favaro, P.[Paolo],
Tighe, J.[Joseph],
Modolo, D.[Davide],
Benchmarking Zero-Shot Recognition with Vision-Language Models:
Challenges on Granularity and Specificity,
WhatNext24(1827-1836)
IEEE DOI
2410
Computational modeling, Face recognition, Semantics, Training data,
Focusing, Vision and language models, Zero-shot recognition,
Benchmarking
BibRef
Luo, Z.W.[Zi-Wei],
Gustafsson, F.K.[Fredrik K.],
Zhao, Z.[Zheng],
Sjölund, J.[Jens],
Schön, T.B.[Thomas B.],
Photo-Realistic Image Restoration in the Wild with Controlled
Vision-Language Models,
NTIRE24(6641-6651)
IEEE DOI
2410
Degradation, Training, Image synthesis, Pipelines, Transform coding,
Diffusion models, Feature extraction, Image restoration, real-world
BibRef
Huang, C.Q.[Chao-Qin],
Jiang, A.[Aofan],
Feng, J.H.[Jing-Hao],
Zhang, Y.[Ya],
Wang, X.C.[Xin-Chao],
Wang, Y.F.[Yan-Feng],
Adapting Visual-Language Models for Generalizable Anomaly Detection
in Medical Images,
CVPR24(11375-11385)
IEEE DOI Code:
WWW Link.
2410
Training, Adaptation models, Image segmentation, Visualization,
Source coding, Semantics, Anomaly Detection, Medical Images
BibRef
Bang, J.[Jihwan],
Ahn, S.[Sumyeong],
Lee, J.G.[Jae-Gil],
Active Prompt Learning in Vision Language Models,
CVPR24(26994-27004)
IEEE DOI Code:
WWW Link.
2410
Learning systems, Adaptation models, Codes, Sampling methods, Labeling
BibRef
Pan, C.[Chenbin],
Yaman, B.[Burhaneddin],
Nesti, T.[Tommaso],
Mallik, A.[Abhirup],
Allievi, A.G.[Alessandro G.],
Velipasalar, S.[Senem],
Ren, L.[Liu],
VLP: Vision Language Planning for Autonomous Driving,
CVPR24(14760-14769)
IEEE DOI
2410
Training, Urban areas, Linguistics, Cognition, Robustness, Planning
BibRef
Liang, M.[Mingfu],
Su, J.C.[Jong-Chyi],
Schulter, S.[Samuel],
Garg, S.[Sparsh],
Zhao, S.Y.[Shi-Yu],
Wu, Y.[Ying],
Chandraker, M.[Manmohan],
AIDE: An Automatic Data Engine for Object Detection in Autonomous
Driving,
CVPR24(14695-14706)
IEEE DOI
2410
Training, Costs, Roads, Pipelines, Object detection, Benchmark testing,
Data models, Autonomous Driving, Vision Language Model,
Automatic Data Engine
BibRef
Li, Z.[Zheng],
Li, X.[Xiang],
Fu, X.[Xinyi],
Zhang, X.[Xin],
Wang, W.Q.[Wei-Qiang],
Chen, S.[Shuo],
Yang, J.[Jian],
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models,
CVPR24(26607-26616)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Prediction algorithms, Data models,
Vectors, Probability distribution, knowledge distillation,
zero-shot learning
BibRef
Khandelwal, A.[Anant],
PromptSync: Bridging Domain Gaps in Vision-Language Models through
Class-Aware Prototype Alignment and Discrimination,
ZeroShot24(7819-7828)
IEEE DOI
2410
Adaptation models, Computational modeling, Prototypes,
Contrastive learning, Benchmark testing, Robustness
BibRef
Hirohashi, Y.[Yuki],
Hirakawa, T.[Tsubasa],
Yamashita, T.[Takayoshi],
Fujiyoshi, H.[Hironobu],
Prompt Learning with One-Shot Setting based Feature Space Analysis in
Vision-and-Language Models,
ZeroShot24(7761-7770)
IEEE DOI
2410
Learning systems, Analytical models, Adaptation models,
Image resolution, Accuracy, Vision-and-Language Model, Prompt Learning
BibRef
Zhang, L.[Le],
Awal, R.[Rabiul],
Agrawal, A.[Aishwarya],
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to
Enhance Visio-Linguistic Compositional Understanding,
CVPR24(13774-13784)
IEEE DOI Code:
WWW Link.
2410
Annotations, Semantics, Refining, Text to image,
Contrastive learning, Benchmark testing, Cognition,
contrastive learning
BibRef
Rosasco, A.[Andrea],
Berti, S.[Stefano],
Pasquale, G.[Giulia],
Malafronte, D.[Damiano],
Sato, S.[Shogo],
Segawa, H.[Hiroyuki],
Inada, T.[Tetsugo],
Natale, L.[Lorenzo],
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized
Vision-Language Tasks,
CVPR24(22239-22248)
IEEE DOI Code:
WWW Link.
2410
Measurement, Codes, Image synthesis, Text to image,
Benchmark testing, benchmark, dataset,
compositionality
BibRef
Cheng, S.[Sijie],
Guo, Z.C.[Zhi-Cheng],
Wu, J.[Jinawen],
Fang, K.[Kechen],
Li, P.[Peng],
Liu, H.P.[Hua-Ping],
Liu, Y.[Yang],
EgoThink: Evaluating First-Person Perspective Thinking Capability of
Vision-Language Models,
CVPR24(14291-14302)
IEEE DOI
2410
Bridges, Visualization, Computational modeling, Focusing,
Benchmark testing, Planning, Egocentric, Vision-Language Models, Benchmark
BibRef
Guan, T.R.[Tian-Rui],
Liu, F.[Fuxiao],
Wu, X.[Xiyang],
Xian, R.Q.[Rui-Qi],
Li, Z.X.[Zong-Xia],
Liu, X.Y.[Xiao-Yu],
Wang, X.[Xijun],
Chen, L.[Lichang],
Huang, F.[Furong],
Yacoob, Y.[Yaser],
Manocha, D.[Dinesh],
Zhou, T.Y.[Tian-Yi],
Hallusionbench: An Advanced Diagnostic Suite for Entangled Language
Hallucination and Visual Illusion in Large Vision-Language Models,
CVPR24(14375-14385)
IEEE DOI Code:
WWW Link.
2410
Visualization, Analytical models, Accuracy, Statistical analysis,
Computational modeling, Benchmark testing, Vision language model,
VLM Evaluation
BibRef
Kil, J.[Jihyung],
Song, C.H.[Chan Hee],
Zheng, B.[Boyuan],
Deng, X.[Xiang],
Su, Y.[Yu],
Chao, W.L.[Wei-Lun],
Dual-View Visual Contextualization for Web Navigation,
CVPR24(14445-14454)
IEEE DOI
2410
Visualization, Navigation, Benchmark testing,
AI Agents, Web Agents, Web Navigation, Vision-Language,
Multimodal Agents
BibRef
Guo, Y.Y.[Yang-Yang],
Wang, G.Z.[Guang-Zhi],
Kankanhalli, M.[Mohan],
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation,
CVPR24(15699-15709)
IEEE DOI
2410
Codes, Computational modeling, Perturbation methods, Loading,
Computer architecture, Transformers, Vision-Language,
Low-rank Approximation
BibRef
Cao, J.J.[Jian-Jian],
Ye, P.[Peng],
Li, S.Z.[Sheng-Ze],
Yu, C.[Chong],
Tang, Y.S.[Yan-Song],
Lu, J.W.[Ji-Wen],
Chen, T.[Tao],
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for
Accelerating Vision-Language Transformer,
CVPR24(15710-15719)
IEEE DOI Code:
WWW Link.
2410
Degradation, Adaptation models, Visualization, Costs,
Computational modeling, Semantics, Token Pruning, Model Compress
BibRef
Farina, M.[Matteo],
Mancini, M.[Massimiliano],
Cunegatti, E.[Elia],
Iacca, G.[Giovanni],
Ricci, E.[Elisa],
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning,
CVPR24(16185-16195)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Transfer learning, Neurons,
Benchmark testing, multimodal learning,
sparse neural networks
BibRef
Majumdar, A.[Arjun],
Ajay, A.[Anurag],
Zhang, X.H.[Xiao-Han],
Putta, P.[Pranav],
Yenamandra, S.[Sriram],
Henaff, M.[Mikael],
Silwal, S.[Sneha],
Mcvay, P.[Paul],
Maksymets, O.[Oleksandr],
Arnaud, S.[Sergio],
Yadav, K.[Karmesh],
Li, Q.[Qiyang],
Newman, B.[Ben],
Sharma, M.[Mohit],
Berges, V.[Vincent],
Zhang, S.Q.[Shi-Qi],
Agrawal, P.[Pulkit],
Bisk, Y.[Yonatan],
Batra, D.[Dhruv],
Kalakrishnan, M.[Mrinal],
Meier, F.[Franziska],
Paxton, C.[Chris],
Sax, A.[Alexander],
Rajeswaran, A.[Aravind],
OpenEQA: Embodied Question Answering in the Era of Foundation Models,
CVPR24(16488-16498)
IEEE DOI
2410
Protocols, Natural languages, Semantics, Benchmark testing,
Question answering (information retrieval),
Vision-Language Models
BibRef
Mu, F.Z.[Fang-Zhou],
Mo, S.C.[Si-Cheng],
Li, Y.[Yin],
SnAG: Scalable and Accurate Video Grounding,
CVPR24(18930-18940)
IEEE DOI Code:
WWW Link.
2410
Training, Analytical models, Accuracy, Grounding, Scalability,
Computational modeling, Video understanding,
Vision-Language Learning
BibRef
Gao, Y.[Yuan],
Shi, K.Y.[Kun-Yu],
Zhu, P.[Pengkai],
Belval, E.[Edouard],
Nuriel, O.[Oren],
Appalaraju, S.[Srikar],
Ghadar, S.[Shabnam],
Tu, Z.W.[Zhuo-Wen],
Mahadevan, V.[Vijay],
Soatto, S.[Stefano],
Enhancing Vision-Language Pre-Training with Rich Supervisions,
CVPR24(13480-13491)
IEEE DOI
2410
Location awareness, Visualization, Technological innovation,
Annotations, Pipelines, Web pages, Streaming media,
UI understanding
BibRef
Cao, Y.H.[Yun-Hao],
Ji, K.X.[Kai-Xiang],
Huang, Z.Y.[Zi-Yuan],
Zheng, C.Y.[Chuan-Yang],
Liu, J.J.[Jia-Jia],
Wang, J.[Jian],
Chen, J.D.[Jing-Dong],
Yang, M.[Ming],
Towards Better Vision-Inspired Vision-Language Models,
CVPR24(13537-13547)
IEEE DOI
2410
Training, Bridges, Visualization, Computational modeling,
Poles and towers, Benchmark testing, deep learning, deep prompt
BibRef
Shi, K.Y.[Kun-Yu],
Dong, Q.[Qi],
Goncalves, L.[Luis],
Tu, Z.W.[Zhuo-Wen],
Soatto, S.[Stefano],
Non-autoregressive Sequence-to-Sequence Vision-Language Models,
CVPR24(13603-13612)
IEEE DOI
2410
Visualization, Technological innovation, Computational modeling,
Predictive models, Drives, Encoding, Non-autoregressive, CTC,
vision language models
BibRef
Man, Y.Z.[Yun-Ze],
Gui, L.Y.[Liang-Yan],
Wang, Y.X.[Yu-Xiong],
Situational Awareness Matters in 3D Vision Language Reasoning,
CVPR24(13678-13688)
IEEE DOI
2410
Visualization, Solid modeling, Estimation, Performance gain,
Cognition, Vision-Language, Multi-modal, 3D Reasoning
BibRef
Zheng, C.H.[Chen-Hao],
Zhang, J.[Jieyu],
Kembhavi, A.[Aniruddha],
Krishna, R.[Ranjay],
Iterated Learning Improves Compositionality in Large Vision-Language
Models,
CVPR24(13785-13795)
IEEE DOI
2410
Training, Training data, Games, Contrastive learning,
Benchmark testing, Performance gain, Cognitive science
BibRef
Leng, S.[Sicong],
Zhang, H.[Hang],
Chen, G.Z.[Guan-Zheng],
Li, X.[Xin],
Lu, S.J.[Shi-Jian],
Miao, C.Y.[Chun-Yan],
Bing, L.[Lidong],
Mitigating Object Hallucinations in Large Vision-Language Models
through Visual Contrastive Decoding,
CVPR24(13872-13882)
IEEE DOI
2410
Training, Visualization, Accuracy, Computational modeling,
Benchmark testing, Decoding, Multimodality,
Vision and Language
BibRef
Slyman, E.[Eric],
Lee, S.[Stefan],
Cohen, S.[Scott],
Kafle, K.[Kushal],
FairDeDup: Detecting and Mitigating Vision-Language Fairness
Disparities in Semantic Dataset Deduplication,
CVPR24(13905-13916)
IEEE DOI
2410
Training, Measurement, Costs, Semantics, Skin, Data models, multimodal,
fairness, vision-language, foundation models, human-centered ai, deduplication
BibRef
Song, C.H.[Chull Hwan],
Hwang, T.[Taebaek],
Yoon, J.Y.[Joo-Young],
Choi, S.[Shunghyun],
Gu, Y.H.[Yeong Hyeon],
SyncMask: Synchronized Attentional Masking for Fashion-centric
Vision-Language Pretraining,
CVPR24(13948-13957)
IEEE DOI
2410
Training, Visualization, Image segmentation, Image resolution,
Refining, Contrastive learning
BibRef
Pramanick, S.[Shraman],
Han, G.X.[Guang-Xing],
Hou, R.[Rui],
Nag, S.[Sayan],
Lim, S.N.[Ser-Nam],
Ballas, N.[Nicolas],
Wang, Q.F.[Qi-Fan],
Chellappa, R.[Rama],
Almahairi, A.[Amjad],
Jack of All Tasks, Master of Many: Designing General-purpose
Coarse-to-Fine Vision-Language Model,
CVPR24(14076-14088)
IEEE DOI Code:
WWW Link.
2410
Image segmentation, Visualization, Image coding, Filters, Grounding,
Machine vision, Visual systems
BibRef
Zeng, Y.[Yunan],
Huang, Y.[Yan],
Zhang, J.J.[Jin-Jin],
Jie, Z.Q.[Ze-Qun],
Chai, Z.H.[Zhen-Hua],
Wang, L.[Liang],
Investigating Compositional Challenges in Vision-Language Models for
Visual Grounding,
CVPR24(14141-14151)
IEEE DOI
2410
Visualization, Codes, Grounding, Annotations, Pipelines, Benchmark testing
BibRef
Karmanov, A.[Adilbek],
Guan, D.[Dayan],
Lu, S.J.[Shi-Jian],
El Saddik, A.[Abdulmotaleb],
Xing, E.[Eric],
Efficient Test-Time Adaptation of Vision-Language Models,
CVPR24(14162-14171)
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Codes, Computational modeling, Noise,
Predictive models, Benchmark testing
BibRef
Bulat, A.[Adrian],
Ouali, Y.[Yassine],
Tzimiropoulos, G.[Georgios],
FFF: Fixing Flawed Foundations in contrastive pre-training results in
very strong Vision-Language models,
CVPR24(14172-14182)
IEEE DOI
2410
Training, Image recognition, Noise, Image retrieval,
Field-flow fractionation
BibRef
Sameni, S.[Sepehr],
Kafle, K.[Kushal],
Tan, H.[Hao],
Jenni, S.[Simon],
Building Vision-Language Models on Solid Foundations with Masked
Distillation,
CVPR24(14216-14226)
IEEE DOI
2410
Training, Solid modeling, Visualization, Computational modeling,
Semantic segmentation, Buildings, LLM
BibRef
Li, R.J.[Rong-Jie],
Wu, Y.[Yu],
He, X.M.[Xu-Ming],
Learning by Correction: Efficient Tuning Task for Zero-Shot
Generative Vision-Language Reasoning,
CVPR24(13428-13437)
IEEE DOI
2410
Training, Visualization, Costs, Computational modeling, Cognition,
Question answering (information retrieval),
Vision-Language
BibRef
Peng, W.[Wujian],
Xie, S.C.[Si-Cheng],
You, Z.[Zuyao],
Lan, S.Y.[Shi-Yi],
Wu, Z.[Zuxuan],
Synthesize, Diagnose, and Optimize: Towards Fine-Grained
Vision-Language Understanding,
CVPR24(13279-13288)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Computational modeling, Pipelines, Benchmark testing,
Linguistics, Vision language model, Fine-grained understanding
BibRef
Zhao, Y.[Yue],
Zhao, L.[Long],
Zhou, X.Y.[Xing-Yi],
Wu, J.L.[Jia-Lin],
Chu, C.T.[Chun-Te],
Miao, H.[Hui],
Schroff, F.[Florian],
Adam, H.[Hartwig],
Liu, T.[Ting],
Gong, B.Q.[Bo-Qing],
Krähenbühl, P.[Philipp],
Yuan, L.Z.[Liang-Zhe],
Distilling Vision-Language Models on Millions of Videos,
CVPR24(13106-13116)
IEEE DOI
2410
Adaptation models, Computational modeling, Benchmark testing,
Data models, Text to video
BibRef
Chen, J.[Jieneng],
Yu, Q.H.[Qi-Hang],
Shen, X.H.[Xiao-Hui],
Yuille, A.[Alan],
Chen, L.C.[Liang-Chieh],
ViTamin: Designing Scalable Vision Models in the Vision-Language Era,
CVPR24(12954-12966)
IEEE DOI
2410
Training, Image segmentation, Accuracy, Protocols, Image coding, Scalability,
Computational modeling, Vision-Language Models, Architectural Design
BibRef
Liu, S.H.[Shi-Hong],
Yu, S.[Samuel],
Lin, Z.Q.[Zhi-Qiu],
Pathak, D.[Deepak],
Ramanan, D.[Deva],
Language Models as Black-Box Optimizers for Vision-Language Models,
CVPR24(12687-12697)
IEEE DOI
2410
Computational modeling, Natural languages, Closed box,
Text to image, Human in the loop, Data models,
generative models
BibRef
Howard, P.[Phillip],
Madasu, A.[Avinash],
Le, T.[Tiep],
Moreno, G.L.[Gustavo Lujan],
Bhiwandiwalla, A.[Anahita],
Lal, V.[Vasudev],
SocialCounterfactuals: Probing and Mitigating Intersectional Social
Biases in Vision-Language Models with Counterfactual Examples,
CVPR24(11975-11985)
IEEE DOI
2410
Training, Prevention and mitigation, Text to image,
Diffusion models, Fairness, social bias,
counterfactuals
BibRef
Jiang, Y.[Yankai],
Huang, Z.Z.[Zhong-Zhen],
Zhang, R.Z.[Rong-Zhao],
Zhang, X.F.[Xiao-Fan],
Zhang, S.T.[Shao-Ting],
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and
Self-Prompting,
CVPR24(11386-11397)
IEEE DOI
2410
Training, Visualization, Pathology, Image segmentation,
Image analysis, Computational modeling, Vision-Language Model
BibRef
Kim, Y.[Younghyun],
Mo, S.[Sangwoo],
Kim, M.[Minkyu],
Lee, K.[Kyungmin],
Lee, J.[Jaeho],
Shin, J.[Jinwoo],
Discovering and Mitigating Visual Biases Through Keyword Explanation,
CVPR24(11082-11092)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Image recognition, Computational modeling,
Training data, Flowering plants, bias and fairness, explainable AI,
vision-language model
BibRef
Li, R.[Rui],
Fischer, T.[Tobias],
Segu, M.[Mattia],
Pollefeys, M.[Marc],
Van Gool, L.J.[Luc J.],
Tombari, F.[Federico],
Know Your Neighbors: Improving Single-View Reconstruction via Spatial
Vision-Language Reasoning,
CVPR24(9848-9858)
IEEE DOI Code:
WWW Link.
2410
Geometry, Visualization, Attention mechanisms, Shape, Semantics,
radiance field, vision-language model, spatial context, spatial attention
BibRef
Zeng, Z.[Ziyao],
Wang, D.[Daniel],
Yang, F.Y.[Feng-Yu],
Park, H.[Hyoungseob],
Soatto, S.[Stefano],
Lao, D.[Dong],
Wong, A.[Alex],
WorDepth: Variational Language Prior for Monocular Depth Estimation,
CVPR24(9708-9719)
IEEE DOI Code:
WWW Link.
2410
Measurement, Codes, Estimation, Encoding,
Monocular Depth Estimation, Vision-Language Model, Variational Model
BibRef
Hu, Y.S.[Yu-Shi],
Stretcu, O.[Otilia],
Lu, C.T.[Chun-Ta],
Viswanathan, K.[Krishnamurthy],
Hata, K.[Kenji],
Luo, E.[Enming],
Krishna, R.[Ranjay],
Fuxman, A.[Ariel],
Visual Program Distillation: Distilling Tools and Programmatic
Reasoning into Vision-Language Models,
CVPR24(9590-9601)
IEEE DOI
2410
Visualization, Adaptation models, Computational modeling,
Instruments, Loading, Music, Cognition, vision-language model,
tools
BibRef
Khan, Z.[Zaid],
Fu, Y.[Yun],
Consistency and Uncertainty: Identifying Unreliable Responses From
Black-Box Vision-Language Models for Selective Visual Question
Answering,
CVPR24(10854-10863)
IEEE DOI
2410
Visualization, Uncertainty, Computational modeling, Closed box,
Predictive models, Question answering (information retrieval),
trustworthy ml
BibRef
Gu, T.C.[Tian-Cheng],
Yang, K.C.[Kai-Cheng],
Liu, D.[Dongnan],
Cai, W.D.[Wei-Dong],
LaPA: Latent Prompt Assist Model for Medical Visual Question
Answering,
DEF-AI-MIA24(4971-4980)
IEEE DOI Code:
WWW Link.
2410
Visualization, Accuracy, Medical services, Predictive models,
Feature extraction, Question answering (information retrieval), Data mining
BibRef
Silva-Rodríguez, J.[Julio],
Hajimiri, S.[Sina],
Ben Ayed, I.[Ismail],
Dolz, J.[Jose],
A Closer Look at the Few-Shot Adaptation of Large Vision-Language
Models,
CVPR24(23681-23690)
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Codes, Computational modeling,
Transfer learning, Probes
BibRef
Zanella, M.[Maxime],
Ben Ayed, I.[Ismail],
Low-Rank Few-Shot Adaptation of Vision-Language Models,
Prompting24(1593-1603)
IEEE DOI
2410
Training, Adaptation models, Design methodology,
Few shot learning, Vision-Language, few-shot,
adapter
BibRef
Wang, W.X.[Wen-Xuan],
He, X.J.[Xing-Jian],
Zhang, Y.[Yisi],
Guo, L.T.[Long-Teng],
Shen, J.C.[Jia-Chen],
Li, J.Y.[Jiang-Yun],
Liu, J.[Jing],
CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring
Image Segmentation,
MultMed(26), 2024, pp. 6906-6916.
IEEE DOI
2405
Image segmentation, Visualization, Task analysis, Correlation,
Feature extraction, Transformers, Semantics, vision and language
BibRef
Sahin, U.[Ugur],
Li, H.[Hang],
Khan, Q.[Qadeer],
Cremers, D.[Daniel],
Tresp, V.[Volker],
Enhancing Multimodal Compositional Reasoning of Visual Language
Models with Generative Negative Mining,
WACV24(5551-5561)
IEEE DOI Code:
HTML Version.
2404
Training, Visualization, Codes, Pipelines, Self-supervised learning,
Cognition, Algorithms, Vision + language and/or other modalities
BibRef
Yang, C.[Cheng],
Xu, R.[Rui],
Guo, Y.[Ye],
Huang, P.X.[Pei-Xiang],
Chen, Y.[Yiru],
Ding, W.[Wenkui],
Wang, Z.Y.[Zhong-Yuan],
Zhou, H.[Hong],
Improving Vision-and-Language Reasoning via Spatial Relations
Modeling,
WACV24(758-767)
IEEE DOI
2404
Visualization, Analytical models, Graphical models,
Statistical analysis, Computational modeling, Excavation,
Vision + language and/or other modalities
BibRef
Shen, S.[Sheng],
Yang, S.[Shijia],
Zhang, T.J.[Tian-Jun],
Zhai, B.[Bohan],
Gonzalez, J.E.[Joseph E.],
Keutzer, K.[Kurt],
Darrell, T.J.[Trevor J.],
Multitask Vision-Language Prompt Tuning,
WACV24(5644-5655)
IEEE DOI
2404
Learning systems, Visualization, Adaptation models,
Benchmark testing, Vectors, Task analysis, Algorithms,
Vision + language and/or other modalities
BibRef
Zhang, G.[Gengyuan],
Zhang, Y.R.[Yu-Rui],
Zhang, K.[Kerui],
Tresp, V.[Volker],
Can Vision-Language Models be a Good Guesser? Exploring VLMs for
Times and Location Reasoning,
WACV24(625-634)
IEEE DOI Code:
WWW Link.
2404
Visualization, Computational modeling, Feature extraction,
Cognition, Task analysis, Commonsense reasoning, Algorithms,
Vision + language and/or other modalities
BibRef
Feinglass, J.[Joshua],
Yang, Y.Z.[Ye-Zhou],
Towards Addressing the Misalignment of Object Proposal Evaluation for
Vision-Language Tasks via Semantic Grounding,
WACV24(4385-4395)
IEEE DOI
2404
Measurement, Visualization, Protocols, Annotations, Grounding,
Semantics, Question answering (information retrieval),
Image recognition and understanding
BibRef
Nadeem, A.[Asmar],
Hilton, A.[Adrian],
Dawes, R.[Robert],
Thomas, G.[Graham],
Mustafa, A.[Armin],
CAD: Contextual Multi-modal Alignment for Dynamic AVQA,
WACV24(7236-7248)
IEEE DOI
2404
Visualization, Semantics, Decision making, Robustness,
Question answering (information retrieval), Complexity theory,
Smartphones / end user devices
BibRef
Wu, W.[Wenyi],
Li, Q.[Qi],
Zhong, W.L.[Wen-Liang],
Huang, J.Z.[Jun-Zhou],
MIVC: Multiple Instance Visual Component for Visual-Language Models,
WACV24(8102-8111)
IEEE DOI
2404
Visualization, Computational modeling, Neural networks,
Question answering (information retrieval),
Image recognition and understanding
BibRef
Ganz, R.[Roy],
Nuriel, O.[Oren],
Aberdam, A.[Aviad],
Kittenplon, Y.[Yair],
Mazor, S.[Shai],
Litman, R.[Ron],
Towards Models that Can See and Read,
ICCV23(21661-21671)
IEEE DOI
2401
BibRef
Zhang, H.[Heng],
Liu, D.[Daqing],
Lv, Z.[Zezhong],
Su, B.[Bing],
Tao, D.C.[Da-Cheng],
Exploring Temporal Concurrency for Video-Language Representation
Learning,
ICCV23(15522-15532)
IEEE DOI Code:
WWW Link.
2401
BibRef
Shukor, M.[Mustafa],
Dancette, C.[Corentin],
Cord, M.[Matthieu],
eP-ALM: Efficient Perceptual Augmentation of Language Models,
ICCV23(21999-22012)
IEEE DOI Code:
WWW Link.
2401
BibRef
Schulter, S.[Samuel],
Kumar, B.G.V.[B.G. Vijay],
Suh, Y.M.[Yu-Min],
Dafnis, K.M.[Konstantinos M.],
Zhang, Z.X.[Zhi-Xing],
Zhao, S.Y.[Shi-Yu],
Metaxas, D.N.[Dimitris N.],
OmniLabel: A Challenging Benchmark for Language-Based Object
Detection,
ICCV23(11919-11928)
IEEE DOI Code:
WWW Link.
2401
BibRef
Chen, Z.L.[Zi-Liang],
Huang, X.[Xin],
Guan, Q.L.[Quan-Long],
Lin, L.[Liang],
Luo, W.Q.[Wei-Qi],
A Retrospect to Multi-prompt Learning across Vision and Language,
ICCV23(22133-22144)
IEEE DOI
2401
BibRef
Derakhshani, M.M.[Mohammad Mahdi],
Sanchez, E.[Enrique],
Bulat, A.[Adrian],
da Costa, V.G.T.[Victor Guilherme Turrisi],
Snoek, C.G.M.[Cees G. M.],
Tzimiropoulos, G.[Georgios],
Martinez, B.[Brais],
Bayesian Prompt Learning for Image-Language Model Generalization,
ICCV23(15191-15200)
IEEE DOI Code:
WWW Link.
2401
BibRef
Cascante-Bonilla, P.[Paola],
Shehada, K.[Khaled],
Smith, J.S.[James Seale],
Doveh, S.[Sivan],
Kim, D.H.[Dong-Hyun],
Panda, R.[Rameswar],
Varol, G.[Gül],
Oliva, A.[Aude],
Ordonez, V.[Vicente],
Feris, R.S.[Rogerio S.],
Karlinsky, L.[Leonid],
Going Beyond Nouns With Vision & Language Models Using Synthetic
Data,
ICCV23(20098-20108)
IEEE DOI
2401
BibRef
Upadhyay, U.[Uddeshya],
Karthik, S.[Shyamgopal],
Mancini, M.[Massimiliano],
Akata, Z.[Zeynep],
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models,
ICCV23(1899-1910)
IEEE DOI Code:
WWW Link.
2401
BibRef
Chen, Z.H.[Zhi-Hong],
Diao, S.Z.[Shi-Zhe],
Wang, B.[Benyou],
Li, G.B.[Guan-Bin],
Wan, X.[Xiang],
Towards Unifying Medical Vision-and-Language Pre-training via Soft
Prompts,
ICCV23(23346-23356)
IEEE DOI
2401
BibRef
Bitton-Guetta, N.[Nitzan],
Bitton, Y.[Yonatan],
Hessel, J.[Jack],
Schmidt, L.[Ludwig],
Elovici, Y.[Yuval],
Stanovsky, G.[Gabriel],
Schwartz, R.[Roy],
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of
Synthetic and Compositional Images,
ICCV23(2616-2627)
IEEE DOI
2401
BibRef
Hu, Z.Y.[Zi-Yuan],
Li, Y.[Yanyang],
Lyu, M.R.[Michael R.],
Wang, L.W.[Li-Wei],
VL-PET: Vision-and-Language Parameter-Efficient Tuning via
Granularity Control,
ICCV23(2998-3008)
IEEE DOI Code:
WWW Link.
2401
BibRef
Slyman, E.[Eric],
Kahng, M.[Minsuk],
Lee, S.[Stefan],
VLSlice: Interactive Vision-and-Language Slice Discovery,
ICCV23(15245-15255)
IEEE DOI
2401
BibRef
Najibi, M.[Mahyar],
Ji, J.W.[Jing-Wei],
Zhou, Y.[Yin],
Qi, C.R.[Charles R.],
Yan, X.C.[Xin-Chen],
Ettinger, S.[Scott],
Anguelov, D.[Dragomir],
Unsupervised 3D Perception with 2D Vision-Language Distillation for
Autonomous Driving,
ICCV23(8568-8578)
IEEE DOI
2401
BibRef
Zheng, K.[Kecheng],
Wu, W.[Wei],
Feng, R.[Ruili],
Zhu, K.[Kai],
Liu, J.W.[Jia-Wei],
Zhao, D.L.[De-Li],
Zha, Z.J.[Zheng-Jun],
Chen, W.[Wei],
Shen, Y.J.[Yu-Jun],
Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained
Vision-Language Models,
ICCV23(11629-11639)
IEEE DOI
2401
BibRef
Wang, T.[Tan],
Lin, K.[Kevin],
Li, L.J.[Lin-Jie],
Lin, C.C.[Chung-Ching],
Yang, Z.Y.[Zheng-Yuan],
Zhang, H.W.[Han-Wang],
Liu, Z.C.[Zi-Cheng],
Wang, L.J.[Li-Juan],
Equivariant Similarity for Vision-Language Foundation Models,
ICCV23(11964-11974)
IEEE DOI
2401
BibRef
Xu, H.[Hu],
Xie, S.[Saining],
Huang, P.Y.[Po-Yao],
Yu, L.C.[Li-Cheng],
Howes, R.[Russell],
Ghosh, G.[Gargi],
Zettlemoyer, L.[Luke],
Feichtenhofer, C.[Christoph],
CiT: Curation in Training for Effective Vision-Language Data,
ICCV23(15134-15143)
IEEE DOI
2401
BibRef
Trager, M.[Matthew],
Perera, P.[Pramuditha],
Zancato, L.[Luca],
Achille, A.[Alessandro],
Bhatia, P.[Parminder],
Soatto, S.[Stefano],
Linear Spaces of Meanings: Compositional Structures in
Vision-Language Models,
ICCV23(15349-15358)
IEEE DOI
2401
BibRef
Chen, Y.S.[Yi-Syuan],
Song, Y.Z.[Yun-Zhu],
Yeo, C.Y.[Cheng Yu],
Liu, B.[Bei],
Fu, J.L.[Jian-Long],
Shuai, H.H.[Hong-Han],
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks,
ICCV23(15384-15396)
IEEE DOI
2401
BibRef
Wu, C.E.[Cheng-En],
Tian, Y.[Yu],
Yu, H.C.[Hai-Chao],
Wang, H.[Heng],
Morgado, P.[Pedro],
Hu, Y.H.[Yu Hen],
Yang, L.J.[Lin-Jie],
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy
Labels?,
ICCV23(15442-15451)
IEEE DOI Code:
WWW Link.
2401
BibRef
Ouali, Y.[Yassine],
Bulat, A.[Adrian],
Martinez, B.[Brais],
Tzimiropoulos, G.[Georgios],
Black Box Few-Shot Adaptation for Vision-Language models,
ICCV23(15488-15500)
IEEE DOI Code:
WWW Link.
2401
BibRef
Kan, B.[Baoshuo],
Wang, T.[Teng],
Lu, W.P.[Wen-Peng],
Zhen, X.T.[Xian-Tong],
Guan, W.[Weili],
Zheng, F.[Feng],
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language
Models,
ICCV23(15624-15634)
IEEE DOI
2401
BibRef
Zhai, J.T.[Jiang-Tian],
Zhang, Q.[Qi],
Wu, T.[Tong],
Chen, X.Y.[Xing-Yu],
Liu, J.J.[Jiang-Jiang],
Cheng, M.M.[Ming-Ming],
SLAN: Self-Locator Aided Network for Vision-Language Understanding,
ICCV23(21892-21901)
IEEE DOI Code:
WWW Link.
2401
BibRef
Long, S.[Sifan],
Zhao, Z.[Zhen],
Yuan, J.[Junkun],
Tan, Z.C.[Zi-Chang],
Liu, J.J.[Jiang-Jiang],
Zhou, L.P.[Lu-Ping],
Wang, S.S.[Sheng-Sheng],
Wang, J.D.[Jing-Dong],
Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models,
ICCV23(21902-21912)
IEEE DOI
2401
BibRef
Cho, E.[Eulrang],
Kim, J.[Jooyeon],
Kim, H.W.J.[Hyun-Woo J.],
Distribution-Aware Prompt Tuning for Vision-Language Models,
ICCV23(21947-21956)
IEEE DOI Code:
WWW Link.
2401
BibRef
Varma, M.[Maya],
Delbrouck, J.B.[Jean-Benoit],
Hooper, S.[Sarah],
Chaudhari, A.[Akshay],
Langlotz, C.[Curtis],
ViLLA: Fine-Grained Vision-Language Representation Learning from
Real-World Data,
ICCV23(22168-22178)
IEEE DOI
2401
BibRef
Zhu, H.G.[Hong-Guang],
Wei, Y.C.[Yun-Chao],
Liang, X.D.[Xiao-Dan],
Zhang, C.J.[Chun-Jie],
Zhao, Y.[Yao],
CTP: Towards Vision-Language Continual Pretraining via Compatible
Momentum Contrast and Topology Preservation,
ICCV23(22200-22210)
IEEE DOI Code:
WWW Link.
2401
BibRef
Salin, E.[Emmanuelle],
Ayache, S.[Stéphane],
Favre, B.[Benoit],
Towards an Exhaustive Evaluation of Vision-Language Foundation Models,
MMFM23(339-352)
IEEE DOI
2401
BibRef
Hu, Z.[Zhizhang],
Zhu, X.L.[Xin-Liang],
Tran, S.[Son],
Vidal, R.[René],
Dhua, A.[Arnab],
ProVLA: Compositional Image Search with Progressive Vision-Language
Alignment and Multimodal Fusion,
CLVL23(2764-2769)
IEEE DOI
2401
BibRef
Hall, M.[Melissa],
Gustafson, L.[Laura],
Adcock, A.[Aaron],
Misra, I.[Ishan],
Ross, C.[Candace],
Vision-Language Models Performing Zero-Shot Tasks Exhibit Disparities
Between Gender Groups,
CLVL23(2770-2777)
IEEE DOI
2401
BibRef
Agnolucci, L.[Lorenzo],
Baldrati, A.[Alberto],
Todino, F.[Francesco],
Becattini, F.[Federico],
Bertini, M.[Marco],
del Bimbo, A.[Alberto],
ECO: Ensembling Context Optimization for Vision-Language Models,
CLVL23(2803-2807)
IEEE DOI
2401
BibRef
Palit, V.[Vedant],
Pandey, R.[Rohan],
Arora, A.[Aryaman],
Liang, P.P.[Paul Pu],
Towards Vision-Language Mechanistic Interpretability: A Causal
Tracing Tool for BLIP,
CLVL23(2848-2853)
IEEE DOI
2401
BibRef
Sammani, F.[Fawaz],
Deligiannis, N.[Nikos],
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language
Tasks,
VLAR23(4636-4641)
IEEE DOI
2401
BibRef
Lu, D.[Dong],
Wang, Z.Q.[Zhi-Qiang],
Wang, T.[Teng],
Guan, W.[Weili],
Gao, H.[Hongchang],
Zheng, F.[Feng],
Set-level Guidance Attack: Boosting Adversarial Transferability of
Vision-Language Pre-training Models,
ICCV23(102-111)
IEEE DOI Code:
WWW Link.
2401
BibRef
Lee, D.J.[Dong-Jun],
Song, S.[Seokwon],
Suh, J.[Jihee],
Choi, J.[Joonmyeong],
Lee, S.[Sanghyeok],
Kim, H.W.J.[Hyun-Woo J.],
Read-only Prompt Optimization for Vision-Language Few-shot Learning,
ICCV23(1401-1411)
IEEE DOI Code:
WWW Link.
2401
BibRef
Li, X.[Xuanlin],
Fang, Y.H.[Yun-Hao],
Liu, M.H.[Ming-Hua],
Ling, Z.[Zhan],
Tu, Z.W.[Zhuo-Wen],
Su, H.[Hao],
Distilling Large Vision-Language Model with Out-of-Distribution
Generalizability,
ICCV23(2492-2503)
IEEE DOI
2401
BibRef
Li, J.C.[Jun-Cheng],
Gao, M.[Minghe],
Wei, L.[Longhui],
Tang, S.L.[Si-Liang],
Zhang, W.Q.[Wen-Qiao],
Li, M.[Mengze],
Ji, W.[Wei],
Tian, Q.[Qi],
Chua, T.S.[Tat-Seng],
Zhuang, Y.T.[Yue-Ting],
Gradient-Regulated Meta-Prompt Learning for Generalizable
Vision-Language Models,
ICCV23(2551-2562)
IEEE DOI
2401
BibRef
Bi, J.Y.[Jun-Yu],
Cheng, D.[Daixuan],
Yao, P.[Ping],
Pang, B.[Bochen],
Zhan, Y.F.[Yue-Feng],
Yang, C.G.[Chuan-Guang],
Wang, Y.J.[Yu-Jing],
Sun, H.[Hao],
Deng, W.W.[Wei-Wei],
Zhang, Q.[Qi],
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and
Instance-Level Matching,
ICCV23(2584-2593)
IEEE DOI
2401
BibRef
Udandarao, V.[Vishaal],
Gupta, A.[Ankush],
Albanie, S.[Samuel],
SuS-X: Training-Free Name-Only Transfer of Vision-Language Models,
ICCV23(2725-2736)
IEEE DOI Code:
WWW Link.
2401
BibRef
Jiang, C.[Chaoya],
Xu, H.Y.[Hai-Yang],
Ye, W.[Wei],
Ye, Q.H.[Qing-Hao],
Li, C.L.[Chen-Liang],
Yan, M.[Ming],
Bi, B.[Bin],
Zhang, S.K.[Shi-Kun],
Huang, F.[Fei],
Huang, S.F.[Song-Fang],
BUS: Efficient and Effective Vision-language Pre-training with
Bottom-Up Patch Summarization,
ICCV23(2888-2898)
IEEE DOI
2401
BibRef
Shi, C.[Cheng],
Yang, S.[Sibei],
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for
Vision-Language Models,
ICCV23(2920-2929)
IEEE DOI
2401
BibRef
Wang, A.J.P.[Alex Jin-Peng],
Lin, K.Q.[Kevin Qinghong],
Zhang, D.J.H.[David Jun-Hao],
Lei, S.W.X.[Stan Wei-Xian],
Shou, M.Z.[Mike Zheng],
Too Large; Data Reduction for Vision-Language Pre-Training,
ICCV23(3124-3134)
IEEE DOI
2401
BibRef
Wang, W.H.[Wei-Han],
Yang, Z.[Zhen],
Xu, B.[Bin],
Li, J.[Juanzi],
Sun, Y.[Yankui],
ViLTA: Enhancing Vision-Language Pre-training through Textual
Augmentation,
ICCV23(3135-3146)
IEEE DOI
2401
BibRef
Wang, T.J.J.[Tzu-Jui Julius],
Laaksonen, J.[Jorma],
Langer, T.[Tomas],
Arponen, H.[Heikki],
Bishop, T.E.[Tom E.],
Learning by Hallucinating:
Vision-Language Pre-training with Weak Supervision,
WACV23(1073-1083)
IEEE DOI
2302
Visualization, Vocabulary, Computational modeling, Detectors,
Benchmark testing, Transformers, unsupervised learning
BibRef
Boecking, B.[Benedikt],
Usuyama, N.[Naoto],
Bannur, S.[Shruthi],
Castro, D.C.[Daniel C.],
Schwaighofer, A.[Anton],
Hyland, S.[Stephanie],
Wetscherek, M.[Maria],
Naumann, T.[Tristan],
Nori, A.[Aditya],
Alvarez-Valle, J.[Javier],
Poon, H.[Hoifung],
Oktay, O.[Ozan],
Making the Most of Text Semantics to Improve Biomedical Vision-Language
Processing,
ECCV22(XXXVI:1-21).
Springer DOI
2211
BibRef
Cui, Q.[Quan],
Zhou, B.[Boyan],
Guo, Y.[Yu],
Yin, W.D.[Wei-Dong],
Wu, H.[Hao],
Yoshie, O.[Osamu],
Chen, Y.[Yubo],
Contrastive Vision-Language Pre-training with Limited Resources,
ECCV22(XXXVI:236-253).
Springer DOI
2211
BibRef
Walmer, M.[Matthew],
Sikka, K.[Karan],
Sur, I.[Indranil],
Shrivastava, A.[Abhinav],
Jha, S.[Susmit],
Dual-Key Multimodal Backdoors for Visual Question Answering,
CVPR22(15354-15364)
IEEE DOI
2210
Visualization, Training data, Detectors, Feature extraction,
Question answering (information retrieval),
Vision + language
BibRef
Ding, Y.[Yang],
Yu, J.[Jing],
Liu, B.[Bang],
Hu, Y.[Yue],
Cui, M.X.[Ming-Xin],
Wu, Q.[Qi],
MuKEA: Multimodal Knowledge Extraction and Accumulation for
Knowledge-based Visual Question Answering,
CVPR22(5079-5088)
IEEE DOI
2210
Bridges, Visualization, Codes, Computational modeling,
Knowledge based systems, Semantics, Vision + language
BibRef
Gao, F.[Feng],
Ping, Q.[Qing],
Thattai, G.[Govind],
Reganti, A.[Aishwarya],
Wu, Y.N.[Ying Nian],
Natarajan, P.[Prem],
Transform-Retrieve-Generate: Natural Language-Centric
Outside-Knowledge Visual Question Answering,
CVPR22(5057-5067)
IEEE DOI
2210
Knowledge engineering, Visualization, Solid modeling,
Knowledge based systems, Natural languages, Transforms,
Visual reasoning
BibRef
Aflalo, E.[Estelle],
Du, M.[Meng],
Tseng, S.Y.[Shao-Yen],
Liu, Y.F.[Yong-Fei],
Wu, C.[Chenfei],
Duan, N.[Nan],
Lal, V.[Vasudev],
VL-InterpreT: An Interactive Visualization Tool for Interpreting
Vision-Language Transformers,
CVPR22(21374-21383)
IEEE DOI
2210
Heating systems, Visualization, Machine vision,
Computational modeling, Transformers, Question answering (information retrieval)
BibRef
Hu, X.W.[Xiao-Wei],
Gan, Z.[Zhe],
Wang, J.F.[Jian-Feng],
Yang, Z.Y.[Zheng-Yuan],
Liu, Z.C.[Zi-Cheng],
Lu, Y.[Yumao],
Wang, L.J.[Li-Juan],
Scaling Up Vision-Language Pretraining for Image Captioning,
CVPR22(17959-17968)
IEEE DOI
2210
Training, Visualization, Computational modeling, Training data,
Benchmark testing, Transformers, Feature extraction, Vision + language
BibRef
Zhang, P.C.[Peng-Chuan],
Li, X.J.[Xiu-Jun],
Hu, X.W.[Xiao-Wei],
Yang, J.W.[Jian-Wei],
Zhang, L.[Lei],
Wang, L.J.[Li-Juan],
Choi, Y.J.[Ye-Jin],
Gao, J.F.[Jian-Feng],
VinVL: Revisiting Visual Representations in Vision-Language Models,
CVPR21(5575-5584)
IEEE DOI
2111
Training, Visualization, Computational modeling, Object detection,
Benchmark testing, Feature extraction, Transformers
BibRef
Li, Z.W.[Zhuo-Wan],
Stengel-Eskin, E.[Elias],
Zhang, Y.X.[Yi-Xiao],
Xie, C.[Cihang],
Tran, Q.[Quan],
van Durme, B.[Benjamin],
Yuille, A.L.[Alan L.],
Calibrating Concepts and Operations:
Towards Symbolic Reasoning on Real Images,
ICCV21(14890-14899)
IEEE DOI
2203
Visualization, Analytical models, Codes, Computational modeling,
Cognition, Data models, Vision + language
BibRef
Yang, X.[Xu],
Zhang, H.W.[Han-Wang],
Qi, G.J.[Guo-Jun],
Cai, J.F.[Jian-Fei],
Causal Attention for Vision-Language Tasks,
CVPR21(9842-9852)
IEEE DOI
2111
Correlation, Codes, Computational modeling,
Training data, Transformers, Data models
BibRef
Stefanini, M.[Matteo],
Cornia, M.[Marcella],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
A Novel Attention-based Aggregation Function to Combine Vision and
Language,
ICPR21(1212-1219)
IEEE DOI
2105
Deep learning, Visualization, Image retrieval,
Transforms, Knowledge discovery
BibRef
Jain, V.,
Lodhavia, J.,
Automatic Question Tagging using k-Nearest Neighbors and Random
Forest,
ISCV20(1-4)
IEEE DOI
2011
learning (artificial intelligence),
question answering (information retrieval),
Natural Language Processing
BibRef
Zheng, W.B.[Wen-Bo],
Yan, L.[Lan],
Gou, C.[Chao],
Wang, F.Y.[Fei-Yue],
Webly Supervised Knowledge Embedding Model for Visual Reasoning,
CVPR20(12442-12451)
IEEE DOI
2008
Visual reasoning between a visual image and a natural language description.
Visualization, Cognition, Knowledge based systems, Task analysis,
Knowledge engineering, Modulation, Robustness
BibRef
Nguyen, D.K.[Duy-Kien],
Okatani, T.[Takayuki],
Multi-Task Learning of Hierarchical Vision-Language Representation,
CVPR19(10484-10493).
IEEE DOI
2002
BibRef
Gupta, T.[Tanmay],
Shih, K.J.[Kevin J.],
Singh, S.[Saurabh],
Hoiem, D.[Derek],
Aligned Image-Word Representations Improve Inductive Transfer Across
Vision-Language Tasks,
ICCV17(4223-4232)
IEEE DOI
1802
data visualisation, image recognition,
learning (artificial intelligence),
Visualization
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Video Question Answering, Movies, Spatio-Temporal, Query, VQA .