Tamaazousti, Y.[Youssef],
Le Borgne, H.[Hervé],
Popescu, A.[Adrian],
Gadeski, E.[Etienne],
Ginsca, A.[Alexandru],
Hudelot, C.[Céline],
Vision-language integration using constrained local semantic features,
CVIU(163), No. 1, 2017, pp. 41-57.
Elsevier DOI
1712
Image classification
BibRef
Gouthaman, K.V.,
Nambiar, A.[Athira],
Srinivas, K.S.[Kancheti Sai],
Mittal, A.[Anurag],
Linguistically-aware attention for reducing the semantic gap in
vision-language tasks,
PR(112), 2021, pp. 107812.
Elsevier DOI
2102
Attention models, Visual question answering,
Counting in visual question answering, Image captioning
BibRef
Zhou, K.Y.[Kai-Yang],
Yang, J.K.[Jing-Kang],
Loy, C.C.[Chen Change],
Liu, Z.W.[Zi-Wei],
Learning to Prompt for Vision-Language Models,
IJCV(130), No. 9, September 2022, pp. 2337-2348.
Springer DOI
2208
BibRef
Zhou, K.Y.[Kai-Yang],
Yang, J.K.[Jing-Kang],
Loy, C.C.[Chen Change],
Liu, Z.W.[Zi-Wei],
Conditional Prompt Learning for Vision-Language Models,
CVPR22(16795-16804)
IEEE DOI
2210
Training, Representation learning, Adaptation models,
Neural networks, Manuals, Market research
BibRef
Ma, C.C.[Cheng-Cheng],
Liu, Y.[Yang],
Deng, J.K.[Jian-Kang],
Xie, L.X.[Ling-Xi],
Dong, W.M.[Wei-Ming],
Xu, C.S.[Chang-Sheng],
Understanding and Mitigating Overfitting in Prompt Tuning for
Vision-Language Models,
CirSysVideo(33), No. 9, September 2023, pp. 4616-4629.
IEEE DOI Code:
WWW Link.
2310
BibRef
Zhu, Y.Q.[Yong-Qing],
Li, X.Y.[Xiang-Yang],
Zheng, M.[Mao],
Yang, J.H.[Jia-Hao],
Wang, Z.H.[Zi-Han],
Guo, X.Q.[Xiao-Qian],
Chai, Z.F.[Zi-Feng],
Yuan, Y.C.[Yu-Chen],
Jiang, S.Q.[Shu-Qiang],
Focus and Align: Learning Tube Tokens for Video-Language Pre-Training,
MultMed(25), 2023, pp. 8036-8050.
IEEE DOI
2312
BibRef
Chen, C.Q.[Chong-Qing],
Han, D.[Dezhi],
Chang, C.C.[Chin-Chen],
MPCCT: Multimodal vision-language learning paradigm with
context-based compact Transformer,
PR(147), 2024, pp. 110084.
Elsevier DOI Code:
WWW Link.
2312
Multimodal vision-language paradigms,
High-dependency modeling, Visual question answering (VQA),
Logical relationship reasoning
BibRef
Wu, W.H.[Wen-Hao],
Sun, Z.[Zhun],
Song, Y.X.[Yu-Xin],
Wang, J.D.[Jing-Dong],
Ouyang, W.L.[Wan-Li],
Transferring Vision-Language Models for Visual Recognition:
A Classifier Perspective,
IJCV(132), No. 2, February 2024, pp. 392-409.
Springer DOI
2402
BibRef
Ming, Y.F.[Yi-Fei],
Li, Y.X.[Yi-Xuan],
How Does Fine-Tuning Impact Out-of-Distribution Detection for
Vision-Language Models?,
IJCV(132), No. 2, February 2024, pp. 596-609.
Springer DOI
2402
BibRef
Zhao, C.R.[Cai-Rong],
Wang, Y.[Yubin],
Jiang, X.Y.[Xin-Yang],
Shen, Y.F.[Yi-Fei],
Song, K.[Kaitao],
Li, D.S.[Dong-Sheng],
Miao, D.Q.[Duo-Qian],
Learning Domain Invariant Prompt for Vision-Language Models,
IP(33), 2024, pp. 1348-1360.
IEEE DOI
2402
Task analysis, Tuning, Training, Adaptation models, Visualization,
Image color analysis, Self-supervised learning, Prompt learning,
domain generalization
BibRef
Yang, X.F.[Xiao-Feng],
Liu, F.[Fayao],
Lin, G.S.[Guo-Sheng],
Neural Logic Vision Language Explainer,
MultMed(26), 2024, pp. 3331-3340.
IEEE DOI
2402
Cognition, Logic programming, Deep learning, Visualization,
Data models, Training, Markov processes,
vision language pretraining
BibRef
Wang, Y.D.[Yi-Dong],
Yu, Z.O.[Zhu-Ohao],
Wang, J.D.[Jin-Dong],
Heng, Q.[Qiang],
Chen, H.[Hao],
Ye, W.[Wei],
Xie, R.[Rui],
Xie, X.[Xing],
Zhang, S.K.[Shi-Kun],
Exploring Vision-Language Models for Imbalanced Learning,
IJCV(132), No. 1, January 2024, pp. 224-237.
Springer DOI
2402
BibRef
Yu, Z.T.[Zheng-Tao],
Zhao, J.[Jia],
Guo, C.L.[Chen-Liang],
Yang, Y.[Ying],
StableNet: Distinguishing the hard samples to overcome language
priors in visual question answering,
IET-CV(18), No. 2, 2024, pp. 315-327.
DOI Link
2403
multimedia systems
BibRef
Zeng, Y.[Yan],
Zhang, X.[Xinsong],
Li, H.[Hang],
Wang, J.W.[Jia-Wei],
Zhang, J.P.[Ji-Peng],
Zhou, W.[Wangchunshu],
X2-VLM: All-in-One Pre-Trained Model for Vision-Language Tasks,
PAMI(46), No. 5, May 2024, pp. 3156-3168.
IEEE DOI
2404
Task analysis, Visualization, Transformers, Detectors, Training,
Feature extraction, Image coding,
vision language pre-training
BibRef
Zheng, Y.Z.[Yao-Zong],
Zhong, B.[Bineng],
Liang, Q.H.[Qi-Hua],
Li, G.R.[Guo-Rong],
Ji, R.R.[Rong-Rong],
Li, X.X.[Xian-Xian],
Toward Unified Token Learning for Vision-Language Tracking,
CirSysVideo(34), No. 4, April 2024, pp. 2125-2135.
IEEE DOI
2404
Task analysis, Target tracking, Visualization, Feature extraction,
Pipelines, Linguistics, Training, Vision-language tracking,
multi-modal modeling
BibRef
Ye, P.[Ping],
Xiao, G.[Gang],
Liu, J.[Jun],
Multimodal Features Alignment for Vision-Language Object Tracking,
RS(16), No. 7, 2024, pp. 1168.
DOI Link
2404
BibRef
Bazi, Y.[Yakoub],
Bashmal, L.[Laila],
Rahhal, M.M.A.[Mohamad Mahmoud Al],
Ricci, R.[Riccardo],
Melgani, F.[Farid],
RS-LLaVA: A Large Vision-Language Model for Joint Captioning and
Question Answering in Remote Sensing Imagery,
RS(16), No. 9, 2024, pp. 1477.
DOI Link
2405
BibRef
Kong, D.[Daehyeon],
Kong, K.[Kyeongbo],
Kang, S.J.[Suk-Ju],
Image clustering using generated text centroids,
SP:IC(125), 2024, pp. 117128.
Elsevier DOI
2405
Deep neural network, Image clustering, Multimodal task, Vision-language model
BibRef
Chen, X.Y.[Xian-Yu],
Yang, J.H.[Jin-Hui],
Chen, S.[Shi],
Wang, L.[Louis],
Jiang, M.[Ming],
Zhao, Q.[Qi],
Every Problem, Every Step, All in Focus: Learning to Solve
Vision-Language Problems With Integrated Attention,
PAMI(46), No. 7, July 2024, pp. 4720-4735.
IEEE DOI
2406
Problem-solving, Task analysis, Visualization, Measurement,
Graph neural networks, Cognition, Videos, Graph attention,
vision-language problem solving
BibRef
Menon, S.[Sachit],
Chandratreya, I.P.[Ishaan Preetam],
Vondrick, C.[Carl],
Task Bias in Contrastive Vision-Language Models,
IJCV(132), No. 6, June 2024, pp. 2026-2040.
Springer DOI
2406
BibRef
Zhang, J.Y.[Jing-Yi],
Huang, J.X.[Jia-Xing],
Jin, S.[Sheng],
Lu, S.J.[Shi-Jian],
Vision-Language Models for Vision Tasks: A Survey,
PAMI(46), No. 8, August 2024, pp. 5625-5644.
IEEE DOI
2407
Task analysis, Visualization, Training, Deep learning, Surveys,
Data models, Predictive models, Big Data, big model, deep learning,
image classification
BibRef
Dong, M.P.[Meng-Ping],
Li, F.[Fei],
Li, Z.B.[Zhen-Bo],
Liu, X.[Xue],
Cluster prototype earth mover's distance adapters and
alignment-guided prompt learning for vision-language models,
PR(156), 2024, pp. 110861.
Elsevier DOI
2408
Cluster prototype, Earth mover's distance, Adapter,
Prompt learning, Vision-language models
BibRef
Sahin, U.[Ugur],
Li, H.[Hang],
Khan, Q.[Qadeer],
Cremers, D.[Daniel],
Tresp, V.[Volker],
Enhancing Multimodal Compositional Reasoning of Visual Language
Models with Generative Negative Mining,
WACV24(5551-5561)
IEEE DOI Code:
HTML Version.
2404
Training, Visualization, Codes, Pipelines, Self-supervised learning,
Cognition, Algorithms, Vision + language and/or other modalities
BibRef
Yang, C.[Cheng],
Xu, R.[Rui],
Guo, Y.[Ye],
Huang, P.X.[Pei-Xiang],
Chen, Y.[Yiru],
Ding, W.[Wenkui],
Wang, Z.Y.[Zhong-Yuan],
Zhou, H.[Hong],
Improving Vision-and-Language Reasoning via Spatial Relations
Modeling,
WACV24(758-767)
IEEE DOI
2404
Visualization, Analytical models, Graphical models,
Statistical analysis, Computational modeling, Excavation,
Vision + language and/or other modalities
BibRef
Shen, S.[Sheng],
Yang, S.[Shijia],
Zhang, T.J.[Tian-Jun],
Zhai, B.[Bohan],
Gonzalez, J.E.[Joseph E.],
Keutzer, K.[Kurt],
Darrell, T.J.[Trevor J.],
Multitask Vision-Language Prompt Tuning,
WACV24(5644-5655)
IEEE DOI
2404
Learning systems, Visualization, Adaptation models,
Benchmark testing, Vectors, Task analysis, Algorithms,
Vision + language and/or other modalities
BibRef
Zhang, G.[Gengyuan],
Zhang, Y.R.[Yu-Rui],
Zhang, K.[Kerui],
Tresp, V.[Volker],
Can Vision-Language Models be a Good Guesser? Exploring VLMs for
Times and Location Reasoning,
WACV24(625-634)
IEEE DOI Code:
WWW Link.
2404
Visualization, Computational modeling, Feature extraction,
Cognition, Task analysis, Commonsense reasoning, Algorithms,
Vision + language and/or other modalities
BibRef
Feinglass, J.[Joshua],
Yang, Y.Z.[Ye-Zhou],
Towards Addressing the Misalignment of Object Proposal Evaluation for
Vision-Language Tasks via Semantic Grounding,
WACV24(4385-4395)
IEEE DOI
2404
Measurement, Visualization, Protocols, Annotations, Grounding,
Semantics, Question answering (information retrieval),
Image recognition and understanding
BibRef
Nadeem, A.[Asmar],
Hilton, A.[Adrian],
Dawes, R.[Robert],
Thomas, G.[Graham],
Mustafa, A.[Armin],
CAD: Contextual Multi-modal Alignment for Dynamic AVQA,
WACV24(7236-7248)
IEEE DOI
2404
Visualization, Semantics, Decision making, Robustness,
Question answering (information retrieval), Complexity theory,
Smartphones / end user devices
BibRef
Wu, W.[Wenyi],
Li, Q.[Qi],
Zhong, W.L.[Wen-Liang],
Huang, J.Z.[Jun-Zhou],
MIVC: Multiple Instance Visual Component for Visual-Language Models,
WACV24(8102-8111)
IEEE DOI
2404
Visualization, Computational modeling, Neural networks,
Question answering (information retrieval),
Image recognition and understanding
BibRef
Ganz, R.[Roy],
Nuriel, O.[Oren],
Aberdam, A.[Aviad],
Kittenplon, Y.[Yair],
Mazor, S.[Shai],
Litman, R.[Ron],
Towards Models that Can See and Read,
ICCV23(21661-21671)
IEEE DOI
2401
BibRef
Zhang, H.[Heng],
Liu, D.[Daqing],
Lv, Z.[Zezhong],
Su, B.[Bing],
Tao, D.C.[Da-Cheng],
Exploring Temporal Concurrency for Video-Language Representation
Learning,
ICCV23(15522-15532)
IEEE DOI Code:
WWW Link.
2401
BibRef
Shukor, M.[Mustafa],
Dancette, C.[Corentin],
Cord, M.[Matthieu],
eP-ALM: Efficient Perceptual Augmentation of Language Models,
ICCV23(21999-22012)
IEEE DOI Code:
WWW Link.
2401
BibRef
Schulter, S.[Samuel],
Kumar, B.G.V.[B.G. Vijay],
Suh, Y.M.[Yu-Min],
Dafnis, K.M.[Konstantinos M.],
Zhang, Z.X.[Zhi-Xing],
Zhao, S.Y.[Shi-Yu],
Metaxas, D.N.[Dimitris N.],
OmniLabel: A Challenging Benchmark for Language-Based Object
Detection,
ICCV23(11919-11928)
IEEE DOI Code:
WWW Link.
2401
BibRef
Chen, Z.L.[Zi-Liang],
Huang, X.[Xin],
Guan, Q.L.[Quan-Long],
Lin, L.[Liang],
Luo, W.Q.[Wei-Qi],
A Retrospect to Multi-prompt Learning across Vision and Language,
ICCV23(22133-22144)
IEEE DOI
2401
BibRef
Derakhshani, M.M.[Mohammad Mahdi],
Sanchez, E.[Enrique],
Bulat, A.[Adrian],
da Costa, V.G.T.[Victor Guilherme Turrisi],
Snoek, C.G.M.[Cees G. M.],
Tzimiropoulos, G.[Georgios],
Martinez, B.[Brais],
Bayesian Prompt Learning for Image-Language Model Generalization,
ICCV23(15191-15200)
IEEE DOI Code:
WWW Link.
2401
BibRef
Cascante-Bonilla, P.[Paola],
Shehada, K.[Khaled],
Smith, J.S.[James Seale],
Doveh, S.[Sivan],
Kim, D.H.[Dong-Hyun],
Panda, R.[Rameswar],
Varol, G.[Gül],
Oliva, A.[Aude],
Ordonez, V.[Vicente],
Feris, R.S.[Rogerio S.],
Karlinsky, L.[Leonid],
Going Beyond Nouns With Vision & Language Models Using Synthetic
Data,
ICCV23(20098-20108)
IEEE DOI
2401
BibRef
Zara, G.[Giacomo],
Conti, A.[Alessandro],
Roy, S.[Subhankar],
Lathuilière, S.[Stéphane],
Rota, P.[Paolo],
Ricci, E.[Elisa],
The Unreasonable Effectiveness of Large Language-Vision Models for
Source-free Video Domain Adaptation,
ICCV23(10273-10283)
IEEE DOI
2401
BibRef
Upadhyay, U.[Uddeshya],
Karthik, S.[Shyamgopal],
Mancini, M.[Massimiliano],
Akata, Z.[Zeynep],
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models,
ICCV23(1899-1910)
IEEE DOI Code:
WWW Link.
2401
BibRef
Chen, Z.H.[Zhi-Hong],
Diao, S.Z.[Shi-Zhe],
Wang, B.[Benyou],
Li, G.B.[Guan-Bin],
Wan, X.[Xiang],
Towards Unifying Medical Vision-and-Language Pre-training via Soft
Prompts,
ICCV23(23346-23356)
IEEE DOI
2401
BibRef
Bitton-Guetta, N.[Nitzan],
Bitton, Y.[Yonatan],
Hessel, J.[Jack],
Schmidt, L.[Ludwig],
Elovici, Y.[Yuval],
Stanovsky, G.[Gabriel],
Schwartz, R.[Roy],
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of
Synthetic and Compositional Images,
ICCV23(2616-2627)
IEEE DOI
2401
BibRef
Hu, Z.Y.[Zi-Yuan],
Li, Y.[Yanyang],
Lyu, M.R.[Michael R.],
Wang, L.W.[Li-Wei],
VL-PET: Vision-and-Language Parameter-Efficient Tuning via
Granularity Control,
ICCV23(2998-3008)
IEEE DOI Code:
WWW Link.
2401
BibRef
Slyman, E.[Eric],
Kahng, M.[Minsuk],
Lee, S.[Stefan],
VLSlice: Interactive Vision-and-Language Slice Discovery,
ICCV23(15245-15255)
IEEE DOI
2401
BibRef
Najibi, M.[Mahyar],
Ji, J.W.[Jing-Wei],
Zhou, Y.[Yin],
Qi, C.R.[Charles R.],
Yan, X.C.[Xin-Chen],
Ettinger, S.[Scott],
Anguelov, D.[Dragomir],
Unsupervised 3D Perception with 2D Vision-Language Distillation for
Autonomous Driving,
ICCV23(8568-8578)
IEEE DOI
2401
BibRef
Zheng, K.[Kecheng],
Wu, W.[Wei],
Feng, R.[Ruili],
Zhu, K.[Kai],
Liu, J.W.[Jia-Wei],
Zhao, D.L.[De-Li],
Zha, Z.J.[Zheng-Jun],
Chen, W.[Wei],
Shen, Y.J.[Yu-Jun],
Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained
Vision-Language Models,
ICCV23(11629-11639)
IEEE DOI
2401
BibRef
Wang, T.[Tan],
Lin, K.[Kevin],
Li, L.J.[Lin-Jie],
Lin, C.C.[Chung-Ching],
Yang, Z.Y.[Zheng-Yuan],
Zhang, H.W.[Han-Wang],
Liu, Z.C.[Zi-Cheng],
Wang, L.J.[Li-Juan],
Equivariant Similarity for Vision-Language Foundation Models,
ICCV23(11964-11974)
IEEE DOI
2401
BibRef
Xu, H.[Hu],
Xie, S.[Saining],
Huang, P.Y.[Po-Yao],
Yu, L.C.[Li-Cheng],
Howes, R.[Russell],
Ghosh, G.[Gargi],
Zettlemoyer, L.[Luke],
Feichtenhofer, C.[Christoph],
CiT: Curation in Training for Effective Vision-Language Data,
ICCV23(15134-15143)
IEEE DOI
2401
BibRef
Trager, M.[Matthew],
Perera, P.[Pramuditha],
Zancato, L.[Luca],
Achille, A.[Alessandro],
Bhatia, P.[Parminder],
Soatto, S.[Stefano],
Linear Spaces of Meanings: Compositional Structures in
Vision-Language Models,
ICCV23(15349-15358)
IEEE DOI
2401
BibRef
Chen, Y.S.[Yi-Syuan],
Song, Y.Z.[Yun-Zhu],
Yeo, C.Y.[Cheng Yu],
Liu, B.[Bei],
Fu, J.L.[Jian-Long],
Shuai, H.H.[Hong-Han],
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks,
ICCV23(15384-15396)
IEEE DOI
2401
BibRef
Wu, C.E.[Cheng-En],
Tian, Y.[Yu],
Yu, H.C.[Hai-Chao],
Wang, H.[Heng],
Morgado, P.[Pedro],
Hu, Y.H.[Yu Hen],
Yang, L.J.[Lin-Jie],
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy
Labels?,
ICCV23(15442-15451)
IEEE DOI Code:
WWW Link.
2401
BibRef
Ouali, Y.[Yassine],
Bulat, A.[Adrian],
Martinez, B.[Brais],
Tzimiropoulos, G.[Georgios],
Black Box Few-Shot Adaptation for Vision-Language Models,
ICCV23(15488-15500)
IEEE DOI Code:
WWW Link.
2401
BibRef
Kan, B.[Baoshuo],
Wang, T.[Teng],
Lu, W.P.[Wen-Peng],
Zhen, X.T.[Xian-Tong],
Guan, W.[Weili],
Zheng, F.[Feng],
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language
Models,
ICCV23(15624-15634)
IEEE DOI
2401
BibRef
Zhai, J.T.[Jiang-Tian],
Zhang, Q.[Qi],
Wu, T.[Tong],
Chen, X.Y.[Xing-Yu],
Liu, J.J.[Jiang-Jiang],
Cheng, M.M.[Ming-Ming],
SLAN: Self-Locator Aided Network for Vision-Language Understanding,
ICCV23(21892-21901)
IEEE DOI Code:
WWW Link.
2401
BibRef
Long, S.[Sifan],
Zhao, Z.[Zhen],
Yuan, J.[Junkun],
Tan, Z.C.[Zi-Chang],
Liu, J.J.[Jiang-Jiang],
Zhou, L.P.[Lu-Ping],
Wang, S.S.[Sheng-Sheng],
Wang, J.D.[Jing-Dong],
Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models,
ICCV23(21902-21912)
IEEE DOI
2401
BibRef
Cho, E.[Eulrang],
Kim, J.[Jooyeon],
Kim, H.W.J.[Hyun-Woo J.],
Distribution-Aware Prompt Tuning for Vision-Language Models,
ICCV23(21947-21956)
IEEE DOI Code:
WWW Link.
2401
BibRef
Varma, M.[Maya],
Delbrouck, J.B.[Jean-Benoit],
Hooper, S.[Sarah],
Chaudhari, A.[Akshay],
Langlotz, C.[Curtis],
ViLLA: Fine-Grained Vision-Language Representation Learning from
Real-World Data,
ICCV23(22168-22178)
IEEE DOI
2401
BibRef
Zhu, H.G.[Hong-Guang],
Wei, Y.C.[Yun-Chao],
Liang, X.D.[Xiao-Dan],
Zhang, C.J.[Chun-Jie],
Zhao, Y.[Yao],
CTP: Towards Vision-Language Continual Pretraining via Compatible
Momentum Contrast and Topology Preservation,
ICCV23(22200-22210)
IEEE DOI Code:
WWW Link.
2401
BibRef
Salin, E.[Emmanuelle],
Ayache, S.[Stéphane],
Favre, B.[Benoit],
Towards an Exhaustive Evaluation of Vision-Language Foundation Models,
MMFM23(339-352)
IEEE DOI
2401
BibRef
Hu, Z.[Zhizhang],
Zhu, X.L.[Xin-Liang],
Tran, S.[Son],
Vidal, R.[René],
Dhua, A.[Arnab],
ProVLA: Compositional Image Search with Progressive Vision-Language
Alignment and Multimodal Fusion,
CLVL23(2764-2769)
IEEE DOI
2401
BibRef
Hall, M.[Melissa],
Gustafson, L.[Laura],
Adcock, A.[Aaron],
Misra, I.[Ishan],
Ross, C.[Candace],
Vision-Language Models Performing Zero-Shot Tasks Exhibit Disparities
Between Gender Groups,
CLVL23(2770-2777)
IEEE DOI
2401
BibRef
Agnolucci, L.[Lorenzo],
Baldrati, A.[Alberto],
Todino, F.[Francesco],
Becattini, F.[Federico],
Bertini, M.[Marco],
del Bimbo, A.[Alberto],
ECO: Ensembling Context Optimization for Vision-Language Models,
CLVL23(2803-2807)
IEEE DOI
2401
BibRef
Palit, V.[Vedant],
Pandey, R.[Rohan],
Arora, A.[Aryaman],
Liang, P.P.[Paul Pu],
Towards Vision-Language Mechanistic Interpretability: A Causal
Tracing Tool for BLIP,
CLVL23(2848-2853)
IEEE DOI
2401
BibRef
Sammani, F.[Fawaz],
Deligiannis, N.[Nikos],
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language
Tasks,
VLAR23(4636-4641)
IEEE DOI
2401
BibRef
Lu, D.[Dong],
Wang, Z.Q.[Zhi-Qiang],
Wang, T.[Teng],
Guan, W.[Weili],
Gao, H.[Hongchang],
Zheng, F.[Feng],
Set-level Guidance Attack: Boosting Adversarial Transferability of
Vision-Language Pre-training Models,
ICCV23(102-111)
IEEE DOI Code:
WWW Link.
2401
BibRef
Lee, D.J.[Dong-Jun],
Song, S.[Seokwon],
Suh, J.[Jihee],
Choi, J.[Joonmyeong],
Lee, S.[Sanghyeok],
Kim, H.W.J.[Hyun-Woo J.],
Read-only Prompt Optimization for Vision-Language Few-shot Learning,
ICCV23(1401-1411)
IEEE DOI Code:
WWW Link.
2401
BibRef
Li, X.[Xuanlin],
Fang, Y.H.[Yun-Hao],
Liu, M.H.[Ming-Hua],
Ling, Z.[Zhan],
Tu, Z.W.[Zhuo-Wen],
Su, H.[Hao],
Distilling Large Vision-Language Model with Out-of-Distribution
Generalizability,
ICCV23(2492-2503)
IEEE DOI
2401
BibRef
Li, J.C.[Jun-Cheng],
Gao, M.[Minghe],
Wei, L.[Longhui],
Tang, S.L.[Si-Liang],
Zhang, W.Q.[Wen-Qiao],
Li, M.[Mengze],
Ji, W.[Wei],
Tian, Q.[Qi],
Chua, T.S.[Tat-Seng],
Zhuang, Y.T.[Yue-Ting],
Gradient-Regulated Meta-Prompt Learning for Generalizable
Vision-Language Models,
ICCV23(2551-2562)
IEEE DOI
2401
BibRef
Bi, J.Y.[Jun-Yu],
Cheng, D.[Daixuan],
Yao, P.[Ping],
Pang, B.[Bochen],
Zhan, Y.F.[Yue-Feng],
Yang, C.G.[Chuan-Guang],
Wang, Y.J.[Yu-Jing],
Sun, H.[Hao],
Deng, W.W.[Wei-Wei],
Zhang, Q.[Qi],
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and
Instance-Level Matching,
ICCV23(2584-2593)
IEEE DOI
2401
BibRef
Udandarao, V.[Vishaal],
Gupta, A.[Ankush],
Albanie, S.[Samuel],
SuS-X: Training-Free Name-Only Transfer of Vision-Language Models,
ICCV23(2725-2736)
IEEE DOI Code:
WWW Link.
2401
BibRef
Jiang, C.[Chaoya],
Xu, H.Y.[Hai-Yang],
Ye, W.[Wei],
Ye, Q.H.[Qing-Hao],
Li, C.L.[Chen-Liang],
Yan, M.[Ming],
Bi, B.[Bin],
Zhang, S.K.[Shi-Kun],
Huang, F.[Fei],
Huang, S.[Songfang],
BUS: Efficient and Effective Vision-language Pre-training with
Bottom-Up Patch Summarization,
ICCV23(2888-2898)
IEEE DOI
2401
BibRef
Shi, C.[Cheng],
Yang, S.[Sibei],
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for
Vision-Language Models,
ICCV23(2920-2929)
IEEE DOI
2401
BibRef
Wang, A.J.P.[Alex Jin-Peng],
Lin, K.Q.[Kevin Qinghong],
Zhang, D.J.H.[David Jun-Hao],
Lei, S.W.X.[Stan Wei-Xian],
Shou, M.Z.[Mike Zheng],
Too Large; Data Reduction for Vision-Language Pre-Training,
ICCV23(3124-3134)
IEEE DOI
2401
BibRef
Wang, W.H.[Wei-Han],
Yang, Z.[Zhen],
Xu, B.[Bin],
Li, J.[Juanzi],
Sun, Y.[Yankui],
ViLTA: Enhancing Vision-Language Pre-training through Textual
Augmentation,
ICCV23(3135-3146)
IEEE DOI
2401
BibRef
Wang, T.J.J.[Tzu-Jui Julius],
Laaksonen, J.[Jorma],
Langer, T.[Tomas],
Arponen, H.[Heikki],
Bishop, T.E.[Tom E.],
Learning by Hallucinating:
Vision-Language Pre-training with Weak Supervision,
WACV23(1073-1083)
IEEE DOI
2302
Visualization, Vocabulary, Computational modeling, Detectors,
Benchmark testing, Transformers, unsupervised learning
BibRef
Boecking, B.[Benedikt],
Usuyama, N.[Naoto],
Bannur, S.[Shruthi],
Castro, D.C.[Daniel C.],
Schwaighofer, A.[Anton],
Hyland, S.[Stephanie],
Wetscherek, M.[Maria],
Naumann, T.[Tristan],
Nori, A.[Aditya],
Alvarez-Valle, J.[Javier],
Poon, H.[Hoifung],
Oktay, O.[Ozan],
Making the Most of Text Semantics to Improve Biomedical Vision-Language
Processing,
ECCV22(XXXVI:1-21).
Springer DOI
2211
BibRef
Cui, Q.[Quan],
Zhou, B.[Boyan],
Guo, Y.[Yu],
Yin, W.D.[Wei-Dong],
Wu, H.[Hao],
Yoshie, O.[Osamu],
Chen, Y.[Yubo],
Contrastive Vision-Language Pre-training with Limited Resources,
ECCV22(XXXVI:236-253).
Springer DOI
2211
BibRef
Walmer, M.[Matthew],
Sikka, K.[Karan],
Sur, I.[Indranil],
Shrivastava, A.[Abhinav],
Jha, S.[Susmit],
Dual-Key Multimodal Backdoors for Visual Question Answering,
CVPR22(15354-15364)
IEEE DOI
2210
Visualization, Training data, Detectors, Feature extraction,
Question answering (information retrieval),
Vision + language
BibRef
Ding, Y.[Yang],
Yu, J.[Jing],
Liu, B.[Bang],
Hu, Y.[Yue],
Cui, M.X.[Ming-Xin],
Wu, Q.[Qi],
MuKEA: Multimodal Knowledge Extraction and Accumulation for
Knowledge-based Visual Question Answering,
CVPR22(5079-5088)
IEEE DOI
2210
Bridges, Visualization, Codes, Computational modeling,
Knowledge based systems, Semantics, Vision + language
BibRef
Gao, F.[Feng],
Ping, Q.[Qing],
Thattai, G.[Govind],
Reganti, A.[Aishwarya],
Wu, Y.N.[Ying Nian],
Natarajan, P.[Prem],
Transform-Retrieve-Generate: Natural Language-Centric
Outside-Knowledge Visual Question Answering,
CVPR22(5057-5067)
IEEE DOI
2210
Knowledge engineering, Visualization, Solid modeling,
Knowledge based systems, Natural languages, Transforms,
Visual reasoning
BibRef
Aflalo, E.[Estelle],
Du, M.[Meng],
Tseng, S.Y.[Shao-Yen],
Liu, Y.F.[Yong-Fei],
Wu, C.[Chenfei],
Duan, N.[Nan],
Lal, V.[Vasudev],
VL-InterpreT: An Interactive Visualization Tool for Interpreting
Vision-Language Transformers,
CVPR22(21374-21383)
IEEE DOI
2210
Heating systems, Visualization, Machine vision,
Computational modeling, Transformers, Question answering (information retrieval)
BibRef
Hu, X.W.[Xiao-Wei],
Gan, Z.[Zhe],
Wang, J.F.[Jian-Feng],
Yang, Z.Y.[Zheng-Yuan],
Liu, Z.C.[Zi-Cheng],
Lu, Y.[Yumao],
Wang, L.J.[Li-Juan],
Scaling Up Vision-Language Pretraining for Image Captioning,
CVPR22(17959-17968)
IEEE DOI
2210
Training, Visualization, Computational modeling, Training data,
Benchmark testing, Transformers, Feature extraction, Vision + language
BibRef
Zhang, P.C.[Peng-Chuan],
Li, X.J.[Xiu-Jun],
Hu, X.W.[Xiao-Wei],
Yang, J.W.[Jian-Wei],
Zhang, L.[Lei],
Wang, L.J.[Li-Juan],
Choi, Y.J.[Ye-Jin],
Gao, J.F.[Jian-Feng],
VinVL: Revisiting Visual Representations in Vision-Language Models,
CVPR21(5575-5584)
IEEE DOI
2111
Training, Visualization, Computational modeling, Object detection,
Benchmark testing, Feature extraction, Transformers
BibRef
Li, Z.W.[Zhuo-Wan],
Stengel-Eskin, E.[Elias],
Zhang, Y.X.[Yi-Xiao],
Xie, C.[Cihang],
Tran, Q.[Quan],
van Durme, B.[Benjamin],
Yuille, A.L.[Alan L.],
Calibrating Concepts and Operations:
Towards Symbolic Reasoning on Real Images,
ICCV21(14890-14899)
IEEE DOI
2203
Visualization, Analytical models, Codes, Computational modeling,
Cognition, Data models, Vision + language
BibRef
Yang, X.[Xu],
Zhang, H.W.[Han-Wang],
Qi, G.J.[Guo-Jun],
Cai, J.F.[Jian-Fei],
Causal Attention for Vision-Language Tasks,
CVPR21(9842-9852)
IEEE DOI
2111
Correlation, Codes, Computational modeling,
Training data, Transformers, Data models
BibRef
Stefanini, M.[Matteo],
Cornia, M.[Marcella],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
A Novel Attention-based Aggregation Function to Combine Vision and
Language,
ICPR21(1212-1219)
IEEE DOI
2105
Deep learning, Visualization, Image retrieval,
Transforms, Knowledge discovery
BibRef
Jain, V.,
Lodhavia, J.,
Automatic Question Tagging using k-Nearest Neighbors and Random
Forest,
ISCV20(1-4)
IEEE DOI
2011
learning (artificial intelligence),
question answering (information retrieval),
Natural Language Processing
BibRef
Zheng, W.B.[Wen-Bo],
Yan, L.[Lan],
Gou, C.[Chao],
Wang, F.Y.[Fei-Yue],
Webly Supervised Knowledge Embedding Model for Visual Reasoning,
CVPR20(12442-12451)
IEEE DOI
2008
Visual reasoning between visual image and natural language description.
Visualization, Cognition, Knowledge based systems, Task analysis,
Knowledge engineering, Modulation, Robustness
BibRef
Nguyen, D.K.[Duy-Kien],
Okatani, T.[Takayuki],
Multi-Task Learning of Hierarchical Vision-Language Representation,
CVPR19(10484-10493).
IEEE DOI
2002
BibRef
Gupta, T.[Tanmay],
Shih, K.J.[Kevin J.],
Singh, S.[Saurabh],
Hoiem, D.[Derek],
Aligned Image-Word Representations Improve Inductive Transfer Across
Vision-Language Tasks,
ICCV17(4223-4232)
IEEE DOI
1802
data visualisation, image recognition,
learning (artificial intelligence),
Visualization
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Video Question Answering, Movies, Spatio-Temporal, Query, VQA .