Osman, A.[Ahmed],
Samek, W.[Wojciech],
DRAU: Dual Recurrent Attention Units for Visual Question Answering,
CVIU(185), 2019, pp. 24-30.
Elsevier DOI
1906
Visual Question Answering, Attention Mechanisms,
Multi-modal Learning, Machine Vision, Natural Language Processing
BibRef
Li, W.[Wei],
Sun, J.H.[Jian-Hui],
Liu, G.[Ge],
Zhao, L.L.[Ling-Lan],
Fang, X.Z.[Xiang-Zhong],
Visual question answering with attention transfer and a cross-modal
gating mechanism,
PRL(133), 2020, pp. 334-340.
Elsevier DOI
2005
Attention, Visual question answering, Gating
BibRef
Yu, J.[Jing],
Zhu, Z.H.[Zi-Hao],
Wang, Y.J.[Yu-Jing],
Zhang, W.F.[Wei-Feng],
Hu, Y.[Yue],
Tan, J.L.[Jian-Long],
Cross-modal knowledge reasoning for knowledge-based visual question
answering,
PR(108), 2020, pp. 107563.
Elsevier DOI
2008
Cross-modal knowledge reasoning, Multimodal knowledge graphs,
Compositional reasoning module, Explainable reasoning
BibRef
Yang, Z.Q.[Zhuo-Qian],
Qin, Z.C.[Zeng-Chang],
Yu, J.[Jing],
Wan, T.[Tao],
Prior Visual Relationship Reasoning For Visual Question Answering,
ICIP20(1411-1415)
IEEE DOI
2011
Visualization, Semantics, Convolution, Cognition,
Knowledge discovery, Benchmark testing, Measurement, VQA,
GCN, Attention Mechanism
BibRef
Yu, J.[Jing],
Zhang, W.F.[Wei-Feng],
Lu, Y.H.[Yu-Hang],
Qin, Z.C.[Zeng-Chang],
Hu, Y.[Yue],
Tan, J.L.[Jian-Long],
Wu, Q.[Qi],
Reasoning on the Relation: Enhancing Visual Representation for Visual
Question Answering and Cross-Modal Retrieval,
MultMed(22), No. 12, December 2020, pp. 3196-3209.
IEEE DOI
2011
Visualization, Cognition, Task analysis, Knowledge discovery,
Semantics, Correlation, Information retrieval,
cross-modal information retrieval
BibRef
Wu, Y.R.[Yi-Rui],
Ma, Y.T.[Yun-Tao],
Wan, S.H.[Shao-Hua],
Multi-scale relation reasoning for multi-modal Visual Question
Answering,
SP:IC(96), 2021, pp. 116319.
Elsevier DOI
2106
Multi-modal data, Visual Question Answering,
Multi-scale relation reasoning, Attention model
BibRef
Ma, Y.T.[Yun-Tao],
Lu, T.[Tong],
Wu, Y.R.[Yi-Rui],
Multi-scale Relational Reasoning with Regional Attention for Visual
Question Answering,
ICPR21(5642-5649)
IEEE DOI
2105
Visualization, Neural networks, Knowledge discovery, Cognition,
Robustness, Data mining, Visual question learning, Attention,
Multi-scale relational reasoning
BibRef
Hu, J.[Jun],
Qian, S.S.[Sheng-Sheng],
Fang, Q.[Quan],
Xu, C.S.[Chang-Sheng],
Heterogeneous Community Question Answering via Social-Aware
Multi-Modal Co-Attention Convolutional Matching,
MultMed(23), 2021, pp. 2321-2334.
IEEE DOI
2108
Visualization, Semantics, Knowledge discovery, Context modeling,
Portable computers, Task analysis, Object detection, social multimedia
BibRef
Farazi, M.[Moshiur],
Khan, S.[Salman],
Barnes, N.M.[Nick M.],
Accuracy vs. complexity: A trade-off in visual question answering
models,
PR(120), 2021, pp. 108106.
Elsevier DOI
2109
Visual question answering, Visual feature extraction,
Language features, Multi-modal fusion, Speed-accuracy trade-off
BibRef
Liu, F.[Fei],
Liu, J.[Jing],
Fang, Z.W.[Zhi-Wei],
Hong, R.C.[Ri-Chang],
Lu, H.Q.[Han-Qing],
Visual Question Answering With Dense Inter- and Intra-Modality
Interactions,
MultMed(23), 2021, pp. 3518-3529.
IEEE DOI
2110
Visualization, Knowledge discovery, Connectors, Encoding, Task analysis,
Image coding, Stacking, Visual question answering, dense interactions
BibRef
Wu, J.J.[Jia-Jia],
Du, J.[Jun],
Wang, F.R.[Feng-Ren],
Yang, C.[Chen],
Jiang, X.Z.[Xin-Zhe],
Hu, J.S.[Jin-Shui],
Yin, B.[Bing],
Zhang, J.S.[Jian-Shu],
Dai, L.R.[Li-Rong],
A multimodal attention fusion network with a dynamic vocabulary for
TextVQA,
PR(122), 2022, pp. 108214.
Elsevier DOI
2112
Dynamic vocabulary, Attention map, Multimodal fusion, ST-VQA
BibRef
Peng, L.[Liang],
Yang, Y.[Yang],
Wang, Z.[Zheng],
Huang, Z.[Zi],
Shen, H.T.[Heng Tao],
MRA-Net: Improving VQA Via Multi-Modal Relation Attention Network,
PAMI(44), No. 1, January 2022, pp. 318-329.
IEEE DOI
2112
Visualization, Feature extraction, Semantics, Knowledge discovery,
Cognition, Task analysis, Natural languages,
relation attention
BibRef
Shuang, K.[Kai],
Guo, J.[Jinyu],
Wang, Z.H.[Zi-Han],
Comprehensive-perception dynamic reasoning for visual question
answering,
PR(131), 2022, pp. 108878.
Elsevier DOI
2208
Cross-modal information fusion, Visual question answering,
Comprehensive perception, Relational reasoning
BibRef
Xie, J.Y.[Jia-Yuan],
Fang, W.H.[Wen-Hao],
Cai, Y.[Yi],
Huang, Q.B.[Qing-Bao],
Li, Q.[Qing],
Knowledge-Based Visual Question Generation,
CirSysVideo(32), No. 11, November 2022, pp. 7547-7558.
IEEE DOI
2211
Visualization, Feature extraction, Task analysis,
Knowledge based systems, Knowledge representation, Decoding, multimodal
BibRef
Gao, C.Y.[Chen-Yu],
Zhu, Q.[Qi],
Wang, P.[Peng],
Li, H.[Hui],
Liu, Y.L.[Yu-Liang],
van den Hengel, A.J.[Anton J.],
Wu, Q.[Qi],
Structured Multimodal Attentions for TextVQA,
PAMI(44), No. 12, December 2022, pp. 9603-9614.
IEEE DOI
2212
Optical character recognition software, Cognition, Visualization,
Text recognition, Task analysis, Knowledge discovery, Annotations, transformer
BibRef
Xu, F.Z.[Fang-Zhi],
Lin, Q.[Qika],
Liu, J.[Jun],
Zhang, L.L.[Ling-Ling],
Zhao, T.Z.[Tian-Zhe],
Chai, Q.[Qi],
Pan, Y.[Yudai],
Huang, Y.[Yi],
Wang, Q.Y.[Qian-Ying],
MoCA: Incorporating domain pretraining and cross attention for
textbook question answering,
PR(140), 2023, pp. 109588.
Elsevier DOI
2305
Textbook question answering, Multimodal, Pretraining, Attention
BibRef
Mohamud, S.A.M.[Safaa Abdullahi Moallim],
Jalali, A.[Amin],
Lee, M.H.[Min-Ho],
Encoder-decoder cycle for visual question answering based on
perception-action cycle,
PR(144), 2023, pp. 109848.
Elsevier DOI
2310
Visual question answering, Vision language tasks,
Multi-modality fusion, Attention, Bilinear fusion, Brain-inspired frameworks
BibRef
Tito, R.[Rubèn],
Karatzas, D.[Dimosthenis],
Valveny, E.[Ernest],
Hierarchical multimodal transformers for Multipage DocVQA,
PR(144), 2023, pp. 109834.
Elsevier DOI
2310
Multipage document Visual Question Answering,
Document Visual Question Answering, Multipage documents, Document Intelligence
BibRef
Biswas, K.[Kunal],
Shivakumara, P.[Palaiahnakote],
Pal, U.[Umapada],
Liu, C.L.[Cheng-Lin],
Lu, Y.[Yue],
VQAPT: A New visual question answering model for personality traits
in social media images,
PRL(175), 2023, pp. 66-73.
Elsevier DOI
2311
Personality trait images, Multimodal concept, Text recognition,
Social media images, Natural language processing, Visual question answering
BibRef
Cho, J.W.[Jae Won],
Argaw, D.M.[Dawit Mureja],
Oh, Y.[Youngtaek],
Kim, D.J.[Dong-Jin],
Kweon, I.S.[In So],
Empirical study on using adapters for debiased Visual Question
Answering,
CVIU(237), 2023, pp. 103842.
Elsevier DOI
2311
Visual Question Answering, Model Robustness, Biased Data, Adapters
BibRef
Cho, J.W.[Jae Won],
Kim, D.J.[Dong-Jin],
Choi, J.[Jinsoo],
Jung, Y.[Yunjae],
Kweon, I.S.[In So],
Dealing with Missing Modalities in the Visual Question
Answer-Difference Prediction Task through Knowledge Distillation,
MULA21(1592-1601)
IEEE DOI
2109
Visualization, Knowledge discovery,
Task analysis, Bars
BibRef
Cho, J.W.[Jae Won],
Kim, D.J.[Dong-Jin],
Ryu, H.[Hyeonggon],
Kweon, I.S.[In So],
Generative Bias for Robust Visual Question Answering,
CVPR23(11681-11690)
IEEE DOI
2309
BibRef
Mashrur, A.[Akib],
Luo, W.[Wei],
Zaidi, N.A.[Nayyar A.],
Robles-Kelly, A.[Antonio],
Robust visual question answering via semantic cross modal
augmentation,
CVIU(238), 2024, pp. 103862.
Elsevier DOI
2312
Visual question answering, Transformers, Multimodal learning,
Model Robustness, Data augmentation
BibRef
Yao, H.B.[Hai-Bo],
Wang, L.P.[Li-Peng],
Cai, C.T.[Cheng-Tao],
Sun, Y.X.[Yu-Xin],
Zhang, Z.[Zhi],
Luo, Y.K.[Yong-Kang],
Multi-modal spatial relational attention networks for visual question
answering,
IVC(140), 2023, pp. 104840.
Elsevier DOI
2312
Visual question answering, Spatial relation,
Attention mechanism, Pre-training strategy
BibRef
Zheng, W.B.[Wen-Bo],
Yan, L.[Lan],
Wang, F.Y.[Fei-Yue],
So Many Heads, So Many Wits: Multimodal Graph Reasoning for
Text-Based Visual Question Answering,
SMCS(54), No. 2, February 2024, pp. 854-865.
IEEE DOI
2402
Visualization, Cognition,
Question answering (information retrieval), Feature extraction,
text-based visual question answering
BibRef
Bi, Y.D.[Yan-Dong],
Jiang, H.[Huajie],
Hu, Y.L.[Yong-Li],
Sun, Y.F.[Yan-Feng],
Yin, B.C.[Bao-Cai],
See and Learn More: Dense Caption-Aware Representation for Visual
Question Answering,
CirSysVideo(34), No. 2, February 2024, pp. 1135-1146.
IEEE DOI
2402
Visualization, Cognition, Question answering (information retrieval),
Feature extraction, cross-modal fusion
BibRef
Jiang, J.J.[Jing-Jing],
Liu, Z.Y.[Zi-Yi],
Zheng, N.N.[Nan-Ning],
Correlation Information Bottleneck: Towards Adapting Pretrained
Multimodal Models for Robust Visual Question Answering,
IJCV(132), No. 1, January 2024, pp. 185-207.
Springer DOI
2402
BibRef
Zhang, S.[Siyu],
Chen, Y.[Yeming],
Sun, Y.[Yaoru],
Wang, F.[Fang],
Shi, H.B.[Hai-Bo],
Wang, H.R.[Hao-Ran],
LOIS: Looking Out of Instance Semantics for Visual Question Answering,
MultMed(26), 2024, pp. 6202-6214.
IEEE DOI
2404
Visualization, Semantics, Task analysis, Feature extraction,
Question answering (information retrieval), Cognition, Detectors,
multimodal relation attention
BibRef
Xie, J.Y.[Jia-Yuan],
Cai, Y.[Yi],
Chen, J.L.[Jia-Li],
Xu, R.H.[Ruo-Hang],
Wang, J.X.[Jie-Xin],
Li, Q.[Qing],
Knowledge-Augmented Visual Question Answering With Natural Language
Explanation,
IP(33), 2024, pp. 2652-2664.
IEEE DOI Code:
WWW Link.
2404
Task analysis, Visualization, Feature extraction,
Question answering (information retrieval), Iterative methods,
multimodal
BibRef
Wang, J.J.[Jun-Jue],
Ma, A.L.[Ai-Long],
Chen, Z.H.[Zi-Hang],
Zheng, Z.[Zhuo],
Wan, Y.T.[Yu-Ting],
Zhang, L.P.[Liang-Pei],
Zhong, Y.F.[Yan-Fei],
EarthVQANet: Multi-task visual question answering for remote sensing
image understanding,
PandRS(212), 2024, pp. 422-439.
Elsevier DOI Code:
HTML Version.
2406
Visual question answering, Semantic segmentation,
Multi-modal fusion, Multi-task learning, Knowledge reasoning
BibRef
Qian, S.[Shun],
Liu, B.Q.[Bing-Quan],
Sun, C.J.[Cheng-Jie],
Xu, Z.[Zhen],
Ma, L.[Lin],
Wang, B.[Baoxun],
CroMIC-QA: The Cross-Modal Information Complementation Based Question
Answering,
MultMed(26), 2024, pp. 8348-8359.
IEEE DOI
2408
Task analysis, Visualization, Semantics, Crops,
Question answering (information retrieval), Diseases, multi-modal tasks
BibRef
Uehara, K.[Kohei],
Harada, T.[Tatsuya],
Learning by Asking Questions for Knowledge-Based Novel Object
Recognition,
IJCV(132), No. 6, June 2024, pp. 2290-2309.
Springer DOI
2406
BibRef
Earlier:
K-VQG: Knowledge-aware Visual Question Generation for Common-sense
Acquisition,
WACV23(4390-4398)
IEEE DOI
2302
Recognize novel objects.
Learning systems, Visualization, Knowledge acquisition,
Benchmark testing, Task analysis, visual reasoning
BibRef
Uehara, K.[Kohei],
Duan, N.[Nan],
Harada, T.[Tatsuya],
Learning to Ask Informative Sub-Questions for Visual Question
Answering,
MULA22(4680-4689)
IEEE DOI
2210
Training, Visualization, Computational modeling,
Reinforcement learning, Predictive models
BibRef
Li, Y.K.[Yi-Kang],
Duan, N.[Nan],
Zhou, B.L.[Bo-Lei],
Chu, X.[Xiao],
Ouyang, W.L.[Wan-Li],
Wang, X.G.[Xiao-Gang],
Zhou, M.[Ming],
Visual Question Generation as Dual Task of Visual Question Answering,
CVPR18(6116-6124)
IEEE DOI
1812
Task analysis, Visualization, Knowledge discovery, Training,
Computational modeling
BibRef
Gao, P.[Peng],
Li, H.S.[Hong-Sheng],
Li, S.[Shuang],
Lu, P.[Pan],
Li, Y.K.[Yi-Kang],
Hoi, S.C.H.[Steven C. H.],
Wang, X.G.[Xiao-Gang],
Question-Guided Hybrid Convolution for Visual Question Answering,
ECCV18(I: 485-501).
Springer DOI
1810
BibRef
Gao, P.[Peng],
Jiang, Z.K.[Zheng-Kai],
You, H.X.[Hao-Xuan],
Lu, P.[Pan],
Hoi, S.C.H.[Steven C. H.],
Wang, X.G.[Xiao-Gang],
Li, H.S.[Hong-Sheng],
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual
Question Answering,
CVPR19(6632-6641).
IEEE DOI
2002
BibRef
Vosoughi, A.[Ali],
Deng, S.J.[Shi-Jian],
Zhang, S.Y.[Song-Yang],
Tian, Y.P.[Ya-Peng],
Xu, C.L.[Chen-Liang],
Luo, J.B.[Jie-Bo],
Cross Modality Bias in Visual Question Answering:
A Causal View With Possible Worlds VQA,
MultMed(26), 2024, pp. 8609-8624.
IEEE DOI
2408
Visualization, Faces, Training, Linguistics, Cultural differences,
Question answering (information retrieval), Cognition,
visual question answering (VQA)
BibRef
Guo, Y.Y.[Yang-Yang],
Jiao, F.[Fangkai],
Shen, Z.Q.[Zhi-Qi],
Nie, L.Q.[Li-Qiang],
Kankanhalli, M.[Mohan],
UNK-VQA: A Dataset and a Probe Into the Abstention Ability of
Multi-Modal Large Models,
PAMI(46), No. 12, December 2024, pp. 10284-10296.
IEEE DOI
2411
Perturbation methods, Semantics, Benchmark testing, Visualization,
Image color analysis, visual question answering
BibRef
Chen, F.Y.[Fei-Yang],
Tang, X.S.[Xue-Song],
Hao, K.R.[Kuang-Rong],
GEXMERT: Geometrically enhanced cross-modality encoder representations
from transformers inspired by higher-order visual percepts,
PR(158), 2025, pp. 111047.
Elsevier DOI
2411
Bio-inspiration, Multi-modal, Visual question answering, Visual reasoning
BibRef
Zhang, B.[Boyuan],
Li, J.X.[Jia-Xu],
Shi, Y.C.[Yu-Cheng],
Han, Y.[Yahong],
Hu, Q.H.[Qing-Hua],
VADS: Visuo-Adaptive DualStrike attack on visual question answer,
CVIU(249), 2024, pp. 104137.
Elsevier DOI Code:
WWW Link.
2412
Cross-modal robustness, Visual question answer, Adversarial attack
BibRef
Peng, D.[Dahe],
Li, Z.X.[Zhi-Xin],
Unbiased VQA via modal information interaction and question
transformation,
PR(162), 2025, pp. 111394.
Elsevier DOI
2503
Deep learning, Multi-modal task, Visual question answering,
Language bias, Ensemble model
BibRef
Fan, L.[Lin],
Gong, X.[Xun],
Zheng, C.Y.[Cen-Yang],
Tan, X.L.[Xu-Li],
Li, J.[Jiao],
Ou, Y.F.[Ya-Fei],
Cycle-VQA: A Cycle-Consistent Framework for Robust Medical Visual
Question Answering,
PR(165), 2025, pp. 111609.
Elsevier DOI
2505
Visual Question Answering, Multi-modal fusion,
Gastrointestinal Stromal Tumor, Cycle consistency, Multi-attributes learning
BibRef
Lin, Q.[Qika],
He, K.[Kai],
Zhu, Y.F.[Yi-Fan],
Xu, F.Z.[Fang-Zhi],
Cambria, E.[Erik],
Feng, M.L.[Meng-Ling],
Cross-Modal Knowledge Diffusion-Based Generation for Difference-Aware
Medical VQA,
IP(34), 2025, pp. 2421-2434.
IEEE DOI
2505
Biomedical imaging, Visualization, Medical diagnostic imaging,
Noise, Semantics, Medical services, Diffusion processes,
difference-aware modeling
BibRef
Kim, B.S.[Byeong Su],
Kim, J.[Jieun],
Lee, D.[Deokwoo],
Jang, B.[Beakcheol],
Visual Question Answering: A Survey of Methods, Datasets, Evaluation,
and Challenges,
Surveys(57), No. 10, May 2025, pp. xx-yy.
DOI Link
2507
Survey, Visual Question Answering.
Visual question answering, multi-modal, attention mechanism,
model-agnostic, language bias, computer vision, natural language processing
BibRef
Huang, C.Y.[Cheng-Yue],
Maneechotesuwan, B.[Brisa],
Chopra, S.[Shivang],
Kira, Z.[Zsolt],
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal
Shifts in Visual Question Answering,
CVPR25(3909-3918)
IEEE DOI Code:
WWW Link.
2508
Training, Measurement, Visualization, Correlation, Benchmark testing,
Robustness, Question answering (information retrieval), Faces
BibRef
Wu, K.X.[Kai-Xuan],
Li, X.[Xinde],
Li, X.[Xinling],
Hu, C.[Chuanfei],
Wu, G.L.[Guo-Liang],
AVQACL: A Novel Benchmark for Audio-Visual Question Answering
Continual Learning,
CVPR25(3252-3261)
IEEE DOI Code:
WWW Link.
2508
Continuing education, Codes, Semantics, Benchmark testing,
Question answering (information retrieval), Cognition,
catastrophic forgetting
BibRef
Zhao, X.Y.[Xin-Yang],
Bai, Z.W.[Zong-Wen],
Zhou, M.L.[Mei-Li],
Ren, X.C.[Xin-Cheng],
Wang, Y.Q.[Yu-Qing],
Wang, L.C.[Lin-Chun],
Integrating Dynamic Routing with Reinforcement Learning and
Multimodal Techniques for Visual Question Answering,
ICIVC24(295-301)
IEEE DOI
2503
Visualization, Adaptation models, Computational modeling,
Reinforcement learning, Transformers, Routing,
Self-Attention
BibRef
Park, K.R.[Kyu Ri],
Lee, H.J.[Hong Joo],
Kim, J.U.[Jung Uk],
Learning Trimodal Relation for Audio-visual Question Answering with
Missing Modality,
ECCV24(XV: 42-59).
Springer DOI
2412
BibRef
Mishra, A.[Aakansha],
Agarwala, A.[Aditya],
Tiwari, U.[Utsav],
Rajendiran, V.N.[Vikram N.],
Miriyala, S.S.[Srinivas S.],
Efficient Visual Question Answering on Embedded Devices:
Cross-Modality Attention with Evolutionary Quantization,
ICIP24(2142-2148)
IEEE DOI
2411
Training, Visualization, Quantization (signal), Accuracy, Runtime,
Pipelines, Programming, Visual Question Answering,
Model Deployment
BibRef
Jiang, X.[Xinyi],
Wang, G.M.[Guo-Ming],
Guo, J.H.[Jun-Hao],
Li, J.C.[Jun-Cheng],
Zhang, W.Q.[Wen-Qiao],
Lu, R.X.[Rong-Xing],
Tang, S.L.[Si-Liang],
DIEM: Decomposition-Integration Enhancing Multimodal Insights,
CVPR24(27294-27303)
IEEE DOI
2410
Accuracy, Error analysis, Cognitive processes, Focusing,
Benchmark testing, Question answering (information retrieval)
BibRef
Reichman, B.[Benjamin],
Heck, L.[Larry],
Cross-Modal Dense Passage Retrieval for Outside Knowledge Visual
Question Answering,
CLVL23(2829-2834)
IEEE DOI
2401
BibRef
Qian, Z.[Zi],
Wang, X.[Xin],
Duan, X.G.[Xu-Guang],
Qin, P.[Pengda],
Li, Y.H.[Yu-Hong],
Zhu, W.W.[Wen-Wu],
Decouple Before Interact: Multi-Modal Prompt Learning for Continual
Visual Question Answering,
ICCV23(2941-2950)
IEEE DOI
2401
BibRef
Li, B.J.[Bing-Jia],
Wang, J.[Jie],
Zhao, M.[Minyi],
Zhou, S.[Shuigeng],
Two-stage Multimodality Fusion for High-performance Text-based Visual
Question Answering,
ACCV22(IV:658-674).
Springer DOI
2307
BibRef
Chai, Z.[Zi],
Wan, X.J.[Xiao-Jun],
Han, S.C.[Soyeon Caren],
Poon, J.[Josiah],
Visual Question Generation Under Multi-granularity Cross-Modal
Interaction,
MMMod23(I: 255-266).
Springer DOI
2304
BibRef
Wang, J.H.[Jiang-Hai],
Hu, M.H.[Meng-Hao],
Song, Y.G.[Ya-Guang],
Yang, X.S.[Xiao-Shan],
Health-Oriented Multimodal Food Question Answering,
MMMod23(I: 191-203).
Springer DOI
2304
BibRef
Zhang, H.T.[Hao-Tian],
Wu, W.[Wei],
CAT: Re-Conv Attention in Transformer for Visual Question Answering,
ICPR22(1471-1477)
IEEE DOI
2212
Representation learning, Visualization, Predictive models,
Performance gain, Transformers, Feature extraction, Multi-modal task
BibRef
Dancette, C.[Corentin],
Cadène, R.[Rémi],
Teney, D.[Damien],
Cord, M.[Matthieu],
Beyond Question-Based Biases:
Assessing Multimodal Shortcut Learning in Visual Question Answering,
ICCV21(1554-1563)
IEEE DOI
2203
Training, Visualization, Protocols, Codes, Image color analysis,
Computational modeling, Vision + language, Explainable AI,
Visual reasoning and logical representation
BibRef
Felix, R.[Rafael],
Repasky, B.[Boris],
Hodge, S.[Samuel],
Zolfaghari, R.[Reza],
Abbasnejad, E.[Ehsan],
Sherrah, J.[Jamie],
Cross-Modal Visual Question Answering for Remote Sensing Data,
DICTA21(1-9)
IEEE DOI
2201
Earth, Visualization, Satellites, Digital images, Natural languages,
Machine learning, Transformers, Visual Question Answering,
OpenStreetMap
BibRef
Chen, H.Y.[Hong-Yu],
Liu, R.F.[Rui-Fang],
Peng, B.[Bo],
Cross-modal Relational Reasoning Network for Visual Question
Answering,
MAIR2-21(3939-3948)
IEEE DOI
2112
Bridges, Visualization, Semantics,
Knowledge discovery, Linear programming
BibRef
Farazi, M.[Moshiur],
Khan, S.[Salman],
Barnes, N.M.[Nick M.],
Question-Agnostic Attention for Visual Question Answering,
ICPR21(3542-3549)
IEEE DOI
2105
Training, Visualization, Image resolution, Preforms,
Computational modeling, Semantics, Focusing,
Multimodal Fusion
BibRef
Li, Y.[Yanan],
Lin, Y.[Yuetan],
Zhao, H.H.[Hong-Hui],
Wang, D.H.[Dong-Hui],
Dual Path Multi-Modal High-Order Features for Textual Content based
Visual Question Answering,
ICPR21(4324-4331)
IEEE DOI
2105
Visualization, Image recognition, Image coding, Correlation,
Text recognition, Fuses, Semantics
BibRef
Huang, H.T.[Han-Tao],
Han, T.[Tao],
Han, W.[Wei],
Yap, D.[Deep],
Chiang, C.M.[Cheng-Ming],
Answer-checking in Context:
A Multi-modal Fully Attention Network for Visual Question Answering,
ICPR21(1173-1180)
IEEE DOI
2105
Visualization, Bit error rate, Image representation,
Knowledge discovery
BibRef
Kant, Y.[Yash],
Batra, D.[Dhruv],
Anderson, P.[Peter],
Schwing, A.[Alexander],
Parikh, D.[Devi],
Lu, J.[Jiasen],
Agrawal, H.[Harsh],
Spatially Aware Multimodal Transformers for TextVQA,
ECCV20(IX:715-732).
Springer DOI
2011
BibRef
Hu, R.,
Singh, A.,
Darrell, T.J.,
Rohrbach, M.,
Iterative Answer Prediction With Pointer-Augmented Multimodal
Transformers for TextVQA,
CVPR20(9989-9999)
IEEE DOI
2008
Optical character recognition software, Task analysis, Feature extraction,
Visualization, Iterative decoding, Vocabulary, Predictive models
BibRef
Gao, P.[Peng],
You, H.X.[Hao-Xuan],
Zhang, Z.P.[Zhan-Peng],
Wang, X.G.[Xiao-Gang],
Li, H.S.[Hong-Sheng],
Multi-Modality Latent Interaction Network for Visual Question
Answering,
ICCV19(5824-5834)
IEEE DOI
2004
data visualisation, image representation, image retrieval,
learning (artificial intelligence), Object detection
BibRef
Cadene, R.[Remi],
Ben-younes, H.[Hedi],
Cord, M.[Matthieu],
Thome, N.[Nicolas],
MUREL: Multimodal Relational Reasoning for Visual Question Answering,
CVPR19(1989-1998).
IEEE DOI
2002
BibRef
Haurilet, M.[Monica],
Al-Halah, Z.[Ziad],
Stiefelhagen, R.[Rainer],
DynGraph: Visual Question Answering via Dynamic Scene Graphs,
GCPR19(428-441).
Springer DOI
1911
BibRef
Earlier:
MoQA: A Multi-modal Question Answering Architecture,
VL18(IV:106-113).
Springer DOI
1905
BibRef
Gu, J.X.[Jiu-Xiang],
Cai, J.F.[Jian-Fei],
Joty, S.[Shafiq],
Niu, L.[Li],
Wang, G.[Gang],
Look, Imagine and Match: Improving Textual-Visual Cross-Modal
Retrieval with Generative Models,
CVPR18(7181-7189)
IEEE DOI
1812
Visualization, Training, Decoding, Semantics, Measurement.
BibRef
Sheng, S.R.[Shu-Rong],
Venkitasubramanian, A.N.[Aparna Nurani],
Moens, M.F.[Marie-Francine],
A Markov Network Based Passage Retrieval Method for Multimodal Question
Answering in the Cultural Heritage Domain,
MMMod18(I:3-15).
Springer DOI
1802
BibRef
Yu, Z.,
Yu, J.,
Fan, J.,
Tao, D.,
Multi-modal Factorized Bilinear Pooling with Co-attention Learning
for Visual Question Answering,
ICCV17(1839-1848)
IEEE DOI
1802
computational complexity, feature extraction, image fusion,
learning (artificial intelligence), Visualization
BibRef
Ben-Younes, H.,
Cadene, R.,
Cord, M.,
Thome, N.,
MUTAN: Multimodal Tucker Fusion for Visual Question Answering,
ICCV17(2631-2639)
IEEE DOI
1802
image fusion, image representation,
question answering (information retrieval), tensors, (VQA) tasks,
Visualization
BibRef
Kembhavi, A.,
Seo, M.,
Schwenk, D.,
Choi, J.,
Farhadi, A.,
Hajishirzi, H.,
Are You Smarter Than a Sixth Grader? Textbook Question Answering for
Multimodal Machine Comprehension,
CVPR17(5376-5384)
IEEE DOI
1711
Cognition, Knowledge discovery, Natural languages,
Training, Visualization
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
VQA, Visual Question Answering, Neural Networks .