VQA: Visual Question Answering,
dataset containing open-ended questions about images
WWW Link.
Dataset, Visual Question Answering.
See also VQA: Visual Question Answering.
Visual Genome,
Visual Genome is a dataset, a knowledge base, an ongoing effort to
connect structured image concepts to language.
WWW Link.
WWW Link.
Dataset, Visual Question Answering.
Kafle, K.[Kushal],
Kanan, C.[Christopher],
Visual question answering:
Datasets, algorithms, and future challenges,
CVIU(163), No. 1, 2017, pp. 3-20.
Elsevier DOI
1712
Image understanding
BibRef
Wu, Q.[Qi],
Teney, D.[Damien],
Wang, P.[Peng],
Shen, C.H.[Chun-Hua],
Dick, A.[Anthony],
van den Hengel, A.J.[Anton J.],
Visual question answering: A survey of methods and datasets,
CVIU(163), No. 1, 2017, pp. 21-40.
Elsevier DOI
1712
Survey, Visual Question Answering. Visual question answering
BibRef
Teney, D.[Damien],
Wu, Q.,
van den Hengel, A.J.[Anton J.],
Visual Question Answering: A Tutorial,
SPMag(34), No. 6, November 2017, pp. 63-75.
IEEE DOI
1712
Survey, Visual Question Answering. Bioinformatics, Genomics, Machine learning, Visualization
BibRef
Teney, D.[Damien],
Liu, L.,
van den Hengel, A.J.[Anton J.],
Graph-Structured Representations for Visual Question Answering,
CVPR17(3233-3241)
IEEE DOI
1711
Feature extraction, Knowledge discovery, Neural networks,
Syntactics, Training, Visualization
BibRef
Teney, D.[Damien],
van den Hengel, A.J.[Anton J.],
Visual Question Answering as a Meta Learning Task,
ECCV18(XV: 229-245).
Springer DOI
1810
BibRef
Teney, D.[Damien],
Abbasnejad, E.[Ehsan],
van den Hengel, A.J.[Anton J.],
Unshuffling Data for Improved Generalization in Visual Question
Answering,
ICCV21(1397-1407)
IEEE DOI
2203
Training, Visualization, Annotations, Computational modeling,
Genomics, Training data, Vision + language
BibRef
Wu, Q.[Qi],
Shen, C.H.[Chun-Hua],
Wang, P.[Peng],
Dick, A.[Anthony],
van den Hengel, A.J.[Anton J.],
Image Captioning and Visual Question Answering Based on Attributes
and External Knowledge,
PAMI(40), No. 6, June 2018, pp. 1367-1381.
IEEE DOI
1805
BibRef
Earlier: A1, A3, A2, A4, A5:
Ask Me Anything: Free-Form Visual Question Answering Based on
Knowledge from External Sources,
CVPR16(4622-4630)
IEEE DOI
1612
Computational modeling, Knowledge based systems,
Knowledge discovery, Resource description framework, Semantics,
visual question answering
BibRef
Tommasi, T.[Tatiana],
Mallya, A.[Arun],
Plummer, B.A.[Bryan A.],
Lazebnik, S.[Svetlana],
Berg, A.C.[Alexander C.],
Berg, T.L.[Tamara L.],
Combining Multiple Cues for Visual Madlibs Question Answering,
IJCV(127), No. 1, January 2019, pp. 38-60.
Springer DOI
1901
BibRef
Earlier:
Solving Visual Madlibs with Multiple Cues,
BMVC16(xx-yy).
HTML Version.
1805
BibRef
Yu, L.C.[Li-Cheng],
Park, E.[Eunbyung],
Berg, A.C.[Alexander C.],
Berg, T.L.[Tamara L.],
Visual Madlibs:
Fill in the Blank Description Generation and Question Answering,
ICCV15(2461-2469)
IEEE DOI
1602
dataset consisting of 360,001 focused natural language descriptions
for 10,738 images
BibRef
Liu, F.[Feng],
Xiang, T.[Tao],
Hospedales, T.M.[Timothy M.],
Yang, W.K.[Wan-Kou],
Sun, C.Y.[Chang-Yin],
Inverse Visual Question Answering:
A New Benchmark and VQA Diagnosis Tool,
PAMI(42), No. 2, February 2020, pp. 460-474.
IEEE DOI
2001
BibRef
Earlier:
iVQA: Inverse Visual Question Answering,
CVPR18(8611-8619)
IEEE DOI
1812
Benchmark testing, Visualization, Predictive models,
Analytical models, Image color analysis, Knowledge discovery,
reinforcement learning.
Task analysis, Measurement, Decoding, Natural languages, Cognition
BibRef
Patil, C.[Charulata],
Patwardhan, M.[Manasi],
Visual Question Generation: The State of the Art,
Surveys(53), No. 3, May 2020, pp. xx-yy.
DOI Link
2007
Image understanding, question generation
BibRef
He, F.J.[Fei-Juan],
Wang, Y.X.[Ya-Xian],
Miao, X.L.[Xiang-Lin],
Sun, X.[Xia],
Interpretable visual reasoning: A survey,
IVC(112), 2021, pp. 104194.
Elsevier DOI
2107
Visual question answering, Visual reasoning, Interpretability,
Datasets, Survey
BibRef
Sharma, H.[Himanshu],
Jalal, A.S.[Anand Singh],
A survey of methods, datasets and evaluation metrics for visual
question answering,
IVC(116), 2021, pp. 104327.
Elsevier DOI
2112
Natural language processing,
Deep neural networks, World knowledge, Attention
BibRef
Yang, L.[Lu],
Jiang, H.[He],
Song, Q.[Qing],
Guo, J.[Jun],
A Survey on Long-Tailed Visual Recognition,
IJCV(130), No. 7, July 2022, pp. 1837-1872.
Springer DOI
2207
Deep learning usually handles the common classes well, but not the rare ones.
See also YouTube-8M Dataset.
BibRef
Zhao, W.L.[Wen-Liang],
Rao, Y.M.[Yong-Ming],
Tang, Y.S.[Yan-Song],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
VideoABC: A Real-World Video Dataset for Abductive Visual Reasoning,
IP(31), 2022, pp. 6048-6061.
IEEE DOI
2209
Cognition, Visualization, Task analysis, Benchmark testing,
Question answering (information retrieval), Machine vision,
instruction video
BibRef
Lahouti, F.[Farshad],
Kostina, V.[Victoria],
Hassibi, B.[Babak],
How to Query an Oracle? Efficient Strategies to Label Data,
PAMI(44), No. 11, November 2022, pp. 7597-7609.
IEEE DOI
2210
Labeling, Erbium, Dogs, Crowdsourcing, Reliability, Decoding, Databases,
Machine learning, labeling datasets, clustering, classification,
entity resolution
BibRef
Ma, J.[Jie],
Wang, P.H.[Ping-Hui],
Kong, D.C.[De-Chen],
Wang, Z.W.[Ze-Wei],
Liu, J.[Jun],
Pei, H.B.[Hong-Bin],
Zhao, J.Z.[Jun-Zhou],
Robust Visual Question Answering: Datasets, Methods, and Future
Challenges,
PAMI(46), No. 8, August 2024, pp. 5575-5594.
IEEE DOI
2407
Sports, Task analysis, Robustness, Transformers, Training,
Question answering (information retrieval),
visual question answering
BibRef
Li, K.[Kun],
Vosselman, G.[George],
Yang, M.Y.[Michael Ying],
HRVQA: A Visual Question Answering benchmark for high-resolution
aerial images,
PandRS(214), 2024, pp. 65-81.
Elsevier DOI
Code: WWW Link.
2407
Visual question answering, High-resolution aerial images,
Transformers, Benchmark dataset
BibRef
Chen, C.[Chongyan],
Liu, M.C.[Meng-Chen],
Codella, N.[Noel],
Li, Y.S.[Yun-Sheng],
Yuan, L.[Lu],
Gurari, D.[Danna],
Fully Authentic Visual Question Answering Dataset from Online
Communities,
ECCV24(XLVIII: 252-269).
Springer DOI
2412
BibRef
Singh, M.[Monika],
Patvardhan, C.,
Lakshmi, C.V.[C. Vasantha],
Does ChatGPT Spell the End of Automatic Question Generation Research?,
ICCVMI23(1-6)
IEEE DOI
2403
Measurement, Computational modeling, Taxonomy, Manuals, Syntactics,
Chatbots, Cognition, ChatGPT, Transformer-based model,
Machine Learning
BibRef
Zhu, L.[Liuwan],
Ning, R.[Rui],
Li, J.[Jiang],
Xin, C.S.[Chun-Sheng],
Wu, H.Y.[Hong-Yi],
Most and Least Retrievable Images in Visual-Language Query Systems,
ECCV22(XXXVII: 1-18).
Springer DOI
2211
BibRef
Salewski, L.[Leonard],
Emde, C.[Cornelius],
Do, V.[Virginie],
Akata, Z.[Zeynep],
Lukasiewicz, T.[Thomas],
e-ViL: A Dataset and Benchmark for Natural Language Explanations in
Vision-Language Tasks,
ICCV21(1224-1234)
IEEE DOI
2203
Measurement, Codes, Computational modeling, Natural languages,
Benchmark testing, Predictive models, Explainable AI,
Vision + language
BibRef
Gupta, V.[Vivek],
Patro, B.N.[Badri N.],
Parihar, H.[Hemant],
Namboodiri, V.P.[Vinay P.],
VQuAD: Video Question Answering Diagnostic Dataset,
Novelty22(282-291)
IEEE DOI
2202
Correlation, Codes, Conferences, Bit error rate,
Cognition, Task analysis
BibRef
Nishimura, T.[Taichi],
Sakoda, K.[Kojiro],
Hashimoto, A.[Atsushi],
Ushiku, Y.[Yoshitaka],
Tanaka, N.[Natsuko],
Ono, F.[Fumihito],
Kameko, H.[Hirotaka],
Mori, S.[Shinsuke],
Egocentric Biochemical Video-and-Language Dataset,
CLVL21(3122-3126)
IEEE DOI
2112
Visualization, Protocols, Annotations,
Biological system modeling, Data collection
BibRef
Zhang, M.[Mingda],
Maidment, T.[Tristan],
Diab, A.[Ahmad],
Kovashka, A.[Adriana],
Hwa, R.[Rebecca],
Domain-robust VQA with diverse datasets and methods but no target
labels,
CVPR21(7042-7052)
IEEE DOI
2111
Visualization, Adaptation models,
Computational modeling, Semantics, Linguistics, Syntactics
BibRef
Mathew, M.[Minesh],
Karatzas, D.[Dimosthenis],
Jawahar, C.V.,
DocVQA: A Dataset for VQA on Document Images,
WACV21(2199-2208)
IEEE DOI
WWW Link.
2106
Dataset, Visual Q-A. Visualization, Text analysis, Image recognition,
Image analysis, Layout
BibRef
Patel, D.[Devshree],
Parikh, R.[Ratnam],
Shastri, Y.[Yesha],
Recent Advances in Video Question Answering:
A Review of Datasets and Methods,
VTIUR20(339-356).
Springer DOI
2103
BibRef
Fan, C.,
EgoVQA: An Egocentric Video Question Answering Benchmark Dataset,
EPIC19(4359-4366)
IEEE DOI
2004
question answering (information retrieval),
video signal processing, EgoVQA dataset, visual question,
dataset
BibRef
Hudson, D.A.[Drew A.],
Manning, C.D.[Christopher D.],
GQA: A New Dataset for Real-World Visual Reasoning and Compositional
Question Answering,
CVPR19(6693-6702).
IEEE DOI
2002
BibRef
Yang, G.Y.R.[Guang-Yu Robert],
Ganichev, I.[Igor],
Wang, X.J.[Xiao-Jing],
Shlens, J.[Jonathon],
Sussillo, D.[David],
A Dataset and Architecture for Visual Reasoning with a Working Memory,
ECCV18(X: 729-745).
Springer DOI
1810
BibRef
Gan, C.,
Li, Y.,
Li, H.,
Sun, C.,
Gong, B.,
VQS: Linking Segmentations to Questions and Answers for Supervised
Attention in VQA and Question-Focused Semantic Segmentation,
ICCV17(1829-1838)
IEEE DOI
1802
image annotation, image segmentation, multilayer perceptrons,
question answering (information retrieval), COCO, VQA dataset,
Visualization
BibRef
Maharaj, T.,
Ballas, N.,
Rohrbach, A.,
Courville, A.,
Pal, C.,
A Dataset and Exploration of Models for Understanding Video Data
through Fill-in-the-Blank Question-Answering,
CVPR17(7359-7368)
IEEE DOI
1711
Computational modeling, Motion pictures, Natural languages,
Training, Visualization, Voltage control
BibRef