19.4.3.3.2 Visual Dialog

Visual Dialog. Dialog.

Das, A.[Abhishek], Kottur, S.[Satwik], Gupta, K.[Khushi], Singh, A.[Avi], Yadav, D.[Deshraj], Lee, S.[Stefan], Moura, J.M.F.[José M. F.], Parikh, D.[Devi], Batra, D.[Dhruv],
Visual Dialog,
PAMI(41), No. 5, May 2019, pp. 1242-1256.
IEEE DOI 1904
Hold a meaningful dialog about visual content. Visualization, Task analysis, Artificial intelligence, History, Protocols, Natural languages, Wheelchairs, Visual dialog, machine learning BibRef

Zhao, Z.[Zhou], Zhang, Z.[Zhu], Jiang, X.H.[Xing-Hua], Cai, D.[Deng],
Multi-Turn Video Question Answering via Hierarchical Attention Context Reinforced Networks,
IP(28), No. 8, August 2019, pp. 3860-3872.
IEEE DOI 1907
learning (artificial intelligence), natural language processing, reinforcement learning BibRef

Gu, M.[Mao], Zhao, Z.[Zhou], Jin, W.[Weike], Cai, D.[Deng], Wu, F.[Fei],
Video Dialog via Multi-Grained Convolutional Self-Attention Context Multi-Modal Networks,
CirSysVideo(30), No. 12, December 2020, pp. 4453-4466.
IEEE DOI 2012
Visualization, Knowledge discovery, History, Task analysis, Context modeling, Decoding, Computational modeling, Video dialog, convolution BibRef

Guo, D., Wang, H., Wang, S., Wang, M.,
Textual-Visual Reference-Aware Attention Network for Visual Dialog,
IP(29), 2020, pp. 6655-6666.
IEEE DOI 2007
Visualization, Semantics, History, Correlation, Head, Cognition, Task analysis, Visual dialog, attention network, textual reference, multimodal semantic interaction BibRef

Patro, B.N.[Badri N.], Anupriy, Namboodiri, V.P.[Vinay P.],
Probabilistic framework for solving visual dialog,
PR(110), 2021, pp. 107586.
Elsevier DOI 2011
CNN, LSTM, Uncertainty, Aleatoric uncertainty, Epistemic uncertainty, Vision and language, Visual dialog, VQA, Bayesian deep learning BibRef

Zhao, L.[Lei], Lyu, X.Y.[Xin-Yu], Song, J.K.[Jing-Kuan], Gao, L.L.[Lian-Li],
GuessWhich? Visual dialog with attentive memory network,
PR(114), 2021, pp. 107823.
Elsevier DOI 2103
Visual dialog, Attentive memory network, Reinforcement learning BibRef

Jiang, T.L.[Tian-Ling], Shao, H.L.[Hai-Lin], Tian, X.[Xin], Ji, Y.[Yi], Liu, C.P.[Chun-Ping],
Aligning vision-language for graph inference in visual dialog,
IVC(116), 2021, pp. 104316.
Elsevier DOI 2112
Visual dialog, Alignment, Graph inference, Scene graph BibRef

Guo, D.[Dan], Wang, H.[Hui], Wang, M.[Meng],
Context-Aware Graph Inference With Knowledge Distillation for Visual Dialog,
PAMI(44), No. 10, October 2022, pp. 6056-6073.
IEEE DOI 2209
Visualization, Task analysis, History, Cognition, Semantics, Linguistics, Image edge detection, Visual dialog, knowledge distillation BibRef

Guo, D.[Dan], Wang, H.[Hui], Zhang, H.W.[Han-Wang], Zha, Z.J.[Zheng-Jun], Wang, M.[Meng],
Iterative Context-Aware Graph Inference for Visual Dialog,
CVPR20(10052-10061)
IEEE DOI 2008
Visualization, History, Task analysis, Semantics, Message passing, Neural networks, Cognition BibRef

Patro, B.N.[Badri N.], Anupriy, Namboodiri, V.P.[Vinay P.],
Explanation vs. attention: A two-player game to obtain attention for VQA and visual dialog,
PR(132), 2022, pp. 108898.
Elsevier DOI 2209
CNN, LSTM, Explanation, Attention, Grad-CAM, MMD, CORAL, GAN, VQA, Visual Dialog, Deep learning BibRef

Zhu, Y.[Ye], Wu, Y.[Yu], Yang, Y.[Yi], Yan, Y.[Yan],
Saying the Unseen: Video Descriptions via Dialog Agents,
PAMI(44), No. 10, October 2022, pp. 7190-7204.
IEEE DOI 2209
Task analysis, Visualization, Artificial intelligence, Natural languages, Knowledge transfer, Semantics, multi-modal learning BibRef

Byun, J.[Jaeseok], Hwang, T.[Taebaek], Fu, J.L.[Jian-Long], Moon, T.[Taesup],
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training,
ECCV22(XIX:395-412).
Springer DOI 2211
WWW Link. BibRef

Yan, S.P.[Shi-Peng], Hong, L.[Lanqing], Xu, H.[Hang], Han, J.H.[Jian-Hua], Tuytelaars, T.[Tinne], Li, Z.G.[Zhen-Guo], He, X.M.[Xu-Ming],
Generative Negative Text Replay for Continual Vision-Language Pretraining,
ECCV22(XXXVI:22-38).
Springer DOI 2211
BibRef

Cai, Z.W.[Zhao-Wei], Kwon, G.[Gukyeong], Ravichandran, A.[Avinash], Bas, E.[Erhan], Tu, Z.W.[Zhuo-Wen], Bhotika, R.[Rahul], Soatto, S.[Stefano],
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks,
ECCV22(XXXVI:290-308).
Springer DOI 2211
BibRef

Zhang, Y.F.[Yi-Feng], Jiang, M.[Ming], Zhao, Q.[Qi],
New Datasets and Models for Contextual Reasoning in Visual Dialog,
ECCV22(XXXVI:434-451).
Springer DOI 2211
BibRef

Pham, H.A.[Hoang-Anh], Le, T.M.[Thao Minh], Le, V.[Vuong], Phuong, T.M.[Tu Minh], Tran, T.[Truyen],
Video Dialog as Conversation About Objects Living in Space-Time,
ECCV22(XXIX:710-726).
Springer DOI 2211
BibRef

Zhang, Z.F.[Ze-Fan], Jiang, T.L.[Tian-Ling], Liu, C.P.[Chun-Ping], Ji, Y.[Yi],
Coupling Attention and Convolution for Heuristic Network in Visual Dialog,
ICIP22(2896-2900)
IEEE DOI 2211
Couplings, Visualization, Convolution, Semantics, Benchmark testing, Thalamus, Visual dialog, attention, convolution BibRef

Zhang, H.Y.[Hang-Yu], Li, Y.M.[Ying-Ming], Zhang, Z.F.[Zhong-Fei],
Video-Grounded Dialogues with Joint Video and Image Training,
ICIP22(3903-3907)
IEEE DOI 2211
Training, Visualization, Transformers, Feature extraction, Data mining, Video-grounded Dialogues, Multimodality, Transformer BibRef

Zhang, S.[Shunyu], Jiang, X.Z.[Xiao-Ze], Yang, Z.[Zequn], Wan, T.[Tao], Qin, Z.[Zengchang],
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog,
MULA22(4599-4608)
IEEE DOI 2210
Visualization, Fuses, Semantics, Knowledge based systems, Oral communication, Transformers, Pattern recognition BibRef

Zhu, Y.[Yi], Weng, Y.[Yue], Zhu, F.[Fengda], Liang, X.D.[Xiao-Dan], Ye, Q.X.[Qi-Xiang], Lu, Y.[Yutong], Jiao, J.B.[Jian-Bin],
Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation,
ICCV21(1574-1583)
IEEE DOI 2203
Costs, Uncertainty, Navigation, Annotations, Reinforcement learning, Optimization, Vision + language BibRef

Engin, D.[Deniz], Schnitzler, F.[François], Duong, N.Q.K.[Ngoc Q. K.], Avrithis, Y.[Yannis],
On the hidden treasure of dialog in video question answering,
ICCV21(2044-2053)
IEEE DOI 2203
Location awareness, TV, Codes, Video description, Annotations, Knowledge based systems, Video analysis and understanding, Vision + language BibRef

Matsumori, S.[Shoya], Shingyouchi, K.[Kosuke], Abe, Y.[Yuki], Fukuchi, Y.[Yosuke], Sugiura, K.[Komei], Imai, M.[Michita],
Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue,
ICCV21(1878-1887)
IEEE DOI 2203
Visualization, Buildings, Transformers, Task analysis, Artificial intelligence, Vision + language, Visual reasoning and logical representation BibRef

Tu, T.[Tao], Ping, Q.[Qing], Thattai, G.[Govindarajan], Tur, G.[Gokhan], Natarajan, P.[Prem],
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation,
CVPR21(5618-5627)
IEEE DOI 2111
Visualization, Games, Reinforcement learning, Generators, Encoding, Pattern recognition BibRef

Jiang, T.L.[Tian-Ling], Ji, Y.[Yi], Liu, C.P.[Chun-Ping],
Integrating Historical States and Co-attention Mechanism for Visual Dialog,
ICPR21(2041-2048)
IEEE DOI 2105
Visualization, Benchmark testing, Cognition, History, Task analysis, Faces BibRef

Nguyen, V.Q.[Van-Quang], Suganuma, M.[Masanori], Okatani, T.[Takayuki],
Efficient Attention Mechanism for Visual Dialog that Can Handle All the Interactions Between Multiple Inputs,
ECCV20(XXIV:223-240).
Springer DOI 2012
BibRef

Murahari, V.[Vishvak], Batra, D.[Dhruv], Parikh, D.[Devi], Das, A.[Abhishek],
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-art Baseline,
ECCV20(XVIII:336-352).
Springer DOI 2012
BibRef

Zhu, Y.[Ye], Wu, Y.[Yu], Yang, Y.[Yi], Yan, Y.[Yan],
Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents,
ECCV20(XXIII:153-169).
Springer DOI 2011
BibRef

Qi, J., Niu, Y., Huang, J., Zhang, H.,
Two Causal Principles for Improving Visual Dialog,
CVPR20(10857-10866)
IEEE DOI 2008
Visualization, History, Task analysis, Data models, Training, Feeds, Decoding BibRef

Abbasnejad, E.[Ehsan], Teney, D.[Damien], Parvaneh, A.[Amin], Shi, J.[Javen], van den Hengel, A.J.[Anton J.],
Counterfactual Vision and Language Learning,
CVPR20(10041-10051)
IEEE DOI 2008
Training, Visualization, Training data, Task analysis, Machine learning, Knowledge discovery, Data models BibRef

Zhu, Y., Zhu, F., Zhan, Z., Lin, B., Jiao, J., Chang, X., Liang, X.,
Vision-Dialog Navigation by Exploring Cross-Modal Memory,
CVPR20(10727-10736)
IEEE DOI 2008
Navigation, Visualization, Task analysis, History, Memory modules, Natural languages, Decision making BibRef

Yang, T., Zha, Z., Zhang, H.,
Making History Matter: History-Advantage Sequence Training for Visual Dialog,
ICCV19(2561-2569)
IEEE DOI 2004
image retrieval, image sequences, interactive systems, neural nets, question answering (information retrieval), Decoding BibRef

Guo, D.[Dalu], Xu, C.[Chang], Tao, D.C.[Da-Cheng],
Image-Question-Answer Synergistic Network for Visual Dialog,
CVPR19(10426-10435).
IEEE DOI 2002
BibRef

Zheng, Z.L.[Zi-Long], Wang, W.G.[Wen-Guan], Qi, S.Y.[Si-Yuan], Zhu, S.C.[Song-Chun],
Reasoning Visual Dialogs With Structural and Partial Observations,
CVPR19(6662-6671).
IEEE DOI 2002
BibRef

Bani, G.[Gabriele], Belli, D.[Davide], Dagan, G.[Gautier], Geenen, A.[Alexander], Skliar, A.[Andrii], Venkatesh, A.[Aashish], Baumgärtner, T.[Tim], Bruni, E.[Elia], Fernández, R.[Raquel],
Adding Object Detection Skills to Visual Dialogue Agents,
VL18(IV:180-187).
Springer DOI 1905
BibRef

Yang, M., Yang, N.S.R., Zhang, K., Tao, J.,
Self-Talk: Responses to Users' Opinions and Challenges in Human Computer Dialog,
ICPR18(2839-2844)
IEEE DOI 1812
History, Robots, Databases, Predictive models, Pattern recognition, Automation, Search engines, human computer dialog, abstract extraction BibRef

Jain, U., Schwing, A., Lazebnik, S.,
Two Can Play This Game: Visual Dialog with Discriminative Question Generation and Answering,
CVPR18(5754-5763)
IEEE DOI 1812
Visualization, Task analysis, History, Knowledge discovery, Measurement, Training, Computer architecture BibRef

Dokania, P.K., Torr, P.H.S., Siddharth, N., Massiceti, D.,
FLIPDIAL: A Generative Model for Two-Way Visual Dialogue,
CVPR18(6097-6105)
IEEE DOI 1812
Visualization, Task analysis, Computational modeling, History, Data models, Pediatrics, Image color analysis BibRef

Wu, Q., Wang, P., Shen, C., Reid, I.D., van den Hengel, A.J.[Anton J.],
Are You Talking to Me? Reasoned Visual Dialog Generation Through Adversarial Learning,
CVPR18(6106-6115)
IEEE DOI 1812
Visualization, Task analysis, Generators, History, Computational modeling, Image color analysis BibRef

Kottur, S.[Satwik], Moura, J.M.F.[José M. F.], Parikh, D.[Devi], Batra, D.[Dhruv], Rohrbach, M.[Marcus],
Visual Coreference Resolution in Visual Dialog Using Neural Module Networks,
ECCV18(XV:160-178).
Springer DOI 1810
BibRef

Strub, F.[Florian], Seurin, M.[Mathieu], Perez, E.[Ethan], de Vries, H.[Harm], Mary, J.[Jérémie], Preux, P.[Philippe], Courville, A.[Aaron], Pietquin, O.[Olivier],
Visual Reasoning with Multi-hop Feature Modulation,
ECCV18(VI:808-831).
Springer DOI 1810
BibRef

Das, A., Kottur, S., Moura, J.M.F., Lee, S., Batra, D.,
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning,
ICCV17(2970-2979)
IEEE DOI 1802
interactive systems, learning (artificial intelligence), multi-agent systems, natural language interfaces, robot vision, Visualization BibRef

de Vries, H.[Harm], Strub, F.[Florian], Chandar, S.[Sarath], Pietquin, O.[Olivier], Larochelle, H.[Hugo], Courville, A.[Aaron],
GuessWhat?! Visual Object Discovery through Multi-modal Dialogue,
CVPR17(4466-4475)
IEEE DOI 1711
Databases, Games, Knowledge discovery, Natural languages, Visualization BibRef

Nam, H.[Hyeonseob], Ha, J.W.[Jung-Woo], Kim, J.[Jeonghee],
Dual Attention Networks for Multimodal Reasoning and Matching,
CVPR17(2156-2164)
IEEE DOI 1711
Cognition, Knowledge discovery, Mathematical model, Neural networks, Semantics, Visualization BibRef

Johnson, J.[Justin], Hariharan, B.[Bharath], van der Maaten, L.[Laurens], Hoffman, J., Fei-Fei, L.[Li], Zitnick, C.L.[C. Lawrence], Girshick, R.[Ross],
Inferring and Executing Programs for Visual Reasoning,
ICCV17(3008-3017)
IEEE DOI 1802
BibRef
Earlier: A1, A2, A3, A5, A6, A7, Only:
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning,
CVPR17(1988-1997)
IEEE DOI 1711
Dataset, Visual Reasoning.
WWW Link. backpropagation, image matching, learning (artificial intelligence), neural nets, Visualization. Cognition, Image color analysis, Metals, Semantics, Shape. BibRef

Das, A.[Abhishek], Kottur, S.[Satwik], Gupta, K.[Khushi], Singh, A.[Avi], Yadav, D.[Deshraj], Moura, J.M.F.[José M. F.], Parikh, D.[Devi], Batra, D.[Dhruv],
Visual Dialog,
CVPR17(1080-1089)
IEEE DOI 1711
Hold a dialog with humans in a natural visual context. History, Knowledge discovery, Protocols, Visualization, Wheelchairs BibRef



Last update: Nov 28, 2022 at 16:32:47