Das, A.[Abhishek],
Kottur, S.[Satwik],
Gupta, K.[Khushi],
Singh, A.[Avi],
Yadav, D.[Deshraj],
Lee, S.[Stefan],
Moura, J.M.F.[José M. F.],
Parikh, D.[Devi],
Batra, D.[Dhruv],
Visual Dialog,
PAMI(41), No. 5, May 2019, pp. 1242-1256.
IEEE DOI
1904
Hold a meaningful dialog about visual content.
Visualization, Task analysis, Artificial intelligence, History,
Protocols, Natural languages, Wheelchairs, Visual dialog,
machine learning
BibRef
Zhao, Z.[Zhou],
Zhang, Z.[Zhu],
Jiang, X.H.[Xing-Hua],
Cai, D.[Deng],
Multi-Turn Video Question Answering via Hierarchical Attention
Context Reinforced Networks,
IP(28), No. 8, August 2019, pp. 3860-3872.
IEEE DOI
1907
learning (artificial intelligence), natural language processing,
reinforcement learning
BibRef
Gu, M.[Mao],
Zhao, Z.[Zhou],
Jin, W.[Weike],
Cai, D.[Deng],
Wu, F.[Fei],
Video Dialog via Multi-Grained Convolutional Self-Attention Context
Multi-Modal Networks,
CirSysVideo(30), No. 12, December 2020, pp. 4453-4466.
IEEE DOI
2012
Visualization, Knowledge discovery, History, Task analysis,
Context modeling, Decoding, Computational modeling, Video dialog,
convolution
BibRef
Guo, D.,
Wang, H.,
Wang, S.,
Wang, M.,
Textual-Visual Reference-Aware Attention Network for Visual Dialog,
IP(29), 2020, pp. 6655-6666.
IEEE DOI
2007
Visualization, Semantics, History, Correlation, Head, Cognition,
Task analysis, Visual dialog, attention network, textual reference,
multimodal semantic interaction
BibRef
Patro, B.N.[Badri N.],
Anupriy,
Namboodiri, V.P.[Vinay P.],
Probabilistic framework for solving visual dialog,
PR(110), 2021, pp. 107586.
Elsevier DOI
2011
CNN, LSTM, Uncertainty, Aleatoric uncertainty,
Epistemic uncertainty, vision and language, Visual dialog, VQA,
Bayesian deep learning
BibRef
Zhao, L.[Lei],
Lyu, X.Y.[Xin-Yu],
Song, J.K.[Jing-Kuan],
Gao, L.L.[Lian-Li],
GuessWhich? Visual dialog with attentive memory network,
PR(114), 2021, pp. 107823.
Elsevier DOI
2103
Visual dialog, Attentive memory network, Reinforcement learning
BibRef
Jiang, T.L.[Tian-Ling],
Shao, H.L.[Hai-Lin],
Tian, X.[Xin],
Ji, Y.[Yi],
Liu, C.P.[Chun-Ping],
Aligning vision-language for graph inference in visual dialog,
IVC(116), 2021, pp. 104316.
Elsevier DOI
2112
Visual dialog, Alignment, Graph inference, Scene graph
BibRef
Guo, D.[Dan],
Wang, H.[Hui],
Wang, M.[Meng],
Context-Aware Graph Inference With Knowledge Distillation for Visual
Dialog,
PAMI(44), No. 10, October 2022, pp. 6056-6073.
IEEE DOI
2209
Visualization, Task analysis, History, Cognition, Semantics,
Linguistics, Image edge detection, Visual dialog,
knowledge distillation
BibRef
Guo, D.[Dan],
Wang, H.[Hui],
Zhang, H.W.[Han-Wang],
Zha, Z.J.[Zheng-Jun],
Wang, M.[Meng],
Iterative Context-Aware Graph Inference for Visual Dialog,
CVPR20(10052-10061)
IEEE DOI
2008
Visualization, History, Task analysis, Semantics, Message passing,
Neural networks, Cognition
BibRef
Patro, B.N.[Badri N.],
Anupriy,
Namboodiri, V.P.[Vinay P.],
Explanation vs. attention: A two-player game to obtain attention for
VQA and visual dialog,
PR(132), 2022, pp. 108898.
Elsevier DOI
2209
CNN, LSTM, Explanation, Attention, Grad-CAM, MMD, CORAL, GAN, VQA,
Visual Dialog, Deep learning
BibRef
Zhu, Y.[Ye],
Wu, Y.[Yu],
Yang, Y.[Yi],
Yan, Y.[Yan],
Saying the Unseen: Video Descriptions via Dialog Agents,
PAMI(44), No. 10, October 2022, pp. 7190-7204.
IEEE DOI
2209
Task analysis, Visualization, Artificial intelligence,
Natural languages, Knowledge transfer, Semantics,
multi-modal learning
BibRef
Huang, Y.[Yan],
Wang, Y.M.[Yu-Ming],
Wang, L.[Liang],
Efficient Image and Sentence Matching,
PAMI(45), No. 3, March 2023, pp. 2970-2983.
IEEE DOI
2302
Matrix decomposition, Symmetric matrices, Computational modeling,
Predictive models, Analytical models, Task analysis,
vision and language
BibRef
Zhao, L.[Lei],
Li, J.L.[Jun-Lin],
Gao, L.L.[Lian-Li],
Rao, Y.[Yunbo],
Song, J.K.[Jing-Kuan],
Shen, H.T.[Heng Tao],
Heterogeneous Knowledge Network for Visual Dialog,
CirSysVideo(33), No. 2, February 2023, pp. 861-871.
IEEE DOI
2302
Visualization, Feature extraction, Task analysis, History,
Knowledge engineering, Semantics, Meters, Heterogeneous knowledge,
visual dialog
BibRef
Buçinca, Z.[Zana],
Yemez, Y.[Yücel],
Erzin, E.[Engin],
Sezgin, M.[Metin],
AffectON: Incorporating Affect Into Dialog Generation,
AffCom(14), No. 1, January 2023, pp. 823-835.
IEEE DOI
2303
Task analysis, Decoding, Training, Syntactics, Semantics, Computers,
Recurrent neural networks, Affective computing, affective dialog generation
BibRef
Yu, H.[Haeun],
Ko, Y.J.[Young-Joong],
Enriching the dialogue state tracking model with a syntactic
discourse graph,
PRL(169), 2023, pp. 81-86.
Elsevier DOI
2305
Dialogue state tracking, Task-oriented dialogue system,
Graph attention network,
BibRef
Wu, Y.X.[Yu-Xia],
Liao, L.[Lizi],
Zhang, G.Y.[Gang-Yi],
Lei, W.Q.[Wen-Qiang],
Zhao, G.S.[Guo-Shuai],
Qian, X.M.[Xue-Ming],
Chua, T.S.[Tat-Seng],
State Graph Reasoning for Multimodal Conversational Recommendation,
MultMed(25), 2023, pp. 3113-3124.
IEEE DOI
2309
BibRef
Firdaus, M.[Mauajama],
Thangavelu, N.[Naveen],
Ekbal, A.[Asif],
Bhattacharyya, P.[Pushpak],
I Enjoy Writing and Playing, Do You?: A Personalized and Emotion
Grounded Dialogue Agent Using Generative Adversarial Network,
AffCom(14), No. 3, July 2023, pp. 2127-2138.
IEEE DOI
2310
BibRef
Zhang, Z.[Zefan],
Li, S.[Shun],
Ji, Y.[Yi],
Liu, C.P.[Chun-Ping],
Infer unseen from seen: Relation regularized zero-shot visual dialog,
JVCIR(97), 2023, pp. 103961.
Elsevier DOI
2312
Visual dialog, Zero-shot learning, Attention
BibRef
Qi, Q.S.[Qiao-Song],
Zhang, A.[Aixi],
Liao, Y.[Yue],
Sun, W.Y.[Wen-Yu],
Wang, Y.L.[Yong-Liang],
Li, X.B.[Xiao-Bo],
Liu, S.[Si],
Simultaneously Training and Compressing Vision-and-Language
Pre-Training Model,
MultMed(25), 2023, pp. 8194-8203.
IEEE DOI
2312
BibRef
Liu, A.A.[An-An],
Huang, C.X.[Chen-Xi],
Xu, N.[Ning],
Tian, H.S.[Hong-Shuo],
Liu, J.[Jing],
Zhang, Y.D.[Yong-Dong],
Counterfactual Visual Dialog: Robust Commonsense Knowledge Learning
From Unbiased Training,
MultMed(26), 2024, pp. 1639-1651.
IEEE DOI
2402
Visualization, Commonsense reasoning, History, Task analysis,
Correlation, Knowledge based systems, Computational modeling, counterfactual
BibRef
Ricci, R.[Riccardo],
Bazi, Y.[Yakoub],
Melgani, F.[Farid],
Machine-to-Machine Visual Dialoguing with ChatGPT for Enriched
Textual Image Description,
RS(16), No. 3, 2024, pp. 441.
DOI Link
2402
BibRef
Bulat, A.[Adrian],
Tzimiropoulos, G.[Georgios],
Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and
Zero-Shot Adaptation of V&L Models,
IJCV(132), No. 4, April 2024, pp. 1108-1125.
Springer DOI
2404
BibRef
Earlier:
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of
Vision and Language Models,
CVPR23(23232-23241)
IEEE DOI
2309
BibRef
Wang, A.J.P.[Alex Jin-Peng],
Zhou, P.[Pan],
Shou, M.Z.[Mike Zheng],
Yan, S.C.[Shui-Cheng],
Enhancing Visual Grounding in Vision-Language Pre-Training With
Position-Guided Text Prompts,
PAMI(46), No. 5, May 2024, pp. 3406-3421.
IEEE DOI
2404
BibRef
Earlier:
Position-Guided Text Prompt for Vision-Language Pre-Training,
CVPR23(23242-23251)
IEEE DOI
2309
Task analysis, Visualization, Grounding, Adaptation models,
Object recognition, Detectors, Feature extraction, visual grounding
BibRef
Du, S.S.[Shan-Shan],
Wang, H.[Hanli],
Li, T.[Tengpeng],
Chen, C.W.[Chang Wen],
Hybrid Graph Reasoning With Dynamic Interaction for Visual Dialog,
MultMed(26), 2024, pp. 9095-9108.
IEEE DOI
2409
Visualization, Cognition, Semantics, Task analysis, Routing, History,
Transformers, Cross-modal interaction, dynamic routing, visual dialog
BibRef
Sun, J.T.[Jing-Tao],
Kou, J.Y.[Jia-Yin],
Hou, W.[Wenyan],
Bai, Y.[Yujei],
A multi-agent curiosity reward model for task-oriented dialogue
systems,
PR(157), 2025, pp. 110884.
Elsevier DOI
2409
Task-oriented dialogue systems, Reinforcement learning,
Curiosity rewards, Exploration and exploitation
BibRef
Kane, B.[Benjamin],
Giugno, C.[Catherine],
Schubert, L.[Lenhart],
Haut, K.[Kurtis],
Wohn, C.[Caleb],
Hoque, E.[Ehsan],
Managing Emotional Dialogue for a Virtual Cancer Patient:
A Schema-Guided Approach,
AffCom(15), No. 3, July 2024, pp. 1041-1052.
IEEE DOI
2409
Medical services, Oral communication, Cancer, Training,
Task analysis, Semantics, Planning, Virtual agent, schema-guided
BibRef
Xie, J.Y.[Jia-Yuan],
Chen, J.L.[Jia-Li],
Liu, Z.H.[Zheng-Hao],
Cai, Y.[Yi],
Huang, Q.[Qingbao],
Li, Q.[Qing],
Video Question Generation for Dynamic Changes,
CirSysVideo(34), No. 9, September 2024, pp. 8710-8721.
IEEE DOI Code:
WWW Link.
2410
Feature extraction, Task analysis, Visualization, Dynamics,
Data mining, Decoding, video temporal reasoning
BibRef
Liu, Y.T.[Yi-Ting],
Li, L.[Liang],
Tu, Y.[Yunbin],
Zhang, B.C.[Bei-Chen],
Zha, Z.J.[Zheng-Jun],
Huang, Q.M.[Qing-Ming],
Dynamic Strategy Prompt Reasoning for Emotional Support Conversation,
MultMed(27), 2025, pp. 108-119.
IEEE DOI
2501
Emotion recognition, Oral communication, Commonsense reasoning,
History, Information processing, Generators, Computers,
strategy prompt reasoning
BibRef
Haydarov, K.[Kilichbek],
Shen, X.Q.[Xiao-Qian],
Madasu, A.[Avinash],
Salem, M.[Mahmoud],
Li, L.J.[Li-Jia],
Elsayed, G.[Gamaleldin],
Elhoseiny, M.[Mohamed],
Affective Visual Dialog: A Large-scale Benchmark for Emotional
Reasoning Based on Visually Grounded Conversations,
ECCV24(LXXV: 18-36).
Springer DOI
2412
BibRef
Abdessaied, A.[Adnen],
Shi, L.[Lei],
Bulling, A.[Andreas],
Multi-modal Video Dialog State Tracking in the Wild,
ECCV24(LVII: 348-365).
Springer DOI
2412
BibRef
Yoon, H.S.[Hee Suk],
Yoon, E.[Eunseop],
Tee, J.T.J.[Joshua Tian Jin],
Zhang, K.[Kang],
Heo, Y.J.[Yu-Jung],
Chang, D.S.[Du-Seong],
Yoo, C.D.[Chang D.],
BI-MDRG: Bridging Image History in Multimodal Dialogue Response
Generation,
ECCV24(XXXI: 378-396).
Springer DOI
2412
BibRef
He, Q.Q.[Qiang-Qiang],
Zhang, J.[Jie],
Qian, S.W.[Shu-Wei],
Wang, C.J.[Chong-Jun],
Some Can Be Better than All:
Multimodal Star Transformer for Visual Dialog,
ICIP24(2022-2026)
IEEE DOI
2411
Visualization, Satellites, Computational modeling, Stars,
Linguistics, Transformers, Visual Dialog, Transformer, Multimodal,
Star Transformer
BibRef
Abdessaied, A.[Adnen],
Shi, L.[Lei],
Bulling, A.[Andreas],
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal
Multi-Modal GRaphs,
WACV24(5793-5802)
IEEE DOI
2404
Visualization, Boosting, Graph neural networks, History,
Algorithms: Vision + language and/or other modalities,
Algorithms: Datasets and evaluations
BibRef
Han, S.J.[Seung-Ju],
Hessel, J.[Jack],
Dziri, N.[Nouha],
Choi, Y.[Yejin],
Yu, Y.J.[Young-Jae],
Champagne: Learning Real-world Conversation from Large-Scale Web
Videos,
ICCV23(15452-15463)
IEEE DOI Code:
WWW Link.
2401
BibRef
Oshima, R.[Ryosuke],
Shinagawa, S.[Seitaro],
Tsunashima, H.[Hideki],
Feng, Q.[Qi],
Morishima, S.[Shigeo],
Pointing out Human Answer Mistakes in a Goal-Oriented Visual Dialogue,
VLAR23(4665-4670)
IEEE DOI
2401
BibRef
Ishii, T.[Takahiro],
Miura, J.[Jun],
Hayashi, K.[Kotaro],
Enhancing Human-Robot Collaborative Object Search through Human
Behavior Observation and Dialog,
ACVR23(1841-1848)
IEEE DOI
2401
BibRef
Madasu, A.[Avinash],
Lal, V.[Vasudev],
Is Multimodal Vision Supervision Beneficial to Language?,
NFVLR23(2637-2642)
IEEE DOI
2309
WWW Link.
BibRef
Ashutosh, K.[Kumar],
Girdhar, R.[Rohit],
Torresani, L.[Lorenzo],
Grauman, K.[Kristen],
HierVL: Learning Hierarchical Video-Language Embeddings,
CVPR23(23066-23078)
IEEE DOI
2309
BibRef
Smith, J.S.[James Seale],
Cascante-Bonilla, P.[Paola],
Arbelle, A.[Assaf],
Kim, D.H.[Dong-Hyun],
Panda, R.[Rameswar],
Cox, D.[David],
Yang, D.[Diyi],
Kira, Z.[Zsolt],
Feris, R.S.[Rogerio S.],
Karlinsky, L.[Leonid],
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning,
CVPR23(14994-15004)
IEEE DOI
2309
BibRef
Chen, Y.X.[Yu-Xin],
Ma, Z.Y.[Zong-Yang],
Zhang, Z.Q.[Zi-Qi],
Qi, Z.A.[Zhong-Ang],
Yuan, C.F.[Chun-Feng],
Shan, Y.[Ying],
Li, B.[Bing],
Hu, W.M.[Wei-Ming],
Qie, X.[Xiaohu],
Wu, J.P.[Jian-Ping],
ViLEM: Visual-Language Error Modeling for Image-Text Retrieval,
CVPR23(11018-11027)
IEEE DOI
2309
BibRef
Huang, J.J.[Jing-Jia],
Li, Y.[Yinan],
Feng, J.S.[Jia-Shi],
Wu, X.L.[Xing-Long],
Sun, X.S.[Xiao-Shuai],
Ji, R.R.[Rong-Rong],
Clover: Towards A Unified Video-Language Alignment and Fusion Model,
CVPR23(14856-14866)
IEEE DOI
2309
BibRef
Li, C.H.[Chuan-Hao],
Li, Z.[Zhen],
Jing, C.C.[Chen-Chen],
Jia, Y.D.[Yun-De],
Wu, Y.W.[Yu-Wei],
Exploring the Effect of Primitives for Compositional Generalization
in Vision-and-Language,
CVPR23(19092-19101)
IEEE DOI
2309
BibRef
Yao, H.T.[Han-Tao],
Zhang, R.[Rui],
Xu, C.S.[Chang-Sheng],
Visual-Language Prompt Tuning with Knowledge-Guided Context
Optimization,
CVPR23(6757-6767)
IEEE DOI
2309
BibRef
Kwon, H.[Hyeongjun],
Song, T.[Taeyong],
Jeong, S.[Somi],
Kim, J.[Jin],
Jang, J.[Jinhyun],
Sohn, K.H.[Kwang-Hoon],
Probabilistic Prompt Learning for Dense Prediction,
CVPR23(6768-6777)
IEEE DOI
2309
BibRef
Luo, H.C.[Hong-Chen],
Zhai, W.[Wei],
Zhang, J.[Jing],
Cao, Y.[Yang],
Tao, D.C.[Da-Cheng],
Leverage Interactive Affinity for Affordance Learning,
CVPR23(6809-6819)
IEEE DOI
2309
BibRef
Bagad, P.[Piyush],
Tapaswi, M.[Makarand],
Snoek, C.G.M.[Cees G.M.],
Test of Time: Instilling Video-Language Models with a Sense of Time,
CVPR23(2503-2516)
IEEE DOI
2309
BibRef
Kang, G.C.[Gi-Cheon],
Kim, S.[Sungdong],
Kim, J.H.[Jin-Hwa],
Kwak, D.H.[Dong-Hyun],
Zhang, B.T.[Byoung-Tak],
The Dialog Must Go On: Improving Visual Dialog via Generative
Self-Training,
CVPR23(6746-6756)
IEEE DOI
2309
BibRef
Bannur, S.[Shruthi],
Hyland, S.[Stephanie],
Liu, Q.[Qianchu],
Pérez-García, F.[Fernando],
Ilse, M.[Maximilian],
Castro, D.C.[Daniel C.],
Boecking, B.[Benedikt],
Sharma, H.[Harshita],
Bouzid, K.[Kenza],
Thieme, A.[Anja],
Schwaighofer, A.[Anton],
Wetscherek, M.[Maria],
Lungren, M.P.[Matthew P.],
Nori, A.[Aditya],
Alvarez-Valle, J.[Javier],
Oktay, O.[Ozan],
Learning to Exploit Temporal Structure for Biomedical Vision-Language
Processing,
CVPR23(15016-15027)
IEEE DOI
2309
BibRef
Srinivasan, T.[Tejas],
Ren, X.[Xiang],
Thomason, J.[Jesse],
Curriculum Learning for Data-Efficient Vision-Language Alignment,
ODRUM23(5619-5624)
IEEE DOI
2309
BibRef
Ibing, M.[Moritz],
Lim, I.[Isaak],
Kobbelt, L.[Leif],
Localized Latent Updates for Fine-Tuning Vision-Language Models,
ECV23(4509-4518)
IEEE DOI
2309
BibRef
Zhou, Y.T.[Yu-Tong],
Shimada, N.[Nobutaka],
Vision + Language Applications: A Survey,
GCV23(826-842)
IEEE DOI
2309
BibRef
Parisot, S.[Sarah],
Yang, Y.X.[Yong-Xin],
McDonagh, S.[Steven],
Learning to Name Classes for Vision and Language Models,
CVPR23(23477-23486)
IEEE DOI
2309
BibRef
Kim, S.[Sungwoong],
Jo, D.[Daejin],
Lee, D.[Donghoon],
Kim, J.[Jongmin],
MAGVLT: Masked Generative Vision-and-Language Transformer,
CVPR23(23338-23348)
IEEE DOI
2309
BibRef
Ji, Y.[Yatai],
Wang, J.J.[Jun-Jie],
Gong, Y.[Yuan],
Zhang, L.[Lin],
Zhu, Y.[Yanru],
Wang, H.F.[Hong-Fa],
Zhang, J.X.[Jia-Xing],
Sakai, T.[Tetsuya],
Yang, Y.[Yujiu],
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model,
CVPR23(23262-23271)
IEEE DOI
2309
BibRef
Zhang, X.[Xu],
Wang, W.[Wen],
Chen, Z.[Zhe],
Xu, Y.F.[Yu-Fei],
Zhang, J.[Jing],
Tao, D.C.[Da-Cheng],
CLAMP: Prompt-based Contrastive Learning for Connecting Language and
Animal Pose,
CVPR23(23272-23281)
IEEE DOI
2309
BibRef
Wang, T.[Teng],
Ge, Y.X.[Yi-Xiao],
Zheng, F.[Feng],
Cheng, R.[Ran],
Shan, Y.[Ying],
Qie, X.[Xiaohu],
Luo, P.[Ping],
Accelerating Vision-Language Pretraining with Free Language Modeling,
CVPR23(23161-23170)
IEEE DOI
2309
BibRef
Doveh, S.[Sivan],
Arbelle, A.[Assaf],
Harary, S.[Sivan],
Schwartz, E.[Eli],
Herzig, R.[Roei],
Giryes, R.[Raja],
Feris, R.S.[Rogerio S.],
Panda, R.[Rameswar],
Ullman, S.[Shimon],
Karlinsky, L.[Leonid],
Teaching Structured Vision and Language Concepts to Vision and
Language Models,
CVPR23(2657-2668)
IEEE DOI
2309
BibRef
Chino, A.[Amika],
Teraoka, T.[Takehiro],
Relevance-aware Question Generation in Non-task-oriented Dialogue
Systems,
VAMR23(344-358).
Springer DOI
2307
BibRef
Tang, Z.[Zineng],
Cho, J.[Jaemin],
Lei, J.[Jie],
Bansal, M.[Mohit],
PERCEIVER-VL: Efficient Vision-and-Language Modeling with Iterative
Latent Attention,
WACV23(4399-4409)
IEEE DOI
2302
Training, Analytical models, Scalability, Benchmark testing,
Transformers, Complexity theory, Algorithms:
Vision + language and/or other modalities
BibRef
Tripathi, A.[Aditay],
Mishra, A.[Anand],
Chakraborty, A.[Anirban],
Grounding Scene Graphs on Natural Images via Visio-Lingual Message
Passing,
WACV23(4380-4389)
IEEE DOI
2302
Location awareness, Visualization, Grounding, Message passing,
Image edge detection, Semantics, Directed graphs,
Vision + language and/or other modalities
BibRef
Byun, J.[Jaeseok],
Hwang, T.[Taebaek],
Fu, J.L.[Jian-Long],
Moon, T.[Taesup],
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language
Pre-training,
ECCV22(XIX:395-412).
Springer DOI
2211
WWW Link.
BibRef
Yan, S.P.[Shi-Peng],
Hong, L.Q.[Lan-Qing],
Xu, H.[Hang],
Han, J.H.[Jian-Hua],
Tuytelaars, T.[Tinne],
Li, Z.G.[Zhen-Guo],
He, X.M.[Xu-Ming],
Generative Negative Text Replay for Continual Vision-Language
Pretraining,
ECCV22(XXXVI:22-38).
Springer DOI
2211
BibRef
Zhang, Y.F.[Yi-Feng],
Jiang, M.[Ming],
Zhao, Q.[Qi],
New Datasets and Models for Contextual Reasoning in Visual Dialog,
ECCV22(XXXVI:434-451).
Springer DOI
2211
BibRef
Pham, H.A.[Hoang-Anh],
Le, T.M.[Thao Minh],
Le, V.[Vuong],
Phuong, T.M.[Tu Minh],
Tran, T.[Truyen],
Video Dialog as Conversation About Objects Living in Space-Time,
ECCV22(XXIX:710-726).
Springer DOI
2211
BibRef
Zhang, Z.F.[Ze-Fan],
Jiang, T.L.[Tian-Ling],
Liu, C.P.[Chun-Ping],
Ji, Y.[Yi],
Coupling Attention and Convolution for Heuristic Network in Visual
Dialog,
ICIP22(2896-2900)
IEEE DOI
2211
Couplings, Visualization, Convolution, Semantics, Benchmark testing,
Thalamus, Visual dialog, attention, convolution
BibRef
Zhang, H.Y.[Hang-Yu],
Li, Y.M.[Ying-Ming],
Zhang, Z.F.[Zhong-Fei],
Video-Grounded Dialogues with Joint Video and Image Training,
ICIP22(3903-3907)
IEEE DOI
2211
Training, Visualization, Transformers, Feature extraction,
Data mining, Video-grounded Dialogues, Multimodality,
Transformer
BibRef
Zhang, S.Y.[Shun-Yu],
Jiang, X.Z.[Xiao-Ze],
Yang, Z.Q.[Ze-Qun],
Wan, T.[Tao],
Qin, Z.C.[Zeng-Chang],
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog,
MULA22(4599-4608)
IEEE DOI
2210
Visualization, Fuses, Semantics, Knowledge based systems,
Oral communication, Transformers
BibRef
Zhu, Y.[Yi],
Weng, Y.[Yue],
Zhu, F.D.[Feng-Da],
Liang, X.D.[Xiao-Dan],
Ye, Q.X.[Qi-Xiang],
Lu, Y.T.[Yu-Tong],
Jiao, J.B.[Jian-Bin],
Self-Motivated Communication Agent for Real-World Vision-Dialog
Navigation,
ICCV21(1574-1583)
IEEE DOI
2203
Costs, Uncertainty, Navigation, Annotations, Reinforcement learning,
Optimization, Vision+language,
BibRef
Engin, D.[Deniz],
Schnitzler, F.[François],
Duong, N.Q.K.[Ngoc Q. K.],
Avrithis, Y.[Yannis],
On the hidden treasure of dialog in video question answering,
ICCV21(2044-2053)
IEEE DOI
2203
Location awareness, TV, Codes, Video description, Annotations,
Knowledge based systems, Video analysis and understanding, Vision + language
BibRef
Matsumori, S.[Shoya],
Shingyouchi, K.[Kosuke],
Abe, Y.[Yuki],
Fukuchi, Y.[Yosuke],
Sugiura, K.[Komei],
Imai, M.[Michita],
Unified Questioner Transformer for Descriptive Question Generation in
Goal-Oriented Visual Dialogue,
ICCV21(1878-1887)
IEEE DOI
2203
Visualization, Buildings, Transformers,
Task analysis, Artificial intelligence, Vision + language,
Visual reasoning and logical representation
BibRef
Tu, T.[Tao],
Ping, Q.[Qing],
Thattai, G.[Govindarajan],
Tur, G.[Gokhan],
Natarajan, P.[Prem],
Learning Better Visual Dialog Agents with Pretrained
Visual-Linguistic Representation,
CVPR21(5618-5627)
IEEE DOI
2111
Visualization, Games, Reinforcement learning,
Generators, Encoding
BibRef
Jiang, T.L.[Tian-Ling],
Ji, Y.[Yi],
Liu, C.P.[Chun-Ping],
Integrating Historical States and Co-attention Mechanism for Visual
Dialog,
ICPR21(2041-2048)
IEEE DOI
2105
Visualization, Benchmark testing, Cognition,
History, Task analysis, Faces
BibRef
Nguyen, V.Q.[Van-Quang],
Suganuma, M.[Masanori],
Okatani, T.[Takayuki],
Efficient Attention Mechanism for Visual Dialog that Can Handle All the
Interactions Between Multiple Inputs,
ECCV20(XXIV:223-240).
Springer DOI
2012
BibRef
Murahari, V.[Vishvak],
Batra, D.[Dhruv],
Parikh, D.[Devi],
Das, A.[Abhishek],
Large-scale Pretraining for Visual Dialog:
A Simple State-of-the-art Baseline,
ECCV20(XVIII:336-352).
Springer DOI
2012
BibRef
Zhu, Y.[Ye],
Wu, Y.[Yu],
Yang, Y.[Yi],
Yan, Y.[Yan],
Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents,
ECCV20(XXIII:153-169).
Springer DOI
2011
BibRef
Qi, J.,
Niu, Y.,
Huang, J.,
Zhang, H.,
Two Causal Principles for Improving Visual Dialog,
CVPR20(10857-10866)
IEEE DOI
2008
Visualization, History, Task analysis, Data models, Training, Feeds, Decoding
BibRef
Abbasnejad, E.[Ehsan],
Teney, D.[Damien],
Parvaneh, A.[Amin],
Shi, J.[Javen],
van den Hengel, A.J.[Anton J.],
Counterfactual Vision and Language Learning,
CVPR20(10041-10051)
IEEE DOI
2008
Training, Visualization, Training data, Task analysis,
Machine learning, Knowledge discovery, Data models
BibRef
Zhu, Y.,
Zhu, F.,
Zhan, Z.,
Lin, B.,
Jiao, J.,
Chang, X.,
Liang, X.,
Vision-Dialog Navigation by Exploring Cross-Modal Memory,
CVPR20(10727-10736)
IEEE DOI
2008
Navigation, Visualization, Task analysis, History, Memory modules,
Natural languages, Decision making
BibRef
Yang, T.,
Zha, Z.,
Zhang, H.,
Making History Matter:
History-Advantage Sequence Training for Visual Dialog,
ICCV19(2561-2569)
IEEE DOI
2004
image retrieval, image sequences, interactive systems, neural nets,
question answering (information retrieval), Decoding
BibRef
Guo, D.[Dalu],
Xu, C.[Chang],
Tao, D.C.[Da-Cheng],
Image-Question-Answer Synergistic Network for Visual Dialog,
CVPR19(10426-10435).
IEEE DOI
2002
BibRef
Zheng, Z.L.[Zi-Long],
Wang, W.G.[Wen-Guan],
Qi, S.Y.[Si-Yuan],
Zhu, S.C.[Song-Chun],
Reasoning Visual Dialogs With Structural and Partial Observations,
CVPR19(6662-6671).
IEEE DOI
2002
BibRef
Bani, G.[Gabriele],
Belli, D.[Davide],
Dagan, G.[Gautier],
Geenen, A.[Alexander],
Skliar, A.[Andrii],
Venkatesh, A.[Aashish],
Baumgärtner, T.[Tim],
Bruni, E.[Elia],
Fernández, R.[Raquel],
Adding Object Detection Skills to Visual Dialogue Agents,
VL18(IV:180-187).
Springer DOI
1905
BibRef
Yang, M.,
Yang, N.S.R.,
Zhang, K.,
Tao, J.,
Self-Talk: Responses to Users' Opinions and Challenges in Human
Computer Dialog,
ICPR18(2839-2844)
IEEE DOI
1812
History, Robots, Databases, Predictive models,
Automation, Search engines, human computer dialog, abstract extraction
BibRef
Jain, U.,
Schwing, A.,
Lazebnik, S.,
Two Can Play This Game: Visual Dialog with Discriminative Question
Generation and Answering,
CVPR18(5754-5763)
IEEE DOI
1812
Visualization, Task analysis, History, Knowledge discovery,
Measurement, Training, Computer architecture
BibRef
Dokania, P.K.,
Torr, P.H.S.,
Siddharth, N.,
Massiceti, D.,
FLIPDIAL: A Generative Model for Two-Way Visual Dialogue,
CVPR18(6097-6105)
IEEE DOI
1812
Visualization, Task analysis, Computational modeling, History,
Data models, Pediatrics, Image color analysis
BibRef
Wu, Q.,
Wang, P.,
Shen, C.,
Reid, I.D.,
van den Hengel, A.J.[Anton J.],
Are You Talking to Me? Reasoned Visual Dialog Generation Through
Adversarial Learning,
CVPR18(6106-6115)
IEEE DOI
1812
Visualization, Task analysis, Generators, History,
Computational modeling, Image color analysis
BibRef
Kottur, S.[Satwik],
Moura, J.M.F.[José M. F.],
Parikh, D.[Devi],
Batra, D.[Dhruv],
Rohrbach, M.[Marcus],
Visual Coreference Resolution in Visual Dialog Using Neural Module
Networks,
ECCV18(XV: 160-178).
Springer DOI
1810
BibRef
Strub, F.[Florian],
Seurin, M.[Mathieu],
Perez, E.[Ethan],
de Vries, H.[Harm],
Mary, J.[Jérémie],
Preux, P.[Philippe],
Courville, A.[Aaron],
Pietquin, O.[Olivier],
Visual Reasoning with Multi-hop Feature Modulation,
ECCV18(VI: 808-831).
Springer DOI
1810
BibRef
Das, A.,
Kottur, S.,
Moura, J.M.F.,
Lee, S.,
Batra, D.,
Learning Cooperative Visual Dialog Agents with Deep Reinforcement
Learning,
ICCV17(2970-2979)
IEEE DOI
1802
interactive systems, learning (artificial intelligence),
multi-agent systems, natural language interfaces, robot vision,
Visualization
BibRef
de Vries, H.[Harm],
Strub, F.[Florian],
Chandar, S.[Sarath],
Pietquin, O.[Olivier],
Larochelle, H.[Hugo],
Courville, A.[Aaron],
GuessWhat?! Visual Object Discovery through Multi-modal Dialogue,
CVPR17(4466-4475)
IEEE DOI
1711
Databases, Games, Knowledge discovery,
Natural languages, Visualization
BibRef
Nam, H.[Hyeonseob],
Ha, J.W.[Jung-Woo],
Kim, J.[Jeonghee],
Dual Attention Networks for Multimodal Reasoning and Matching,
CVPR17(2156-2164)
IEEE DOI
1711
Cognition, Knowledge discovery, Mathematical model,
Neural networks, Semantics, Visualization
BibRef
Johnson, J.[Justin],
Hariharan, B.[Bharath],
van der Maaten, L.[Laurens],
Hoffman, J.,
Fei-Fei, L.[Li],
Zitnick, C.L.[C. Lawrence],
Girshick, R.[Ross],
Inferring and Executing Programs for Visual Reasoning,
ICCV17(3008-3017)
IEEE DOI
1802
BibRef
Earlier: A1, A2, A3, A5, A6, A7, Only:
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary
Visual Reasoning,
CVPR17(1988-1997)
IEEE DOI
1711
Dataset, Visual Reasoning.
WWW Link.
backpropagation, image matching,
learning (artificial intelligence), neural nets,
Visualization.
Cognition, Image color analysis, Metals, Semantics, Shape.
BibRef
Das, A.[Abhishek],
Kottur, S.[Satwik],
Gupta, K.[Khushi],
Singh, A.[Avi],
Yadav, D.[Deshraj],
Moura, J.M.F.[José M. F.],
Parikh, D.[Devi],
Batra, D.[Dhruv],
Visual Dialog,
CVPR17(1080-1089)
IEEE DOI
1711
Hold a dialog with humans in a natural visual context.
History, Knowledge discovery, Protocols, Visualization, Wheelchairs
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Large Language Models for Vision, LLM, LVLM.