Wu, Y.C.[Yu-Chieh],
Yang, J.C.[Jie-Chi],
A Robust Passage Retrieval Algorithm for Video Question Answering,
CirSysVideo(18), No. 10, October 2008, pp. 1411-1421.
IEEE DOI
0811
BibRef
Wu, Y.C.[Yu-Chieh],
Lee, Y.S.[Yue-Shi],
Yang, J.C.[Jie-Chi],
Yen, S.J.[Show-Jane],
A New Passage Ranking Algorithm for Video Question Answering,
PSIVT06(563-572).
Springer DOI
0612
BibRef
Li, G.D.[Guang-Da],
Li, H.J.[Hao-Jie],
Ming, Z.Y.[Zhao-Yan],
Hong, R.C.[Ri-Chang],
Tang, S.[Sheng],
Chua, T.S.[Tat-Seng],
Question Answering over Community-Contributed Web Videos,
MultMedMag(17), No. 4, October-December 2010, pp. 46-57.
IEEE DOI
1011
BibRef
Song, Y.C.[Yi-Cheng],
Li, H.J.[Hao-Jie],
Mash-Up Approach for Web Video Category Recommendation,
PSIVT10(197-202).
IEEE DOI
1011
BibRef
Guo, Z.Y.[Zhao-Yu],
Zhao, Z.[Zhou],
Jin, W.[Weike],
Wei, Z.C.[Zhi-Cheng],
Yang, M.[Min],
Wang, N.N.[Nan-Nan],
Yuan, N.J.[Nicholas Jing],
Multi-Turn Video Question Generation via Reinforced Multi-Choice
Attention Network,
CirSysVideo(31), No. 5, 2021, pp. 1697-1710.
IEEE DOI
2105
BibRef
Xue, H.Y.[Hong-Yang],
Chu, W.,
Zhao, Z.[Zhou],
Cai, D.[Deng],
A Better Way to Attend: Attention With Trees for Video Question
Answering,
IP(27), No. 11, November 2018, pp. 5563-5574.
IEEE DOI
1809
computational linguistics, feature extraction, grammars,
natural language processing, scene understanding
BibRef
Xue, H.Y.[Hong-Yang],
Zhao, Z.[Zhou],
Cai, D.[Deng],
Unifying the Video and Question Attentions for Open-Ended Video
Question Answering,
IP(26), No. 12, December 2017, pp. 5656-5666.
IEEE DOI
1710
image retrieval, video coding,
temporal question attention, temporal structures,
Adaptation models, Coherence, Hair, Knowledge discovery
BibRef
Zhao, Z.[Zhou],
Xiao, S.W.[Shu-Wen],
Song, Z.[Zehan],
Lu, C.J.[Chu-Jie],
Xiao, J.[Jun],
Zhuang, Y.T.[Yue-Ting],
Open-Ended Video Question Answering via Multi-Modal Conditional
Adversarial Networks,
IP(29), 2020, pp. 3859-3870.
IEEE DOI
2002
Open-ended video question answering, multi-modal neural network
BibRef
Zhao, Z.[Zhou],
Zhang, Z.[Zhu],
Xiao, S.W.[Shu-Wen],
Xiao, Z.X.[Zhen-Xin],
Yan, X.H.[Xiao-Hui],
Yu, J.[Jun],
Cai, D.[Deng],
Wu, F.[Fei],
Long-Form Video Question Answering via Dynamic Hierarchical
Reinforced Networks,
IP(28), No. 12, December 2019, pp. 5939-5952.
IEEE DOI
1909
Knowledge discovery, Semantics, Visualization, Natural languages,
Road transportation, Task analysis, Decoding,
reinforcement learning
BibRef
Yu, T.[Ting],
Yu, J.[Jun],
Yu, Z.[Zhou],
Huang, Q.M.[Qing-Ming],
Tian, Q.[Qi],
Long-Term Video Question Answering via Multimodal Hierarchical Memory
Attentive Networks,
CirSysVideo(31), No. 3, March 2021, pp. 931-944.
IEEE DOI
2103
Knowledge discovery, Cognition, Visualization, Task analysis,
Semantics, Engines, Computational modeling, Long-term,
in-depth reasoning
BibRef
Jang, Y.[Yunseok],
Song, Y.[Yale],
Kim, C.D.[Chris Dongjoo],
Yu, Y.[Youngjae],
Kim, Y.[Youngjin],
Kim, G.[Gunhee],
Video Question Answering with Spatio-Temporal Reasoning,
IJCV(127), No. 10, October 2019, pp. 1385-1412.
Springer DOI
1909
BibRef
Earlier: A1, A2, A4, A5, A6, Only:
TGIF-QA:
Toward Spatio-Temporal Reasoning in Visual Question Answering,
CVPR17(1359-1367)
IEEE DOI
1711
Cognition, Crowdsourcing, Image color analysis,
Knowledge discovery, Motion pictures, Visualization
BibRef
Yu, T.,
Yu, J.,
Yu, Z.,
Tao, D.,
Compositional Attention Networks With Two-Stream Fusion for Video
Question Answering,
IP(29), 2020, pp. 1204-1218.
IEEE DOI
1911
Visualization, Streaming media, Knowledge discovery,
Feature extraction, Proposals, Task analysis, Semantics,
action pooling stream
BibRef
Wang, W.N.[Wei-Ning],
Huang, Y.[Yan],
Wang, L.[Liang],
Long video question answering: A Matching-guided Attention Model,
PR(102), 2020, pp. 107248.
Elsevier DOI
2003
Long video QA, Matching-guided attention
BibRef
Zhang, W.,
Tang, S.,
Cao, Y.,
Pu, S.,
Wu, F.,
Zhuang, Y.,
Frame Augmented Alternating Attention Network for Video Question
Answering,
MultMed(22), No. 4, April 2020, pp. 1032-1041.
IEEE DOI
2004
Feature extraction, Visualization, Knowledge discovery,
Task analysis, Data mining, Neural networks, Semantics, Video QA,
neural network
BibRef
Chen, J.[Jie],
Shao, J.[Jie],
He, C.[Chengkun],
Movie fill in the blank by joint learning from video and text with
adaptive temporal attention,
PRL(132), 2020, pp. 62-68.
Elsevier DOI
2005
Video question answering, Adaptive temporal attention, Text information fusion
BibRef
Wang, A.,
Luu, A.T.,
Foo, C.,
Zhu, H.,
Tay, Y.,
Chandrasekhar, V.,
Holistic Multi-Modal Memory Network for Movie Question Answering,
IP(29), No. 1, 2020, pp. 489-499.
IEEE DOI
1910
question answering (information retrieval),
holistic multimodal memory network, multimodal context,
MovieQA
BibRef
Yuan, Z.Q.[Zhao-Quan],
Sun, S.Y.[Si-Yuan],
Duan, L.X.[Li-Xin],
Li, C.S.[Chang-Sheng],
Wu, X.[Xiao],
Xu, C.S.[Chang-Sheng],
Adversarial Multimodal Network for Movie Story Question Answering,
MultMed(23), 2021, pp. 1744-1756.
IEEE DOI
2106
Knowledge discovery, Motion pictures, Visualization, Task analysis,
Generators, Natural languages,
multimodal understanding
BibRef
Gu, M.,
Zhao, Z.,
Jin, W.,
Hong, R.,
Wu, F.,
Graph-Based Multi-Interaction Network for Video Question Answering,
IP(30), 2021, pp. 2758-2770.
IEEE DOI
2102
Visualization, Knowledge discovery, Cats, Semantics, Task analysis,
Image segmentation, Adaptation models, Video question answering,
graph-based relation-aware neural network
BibRef
Xie, Z.[Zhao],
Wu, K.W.[Ke-Wei],
Zhang, X.Y.[Xiao-Yu],
Yang, X.M.[Xing-Ming],
Hou, J.K.[Jin-Kui],
Learning continuous temporal embedding of videos using pattern theory,
PRL(146), 2021, pp. 222-229.
Elsevier DOI
2105
Action Recognition, Continuous Temporal Embedding, Pattern Theory, CNN, LSTM
BibRef
Liu, Y.[Yun],
Zhang, X.M.[Xiao-Ming],
Zhang, Q.Y.[Qian-Yun],
Li, C.Z.[Chao-Zhuo],
Huang, F.[Feiran],
Tang, X.H.[Xiang-Hong],
Li, Z.J.[Zhou-Jun],
Dual self-attention with co-attention networks for visual question
answering,
PR(117), 2021, pp. 107956.
Elsevier DOI
2106
Self-attention, Visual-textual co-attention, Visual question answering
BibRef
Liu, Y.[Yun],
Zhang, X.M.[Xiao-Ming],
Huang, F.[Feiran],
Shen, S.X.[Shi-Xun],
Tian, P.[Peng],
Li, L.[Lang],
Li, Z.J.[Zhou-Jun],
Dynamic Self-Attention with Vision Synchronization Networks for Video
Question Answering,
PR(132), 2022, pp. 108959.
Elsevier DOI
2209
Video question answering, Dynamic self-attention, Vision synchronization
BibRef
Liu, Y.[Yun],
Zhang, X.M.[Xiao-Ming],
Huang, F.[Feiran],
Zhang, B.[Bo],
Li, Z.J.[Zhou-Jun],
Cross-Attentional Spatio-Temporal Semantic Graph Networks for Video
Question Answering,
IP(31), 2022, pp. 1684-1696.
IEEE DOI
2202
Semantics, Correlation, Cognition, Visualization,
Knowledge discovery, Task analysis, Head, Video question answering,
inter- and intra-modality correlations
BibRef
Jin, W.[Weike],
Zhao, Z.[Zhou],
Cao, X.C.[Xiao-Chun],
Zhu, J.M.[Jie-Ming],
He, X.Q.[Xiu-Qiang],
Zhuang, Y.T.[Yue-Ting],
Adaptive Spatio-Temporal Graph Enhanced Vision-Language
Representation for Video QA,
IP(30), 2021, pp. 5477-5489.
IEEE DOI
2106
Visualization, Task analysis, Adaptation models, Bit error rate,
Knowledge discovery, Cognition, Training,
video question answering
BibRef
Gao, L.[Lianli],
Chen, T.M.[Tang-Ming],
Li, X.P.[Xiang-Peng],
Zeng, P.P.[Peng-Peng],
Zhao, L.[Lei],
Li, Y.F.[Yuan-Fang],
Generalized pyramid co-attention with learnable aggregation net for
video question answering,
PR(120), 2021, pp. 108145.
Elsevier DOI
2109
Video question answering, Diversity learning,
Learnable aggregation, Cascaded pyramid transformer co-attention
BibRef
Le, T.M.[Thao Minh],
Le, V.[Vuong],
Venkatesh, S.[Svetha],
Tran, T.[Truyen],
Hierarchical Conditional Relation Networks for Multimodal Video
Question Answering,
IJCV(129), No. 11, November 2021, pp. 3027-3050.
Springer DOI
2110
BibRef
Earlier:
Hierarchical Conditional Relation Networks for Video Question
Answering,
CVPR20(9969-9978)
IEEE DOI
2008
Linguistics, Cognition, Visualization,
Context modeling, Encoding, Buildings
BibRef
Su, H.T.[Hung-Ting],
Chang, C.H.[Chen-Hsi],
Shen, P.W.[Po-Wei],
Wang, Y.S.[Yu-Siang],
Chang, Y.L.[Ya-Liang],
Chang, Y.C.[Yu-Cheng],
Cheng, P.J.[Pu-Jen],
Hsu, W.H.[Winston H.],
End-to-End Video Question-Answer Generation With Generator-Pretester
Network,
CirSysVideo(31), No. 11, November 2021, pp. 4497-4507.
IEEE DOI
2112
Training, Task analysis, Knowledge discovery, Proposals,
Streaming media, Generators, Data models, Video question answering,
pretester network
BibRef
Gao, L.L.[Lian-Li],
Lei, Y.[Yu],
Zeng, P.P.[Peng-Peng],
Song, J.K.[Jing-Kuan],
Wang, M.[Meng],
Shen, H.T.[Heng Tao],
Hierarchical Representation Network With Auxiliary Tasks for Video
Captioning and Video Question Answering,
IP(31), 2022, pp. 202-215.
IEEE DOI
2112
Task analysis, Visualization, Semantics, Knowledge discovery,
Artificial neural networks, Syntactics, Decoding, Video captioning,
auxiliary task
BibRef
Zhang, J.P.[Ji-Peng],
Shao, J.[Jie],
Cao, R.[Rui],
Gao, L.L.[Lian-Li],
Xu, X.[Xing],
Shen, H.T.[Heng Tao],
Action-Centric Relation Transformer Network for Video Question
Answering,
CirSysVideo(32), No. 1, January 2022, pp. 63-74.
IEEE DOI
2201
Feature extraction, Visualization, Cognition, Task analysis,
Knowledge discovery, Proposals, Encoding, Video question answering,
relation reasoning
BibRef
Zhang, H.[Hao],
Sun, A.[Aixin],
Jing, W.[Wei],
Zhen, L.L.[Liang-Li],
Zhou, J.T.Y.[Joey Tian-Yi],
Goh, R.S.M.[Rick Siow Mong],
Natural Language Video Localization: A Revisit in Span-Based Question
Answering Framework,
PAMI(44), No. 8, August 2022, pp. 4252-4266.
IEEE DOI
2207
Location awareness, Knowledge discovery, Task analysis, Standards,
Feature extraction, Degradation, Semantics,
cross-modal interaction
BibRef
Wang, J.Y.[Jian-Yu],
Bao, B.K.[Bing-Kun],
Xu, C.S.[Chang-Sheng],
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question
Answering,
MultMed(24), 2022, pp. 3369-3380.
IEEE DOI
2207
Cognition, Visualization, Task analysis, Knowledge discovery,
Feature extraction, Fuses, Dogs, Video question answering, multi-modal
BibRef
Zeng, P.P.[Peng-Peng],
Zhang, H.N.[Hao-Nan],
Gao, L.[Lianli],
Song, J.K.[Jing-Kuan],
Shen, H.T.[Heng Tao],
Video Question Answering With Prior Knowledge and Object-Sensitive
Learning,
IP(31), 2022, pp. 5936-5948.
IEEE DOI
2209
Cognition, Visualization, Task analysis,
Question answering (information retrieval), Semantics, Ice, object learning
BibRef
Gan, Z.[Zhe],
Li, L.J.[Lin-Jie],
Li, C.Y.[Chun-Yuan],
Wang, L.J.[Li-Juan],
Liu, Z.C.[Zi-Cheng],
Gao, J.F.[Jian-Feng],
Vision-Language Pre-Training:
Basics, Recent Advances, and Future Trends,
FTCGV(14), No. 3-4, 2022, pp. 163-352.
DOI Link
2200
Video analysis and event recognition, Learning and statistical methods,
Object and scene recognition, Image and video retrieval
BibRef
Zhang, F.[Fuwei],
Wang, R.M.[Ruo-Mei],
Zhou, F.[Fan],
Luo, Y.M.[Yuan-Mao],
ERM: Energy-Based Refined-Attention Mechanism for Video Question
Answering,
CirSysVideo(33), No. 3, March 2023, pp. 1454-1467.
IEEE DOI
2303
Spatiotemporal phenomena, Visualization,
Object oriented modeling, Transformers, Task analysis, Neurons,
pseudo-related information
BibRef
Yang, J.[Jonghyeon],
Jang, H.[Hanme],
Yu, K.[Kiyun],
Analyzing Geographic Questions Using Embedding-based Topic Modeling,
IJGI(12), No. 2, 2023, pp. xx-yy.
DOI Link
2303
BibRef
Zhao, S.W.[Sheng-Wei],
Liu, Y.Y.[Yu-Ying],
Du, S.[Shaoyi],
Tian, Z.Q.[Zhi-Qiang],
Qu, T.[Ting],
Xu, L.H.[Lin-Hai],
CMFG: Cross-model Fine-grained Feature Interaction for Text-video
Retrieval,
MMMod23(II: 435-445).
Springer DOI
2304
BibRef
Luo, H.N.[Hao-Nan],
Lin, G.S.[Guo-Sheng],
Yao, Y.Z.[Ya-Zhou],
Liu, F.Y.[Fa-Yao],
Liu, Z.C.[Zi-Chuan],
Tang, Z.M.[Zhen-Min],
Depth and Video Segmentation Based Visual Attention for Embodied
Question Answering,
PAMI(45), No. 6, June 2023, pp. 6807-6819.
IEEE DOI
2305
BibRef
Earlier: A1, A2, A5, A4, A6, A3:
SegEQA: Video Segmentation Based Visual Attention for Embodied
Question Answering,
ICCV19(9666-9675)
IEEE DOI
2004
Visualization, Semantics, Navigation, Task analysis,
Image segmentation, Knowledge discovery, Feature extraction, navigation,
feature extraction, image fusion, question answering (information retrieval),
feature fusion
BibRef
Zhang, X.[Xi],
Zhang, F.F.[Fei-Fei],
Xu, C.S.[Chang-Sheng],
Reducing Vision-Answer Biases for Multiple-Choice VQA,
IP(32), 2023, pp. 4621-4634.
IEEE DOI
2309
BibRef
Xiao, J.B.[Jun-Bin],
Zhou, P.[Pan],
Yao, A.[Angela],
Li, Y.C.[Yi-Cong],
Hong, R.C.[Ri-Chang],
Yan, S.C.[Shui-Cheng],
Chua, T.S.[Tat-Seng],
Contrastive Video Question Answering via Video Graph Transformer,
PAMI(45), No. 11, November 2023, pp. 13265-13280.
IEEE DOI
2310
BibRef
Earlier: A1, A2, A7, A6, Only:
Video Graph Transformer for Video Question Answering,
ECCV22(XXXVI:39-58).
Springer DOI
2211
BibRef
Shen, W.X.[Wen-Xue],
Song, J.K.[Jing-Kuan],
Zhu, X.S.[Xiao-Su],
Li, G.F.[Gong-Fu],
Shen, H.T.[Heng Tao],
End-to-End Pre-Training With Hierarchical Matching and Momentum
Contrast for Text-Video Retrieval,
IP(32), 2023, pp. 5017-5030.
IEEE DOI Code:
WWW Link.
2310
BibRef
Jiang, J.J.[Jing-Jing],
Liu, Z.[Ziyi],
Zheng, N.N.[Nan-Ning],
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video
Question Answering,
MultMed(25), 2023, pp. 5002-5013.
IEEE DOI
2311
BibRef
Xu, F.F.[Fei-Fei],
Zhu, Y.[Yitao],
Wang, C.[Chun],
Cao, Y.Z.[Yang-Ze],
Zhong, Z.[Zheng],
Li, X.M.[Xiong-Min],
Spatio-Temporal Two-stage Fusion for video question answering,
CVIU(237), 2023, pp. 103821.
Elsevier DOI
2311
Video question answering, Vision transformer, Spatio-temporal two-stage fusion
BibRef
Wang, Y.Y.[Yuan-Yuan],
Liu, M.[Meng],
Wu, J.L.[Jian-Long],
Nie, L.Q.[Li-Qiang],
Multi-Granularity Interaction and Integration Network for Video
Question Answering,
CirSysVideo(33), No. 12, December 2023, pp. 7684-7695.
IEEE DOI
2312
BibRef
Bai, Z.[Ziyi],
Wang, R.P.[Rui-Ping],
Gao, D.F.[Di-Fei],
Chen, X.L.[Xi-Lin],
Event Graph Guided Compositional Spatial-Temporal Reasoning for Video
Question Answering,
IP(33), 2024, pp. 1109-1121.
IEEE DOI Code:
WWW Link.
2402
Visualization, Cognition, Transformers, Semantics,
Feature extraction, Context modeling, Task analysis, VideoQA,
compositional reasoning
BibRef
Qian, T.W.[Tian-Wen],
Cui, R.[Ran],
Chen, J.J.[Jing-Jing],
Peng, P.[Pai],
Guo, X.W.[Xiao-Wei],
Jiang, Y.G.[Yu-Gang],
Locate Before Answering: Answer Guided Question Localization for
Video Question Answering,
MultMed(26), 2024, pp. 4554-4563.
IEEE DOI
2403
Location awareness, Proposals, Feature extraction,
Question answering (information retrieval), Annotations, cross-modal learning
BibRef
Cheng, Y.[Yi],
Fan, H.[Hehe],
Lin, D.Y.[Dong-Yun],
Sun, Y.[Ying],
Kankanhalli, M.[Mohan],
Lim, J.H.[Joo-Hwee],
Keyword-Aware Relative Spatio-Temporal Graph Networks for Video
Question Answering,
MultMed(26), 2024, pp. 6131-6141.
IEEE DOI
2404
Cognition, Dogs, Feature extraction, Visualization, Semantics,
Question answering (information retrieval), Task analysis,
spatial-temporal graph
BibRef
Jiang, Y.M.[Yi-Min],
Yan, T.[Tingfei],
Yao, M.Z.[Ming-Ze],
Wang, H.[Huibing],
Liu, W.Z.[Wen-Zhe],
Cascade transformers with dynamic attention for video question
answering,
CVIU(242), 2024, pp. 103983.
Elsevier DOI
2404
Video question answering, Cascade transformers, Dynamic attention
BibRef
Yu, T.[Ting],
Fu, K.[Kunhao],
Zhang, J.[Jian],
Huang, Q.M.[Qing-Ming],
Yu, J.[Jun],
Multi-Granularity Contrastive Cross-Modal Collaborative Generation
for End-to-End Long-Term Video Question Answering,
IP(33), 2024, pp. 3115-3129.
IEEE DOI
2405
Task analysis, Semantics, Object oriented modeling, Collaboration,
Cognition, Visualization, Self-supervised learning,
end-to-end modeling
BibRef
Liu, J.[Jin],
Wang, G.X.[Guo-Xiang],
Xie, J.L.[Jia-Long],
Zhou, F.Y.[Feng-Yu],
Xu, H.J.[Hui-Juan],
Video Question Answering with Semantic Disentanglement and Reasoning,
CirSysVideo(34), No. 5, May 2024, pp. 3663-3673.
IEEE DOI
2405
Semantics, Visualization,
Question answering (information retrieval), Task analysis,
multi-granularity language module
BibRef
Nie, J.[Jie],
Wang, X.[Xin],
Hou, R.[Runze],
Li, G.H.[Guo-Hao],
Chen, H.[Hong],
Zhu, W.W.[Wen-Wu],
Dynamic Spatio-Temporal Graph Reasoning for VideoQA With
Self-Supervised Event Recognition,
IP(33), 2024, pp. 4145-4158.
IEEE DOI
2407
Videos, Visualization, Cognition, Task analysis, Semantics,
Question answering (information retrieval), Feature extraction,
spatio-temporal graph
BibRef
Lee, S.[Sangmin],
Kim, H.I.[Hyung-Il],
Ro, Y.M.[Yong Man],
Text-guided distillation learning to diversify video embeddings for
text-video retrieval,
PR(156), 2024, pp. 110754.
Elsevier DOI
2408
text-video retrieval, One-to-many correspondence,
Diverse video embedding, Text-guided distillation learning, Text-agnostic
BibRef
Fei, H.[Hao],
Wu, S.Q.[Sheng-Qiong],
Zhang, M.[Meishan],
Zhang, M.[Min],
Chua, T.S.[Tat-Seng],
Yan, S.C.[Shui-Cheng],
Enhancing Video-Language Representations With Structural
Spatio-Temporal Alignment,
PAMI(46), No. 12, December 2024, pp. 7701-7719.
IEEE DOI
2411
Videos, Semantics, Transformers, Task analysis, Dynamics, Data models,
Training, Scene graphs, spatio-temporal grounding,
video-language understanding
BibRef
Min, J.[Juhong],
Buch, S.[Shyamal],
Nagrani, A.[Arsha],
Cho, M.[Minsu],
Schmid, C.[Cordelia],
MoReVQA: Exploring Modular Reasoning Models for Video Question
Answering,
CVPR24(13235-13245)
IEEE DOI
2410
Visualization, Grounding, Pipelines, Benchmark testing,
Question answering (information retrieval), Cognition, captioning
BibRef
Zou, B.[Bo],
Yang, C.[Chao],
Qiao, Y.[Yu],
Quan, C.B.[Cheng-Bin],
Zhao, Y.J.[You-Jian],
Language-aware Visual Semantic Distillation for Video Question
Answering,
CVPR24(27103-27113)
IEEE DOI
2410
Visualization, Computational modeling, Semantics,
Benchmark testing, Question answering (information retrieval),
Question-Answering
BibRef
Liang, T.M.[Tian-Ming],
Tan, C.[Chaolei],
Xia, B.[Beihao],
Zheng, W.S.[Wei-Shi],
Hu, J.F.[Jian-Fang],
Ranking Distillation for Open-Ended Video Question Answering with
Insufficient Labels,
CVPR24(13161-13170)
IEEE DOI
2410
Adaptation models, Visualization, Annotations, Resists, Manuals,
Benchmark testing, Question answering (information retrieval), Video-QA
BibRef
Wu, J.M.[Jin-Meng],
Shu, P.C.[Peng-Cheng],
Hong, H.Y.[Han-Yu],
Ma, L.[Lei],
Zhu, Y.[Ying],
Wang, L.[Lei],
Pre-trained Bidirectional Dynamic Memory Network For Long Video
Question Answering,
Crowded24(5550-5557)
IEEE DOI
2410
Visualization, Computational modeling, Video sequences, Semantics,
Stars, Feature extraction, Motion pictures, Memory network,
Multi-event reasoning
BibRef
Inoue, Y.[Yuichi],
Yada, Y.[Yuki],
Tanahashi, K.[Kotaro],
Yamaguchi, Y.[Yu],
NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous
Driving Datasets using Markup Annotations,
LLVMCrive24(930-938)
IEEE DOI Code:
WWW Link.
2404
Visualization, Annotations, Computational modeling, Focusing,
Question answering (information retrieval)
BibRef
Park, S.Y.[Sung-Yeon],
Lee, M.J.[Min-Jae],
Kang, J.H.[Ji-Hyuk],
Choi, H.[Hahyeon],
Park, Y.[Yoonah],
Cho, J.[Juhwan],
Lee, A.[Adam],
Kim, D.K.[Dong-Kyu],
VLAAD: Vision and Language Assistant for Autonomous Driving,
LLVMCrive24(980-987)
IEEE DOI Code:
WWW Link.
2404
Visualization, Refining, Natural languages, Decision making,
Oral communication, Data models, Task analysis
BibRef
Fang, J.Z.Y.[Jacob Zhi-Yuan],
Zheng, S.[Skyler],
Sharma, V.[Vasu],
Piramuthu, R.[Robinson],
epsilon-ViLM: Efficient Video-Language Model via Masked Video Modeling
with Semantic Vector-Quantized Tokenizer,
Pretrain24(529-540)
IEEE DOI
2404
Visualization, Computational modeling, Architecture, Semantics,
Buildings, Focusing, Question answering (information retrieval)
BibRef
Zonneveld, A.[Anne],
Gatt, A.[Albert],
Calixto, I.[Iacer],
Video-and-Language (VidL) models and their cognitive relevance,
MMFM23(325-338)
IEEE DOI
2401
BibRef
Momeni, L.[Liliane],
Caron, M.[Mathilde],
Nagrani, A.[Arsha],
Zisserman, A.[Andrew],
Schmid, C.[Cordelia],
Verbs in Action: Improving verb understanding in video-language
models,
ICCV23(15533-15545)
IEEE DOI
2401
BibRef
Jin, P.[Peng],
Li, H.[Hao],
Cheng, Z.[Zesen],
Li, K.[Kehan],
Ji, X.Y.[Xiang-Yang],
Liu, C.[Chang],
Yuan, L.[Li],
Chen, J.[Jie],
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model,
ICCV23(2470-2481)
IEEE DOI Code:
WWW Link.
2401
BibRef
Li, P.D.[Pan-Deng],
Xie, C.W.[Chen-Wei],
Zhao, L.M.[Li-Ming],
Xie, H.T.[Hong-Tao],
Ge, J.N.[Jian-Nan],
Zheng, Y.[Yun],
Zhao, D.L.[De-Li],
Zhang, Y.D.[Yong-Dong],
Progressive Spatio-Temporal Prototype Matching for Text-Video
Retrieval,
ICCV23(4077-4087)
IEEE DOI
2401
BibRef
Guan, P.Y.[Pei-Yan],
Pei, R.J.[Ren-Jing],
Shao, B.[Bin],
Liu, J.Z.[Jian-Zhuang],
Li, W.[Weimian],
Gu, J.X.[Jia-Xi],
Xu, H.[Hang],
Xu, S.C.[Song-Cen],
Yan, Y.[Youliang],
Lam, E.Y.[Edmund Y.],
PIDRo: Parallel Isomeric Attention with Dynamic Routing for
Text-Video Retrieval,
ICCV23(11130-11139)
IEEE DOI
2401
BibRef
Deng, C.R.[Chao-Rui],
Chen, Q.[Qi],
Qin, P.[Pengda],
Chen, D.[Da],
Wu, Q.[Qi],
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval,
ICCV23(15602-15612)
IEEE DOI
2401
BibRef
Pirhadi, M.J.[Mohammad Javad],
Mirzaei, M.[Motahhare],
Eetemadi, S.[Sauleh],
Just Ask Plus: Using Transcripts for VideoQA,
ASI23(3074-3077)
IEEE DOI
2401
BibRef
Ahmad, M.[Mobeen],
Park, G.[Geonwoo],
Park, D.[Dongchan],
Park, S.[Sanguk],
MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question
Answering,
VLAR23(4659-4664)
IEEE DOI
2401
BibRef
Engin, D.[Deniz],
Avrithis, Y.[Yannis],
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal
Prompts,
CLVL23(2797-2802)
IEEE DOI Code:
WWW Link.
2401
BibRef
Nuthalapati, S.V.[Sai Vidyaranya],
Tunga, A.[Anirudh],
Coarse to Fine Frame Selection for Online Open-ended Video Question
Answering,
MMFM23(353-361)
IEEE DOI
2401
BibRef
Li, Y.C.[Yi-Cong],
Xiao, J.B.[Jun-Bin],
Feng, C.[Chun],
Wang, X.[Xiang],
Chua, T.S.[Tat-Seng],
Discovering Spatio-Temporal Rationales for Video Question Answering,
ICCV23(13823-13832)
IEEE DOI Code:
WWW Link.
2401
BibRef
Ko, D.[Dohwan],
Lee, J.S.[Ji Soo],
Choi, M.[Miso],
Chu, J.W.[Jae-Won],
Park, J.[Jihwan],
Kim, H.W.J.[Hyun-Woo J.],
Open-Vocabulary Video Question Answering: A New Benchmark for
Evaluating the Generalizability of Video Question Answering Models,
ICCV23(3078-3089)
IEEE DOI Code:
WWW Link.
2401
BibRef
Li, J.[Jiangtong],
Niu, L.[Li],
Zhang, L.Q.[Li-Qing],
Knowledge Proxy Intervention for Deconfounded Video Question
Answering,
ICCV23(2770-2781)
IEEE DOI
2401
BibRef
Chen, G.Y.[Guang-Yi],
Liu, X.[Xiao],
Wang, G.[Guangrun],
Zhang, K.[Kun],
Torr, P.H.S.[Philip H.S.],
Zhang, X.P.[Xiao-Ping],
Tang, Y.S.[Yan-Song],
Tem-adapter:
Adapting Image-Text Pretraining for Video Question Answer,
ICCV23(13899-13909)
IEEE DOI
2401
BibRef
Jahagirdar, S.[Soumya],
Mathew, M.[Minesh],
Karatzas, D.[Dimosthenis],
Jawahar, C.V.,
Understanding Video Scenes through Text:
Insights from Text-based Video Question Answering,
VLAR23(4648-4652)
IEEE DOI
2401
BibRef
Peng, M.[Min],
Liu, L.C.[Liang-Chen],
Li, Z.H.[Zheng-Hao],
Shi, Y.[Yu],
Zhou, X.D.[Xiang-Dong],
Multi-Semantic Alignment Co-Reasoning Network for Video Question
Answering,
ICIP23(2090-2094)
IEEE DOI
2312
BibRef
Ye, S.H.[Shu-Hong],
Kong, W.[Weikai],
Yao, C.[Chenglin],
Ren, J.F.[Jian-Feng],
Jiang, X.D.[Xu-Dong],
Video Question Answering Using Clip-Guided Visual-Text Attention,
ICIP23(81-85)
IEEE DOI
2312
BibRef
Khan, Z.[Zaid],
Kumar, B.V.[BG Vijay],
Schulter, S.[Samuel],
Yu, X.[Xiang],
Fu, Y.[Yun],
Chandraker, M.[Manmohan],
Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA
Tasks? A: Self-Train on Unlabeled Images!,
CVPR23(15005-15015)
IEEE DOI
2309
BibRef
Su, H.T.[Hung-Ting],
Niu, Y.[Yulei],
Lin, X.D.[Xu-Dong],
Hsu, W.H.[Winston H.],
Chang, S.F.[Shih-Fu],
Language Models are Causal Knowledge Extractors for Zero-shot Video
Question Answering,
L3D-IVU23(4951-4960)
IEEE DOI
2309
BibRef
Zang, C.Q.[Chuan-Qi],
Wang, H.Q.[Han-Qing],
Pei, M.T.[Ming-Tao],
Liang, W.[Wei],
Discovering the Real Association: Multimodal Causal Reasoning in
Video Question Answering,
CVPR23(19027-19036)
IEEE DOI
2309
BibRef
Gao, D.F.[Di-Fei],
Zhou, L.[Luowei],
Ji, L.[Lei],
Zhu, L.C.[Lin-Chao],
Yang, Y.[Yi],
Shou, M.Z.[Mike Zheng],
MIST: Multi-modal Iterative Spatial-Temporal Transformer for
Long-form Video Question Answering,
CVPR23(14773-14783)
IEEE DOI
2309
BibRef
Khan, A.U.[Aisha Urooj],
Kuehne, H.[Hilde],
Wu, B.[Bo],
Chheu, K.[Kim],
Bousselham, W.[Walid],
Gan, C.[Chuang],
Lobo, N.[Niels],
Shah, M.[Mubarak],
Learning Situation Hyper-Graphs for Video Question Answering,
CVPR23(14879-14889)
IEEE DOI
2309
BibRef
Jahagirdar, S.[Soumya],
Mathew, M.[Minesh],
Karatzas, D.[Dimosthenis],
Jawahar, C.V.,
Watching the News: Towards VideoQA Models that can Read,
WACV23(4430-4439)
IEEE DOI
2302
Visualization, Computational modeling, Optical character recognition,
Arts/games/social media
BibRef
Zhang, M.[Mingda],
Hwa, R.[Rebecca],
Kovashka, A.[Adriana],
How to Practice VQA on a Resource-limited Target Domain,
WACV23(4440-4449)
IEEE DOI
2302
Visualization, Adaptation models, Sensitivity,
Computational modeling, Transfer learning, visual reasoning
BibRef
Lee, J.[Jihyeon],
Kang, W.[Wooyoung],
Kim, E.S.[Eun-Sol],
Dense but Efficient VideoQA for Intricate Compositional Reasoning,
WACV23(1114-1123)
IEEE DOI
2302
Representation learning, Deformable models, Visualization,
Computational modeling, Semantics, Transformers,
Vision + language and/or other modalities
BibRef
Shen, R.[Ruoyue],
Inoue, N.[Nakamasa],
Shinoda, K.[Koichi],
Text-Guided Object Detector for Multi-modal Video Question Answering,
WACV23(1032-1042)
IEEE DOI
2302
Training, Measurement, Visualization, Annotations, Semantics,
Detectors, Object detection,
Vision + language and/or other modalities
BibRef
Fang, S.[Sheng],
Wang, S.H.[Shu-Hui],
Zhuo, J.[Junbao],
Han, X.Z.[Xin-Zhe],
Huang, Q.M.[Qing-Ming],
Learning Linguistic Association Towards Efficient Text-Video Retrieval,
ECCV22(XXXVI:254-270).
Springer DOI
2211
BibRef
Piergiovanni, A.J.,
Morton, K.[Kairo],
Kuo, W.C.[Wei-Cheng],
Ryoo, M.S.[Michael S.],
Angelova, A.[Anelia],
Video Question Answering with Iterative Video-Text Co-tokenization,
ECCV22(XXXVI:76-94).
Springer DOI
2211
BibRef
Bärmann, L.[Leonard],
Waibel, A.[Alex],
Where did I leave my keys?: Episodic-Memory-Based Question Answering
on Egocentric Videos,
Ego4D-EPIC22(1559-1567)
IEEE DOI
2210
Limiting, Codes, Computational modeling,
Memory management, Question answering (information retrieval)
BibRef
Li, J.T.[Jiang-Tong],
Niu, L.[Li],
Zhang, L.Q.[Li-Qing],
From Representation to Reasoning: Towards both Evidence and
Commonsense Reasoning for Video Question-Answering,
CVPR22(21241-21250)
IEEE DOI
2210
Representation learning, Visualization, Grounding,
Benchmark testing, Distance measurement,
Visual reasoning
BibRef
Datta, S.[Samyak],
Dharur, S.[Sameer],
Cartillier, V.[Vincent],
Desai, R.[Ruta],
Khanna, M.[Mukul],
Batra, D.[Dhruv],
Parikh, D.[Devi],
Episodic Memory Question Answering,
CVPR22(19097-19106)
IEEE DOI
2210
Visualization, Semantics, Memory management, Video sequences,
Question answering (information retrieval), Robustness,
Vision + language
BibRef
Gandhi, M.[Mona],
Gul, M.O.[Mustafa Omer],
Prakash, E.[Eva],
Grunde-McLaughlin, M.[Madeleine],
Krishna, R.[Ranjay],
Agrawala, M.[Maneesh],
Measuring Compositional Consistency for Video Question Answering,
CVPR22(5036-5045)
IEEE DOI
2210
Measurement, Visualization, Directed acyclic graph, Image analysis,
Benchmark testing, Cognition, Vision + language,
Visual reasoning
BibRef
Gorti, S.K.[Satya Krishna],
Vouitsis, N.[Noël],
Ma, J.W.[Jun-Wei],
Golestan, K.[Keyvan],
Volkovs, M.[Maksims],
Garg, A.[Animesh],
Yu, G.[Guangwei],
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval,
CVPR22(4996-5005)
IEEE DOI
2210
Visualization, Codes, Computational modeling, Benchmark testing,
Question answering (information retrieval), Cognition,
Video analysis and understanding
BibRef
Li, J.C.[Jun-Cheng],
Tang, S.L.[Si-Liang],
Zhu, L.C.[Lin-Chao],
Shi, H.[Haochen],
Huang, X.[Xuanwen],
Wu, F.[Fei],
Yang, Y.[Yi],
Zhuang, Y.T.[Yue-Ting],
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for
Video-and-Language Inference,
ICCV21(1847-1857)
IEEE DOI
2203
Visualization, Adaptation models, Adaptive systems, Semantics,
Coherence, Linguistics, Vision + language,
Vision + other modalities
BibRef
Zhang, M.X.[Ming-Xing],
Yang, Y.[Yang],
Chen, X.[Xinghan],
Ji, Y.L.[Yan-Li],
Xu, X.[Xing],
Li, J.J.[Jing-Jing],
Shen, H.T.[Heng Tao],
Multi-stage Aggregated Transformer Network for Temporal Language
Localization in Videos,
CVPR21(12664-12673)
IEEE DOI
2111
Location awareness, Visualization,
Computational modeling, Scalability, Transformers
BibRef
Kim, N.[Nayoung],
Ha, S.J.[Seong Jong],
Kang, J.W.[Je-Won],
Video Question Answering Using Language-Guided Deep Compressed-Domain
Video Feature,
ICCV21(1688-1697)
IEEE DOI
2203
Deep learning, Training, Visualization, Computational modeling,
Neural networks, Video compression, Feature extraction,
Vision + other modalities
BibRef
Liu, F.[Fei],
Liu, J.[Jing],
Wang, W.N.[Wei-Ning],
Lu, H.Q.[Han-Qing],
HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video
Question Answering,
ICCV21(1678-1687)
IEEE DOI
2203
Hair, Heart, Visualization, Semantics, Benchmark testing, Cognition,
Vision + language
BibRef
Yang, A.[Antoine],
Miech, A.[Antoine],
Sivic, J.[Josef],
Laptev, I.[Ivan],
Schmid, C.[Cordelia],
Just Ask:
Learning to Answer Questions from Millions of Narrated Videos,
ICCV21(1666-1677)
IEEE DOI
2203
Training, Visualization, Vocabulary, Annotations, Scalability, Manuals,
Transformers, Vision + language
BibRef
Gao, D.F.[Di-Fei],
Wang, R.P.[Rui-Ping],
Bai, Z.[Ziyi],
Chen, X.L.[Xi-Lin],
Env-QA: A Video Question Answering Benchmark for Comprehensive
Understanding of Dynamic Environments,
ICCV21(1655-1665)
IEEE DOI
2203
Visualization, Layout, Feature extraction, Transformers, Cognition,
Data mining, Vision + language, Video analysis and understanding,
Visual reasoning and logical representation
BibRef
Yun, H.[Heeseung],
Yu, Y.[Youngjae],
Yang, W.[Wonsuk],
Lee, K.[Kangil],
Kim, G.[Gunhee],
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos,
ICCV21(2011-2021)
IEEE DOI
2203
Training, Navigation, Grounding, Semantics, Benchmark testing,
Transformers, Vision + language, Vision + other modalities
BibRef
Xu, L.[Li],
Huang, H.[He],
Liu, J.[Jun],
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient
Network for Video Reasoning over Traffic Events,
CVPR21(9873-9883)
IEEE DOI
2111
Transportation, Benchmark testing, Knowledge discovery, Cognition,
Computational efficiency, Reliability
BibRef
Park, J.[Jungin],
Lee, J.Y.[Ji-Young],
Sohn, K.H.[Kwang-Hoon],
Bridge to Answer: Structure-aware Graph Interaction Network for Video
Question Answering,
CVPR21(15521-15530)
IEEE DOI
2111
Bridges, Visualization, Computational modeling, Message passing,
Semantics, Benchmark testing, Linguistics
BibRef
Chen, X.W.[Xuan-Wei],
Liu, R.[Rui],
Song, X.M.[Xiao-Meng],
Han, Y.H.[Ya-Hong],
Locating Visual Explanations for Video Question Answering,
MMMod21(I:290-302).
Springer DOI
2106
BibRef
Garcia, N.[Noa],
Nakashima, Y.[Yuta],
Knowledge-based Video Question Answering with Unsupervised Scene
Descriptions,
ECCV20(XVIII:581-598).
Springer DOI
2012
BibRef
Kim, J.,
Ma, M.,
Pham, T.,
Kim, K.,
Yoo, C.D.,
Modality Shifting Attention Network for Multi-Modal Video Question
Answering,
CVPR20(10103-10112)
IEEE DOI
2008
Cognition, Visualization, Task analysis, Knowledge discovery,
Proposals, Modulation, Context modeling
BibRef
Jiang, M.,
Chen, S.,
Yang, J.,
Zhao, Q.,
Fantastic Answers and Where to Find Them: Immersive Question-Directed
Visual Attention,
CVPR20(2977-2986)
IEEE DOI
2008
Task analysis, Videos, Visualization, Computational modeling, Head, Resists
BibRef
Yang, Z.,
Garcia, N.,
Chu, C.,
Otani, M.,
Nakashima, Y.,
Takemura, H.,
BERT Representations for Video Question Answering,
WACV20(1545-1554)
IEEE DOI
2006
Visualization, Bit error rate, Feature extraction,
Knowledge discovery, Task analysis, Semantics, Standards
BibRef
Fan, C.Y.[Chen-You],
Zhang, X.F.[Xiao-Fan],
Zhang, S.[Shu],
Wang, W.S.[Wen-Sheng],
Zhang, C.[Chi],
Huang, H.[Heng],
Heterogeneous Memory Enhanced Multimodal Attention Model for Video
Question Answering,
CVPR19(1999-2007).
IEEE DOI
2002
BibRef
Kim, J.Y.[Jun-Yeong],
Ma, M.[Minuk],
Kim, K.[Kyungsu],
Kim, S.[Sungjin],
Yoo, C.D.[Chang D.],
Progressive Attention Memory Network for Movie Story Question Answering,
CVPR19(8329-8338).
IEEE DOI
2002
BibRef
Liu, C.N.[Chao-Ning],
Chen, D.J.[Ding-Jie],
Chen, H.T.[Hwann-Tzong],
Liu, T.L.[Tyng-Luh],
A2A: Attention to Attention Reasoning for Movie Question Answering,
ACCV18(VI:404-419).
Springer DOI
1906
BibRef
Gao, J.,
Ge, R.,
Chen, K.,
Nevatia, R.,
Motion-Appearance Co-memory Networks for Video Question Answering,
CVPR18(6576-6585)
IEEE DOI
1812
Knowledge discovery, Cognition, Task analysis, Dynamics,
Memory modules, Micromechanical devices, Logic gates
BibRef
Kim, K.M.[Kyung-Min],
Choi, S.H.[Seong-Ho],
Kim, J.H.[Jin-Hwa],
Zhang, B.T.[Byoung-Tak],
Multimodal Dual Attention Memory for Video Story Question Answering,
ECCV18(XV: 698-713).
Springer DOI
1810
BibRef
Yu, Y.J.[Young-Jae],
Kim, J.S.[Jong-Seok],
Kim, G.[Gunhee],
A Joint Sequence Fusion Model for Video Question Answering and
Retrieval,
ECCV18(VII: 487-503).
Springer DOI
1810
BibRef
Hasan Chowdhury, M.I.,
Nguyen, K.,
Sridharan, S.,
Fookes, C.,
Hierarchical Relational Attention for Video Question Answering,
ICIP18(599-603)
IEEE DOI
1809
Feature extraction, Knowledge discovery, Visualization,
Task analysis, Mathematical model, Natural languages, scene understanding
BibRef
Mun, J.[Jonghwan],
Seo, P.H.[Paul Hongsuck],
Jung, I.[Ilchae],
Han, B.H.[Bo-Hyung],
MarioQA: Answering Questions by Watching Gameplay Videos,
ICCV17(2886-2894)
IEEE DOI
1802
computer games, inference mechanisms, neural nets,
question answering (information retrieval), VideoQA problems, Visualization
BibRef
Yu, Y.,
Ko, H.,
Choi, J.,
Kim, G.,
End-to-End Concept Word Detection for Video Captioning, Retrieval,
and Question Answering,
CVPR17(3261-3269)
IEEE DOI
1711
Detectors, Knowledge discovery, Motion pictures, Semantics, Training,
Visualization
BibRef