Qiu, Z.F.[Zhao-Fan],
Yao, T.[Ting],
Mei, T.[Tao],
Learning Deep Spatio-Temporal Dependence for Semantic Video
Segmentation,
MultMed(20), No. 4, April 2018, pp. 939-949.
IEEE DOI
1804
BibRef
Earlier:
Learning Spatio-Temporal Representation with Pseudo-3D Residual
Networks,
ICCV17(5534-5542)
IEEE DOI
1802
3D from 2D nets.
Image segmentation, Semantics,
Streaming media,
video segmentation
convolution, feature extraction, image classification,
image recognition, image representation,
Visualization
BibRef
Qiu, Z.F.[Zhao-Fan],
Yao, T.[Ting],
Ngo, C.W.[Chong-Wah],
Tian, X.M.[Xin-Mei],
Mei, T.[Tao],
Learning Spatio-Temporal Representation With Local and Global Diffusion,
CVPR19(12048-12057).
IEEE DOI
2002
BibRef
Yao, T.,
Pan, Y.,
Li, Y.,
Qiu, Z.,
Mei, T.,
Boosting Image Captioning with Attributes,
ICCV17(4904-4912)
IEEE DOI
1802
BibRef
And: A2, A1, A3, A5, Only:
Video Captioning with Transferred Semantic Attributes,
CVPR17(984-992)
IEEE DOI
1711
image representation,
learning (artificial intelligence),
Semantics, Natural languages,
Probability distribution, Recurrent neural networks, Visualization
BibRef
Zhao, B.,
Li, X.,
Lu, X.,
CAM-RNN: Co-Attention Model Based RNN for Video Captioning,
IP(28), No. 11, November 2019, pp. 5552-5565.
IEEE DOI
1909
Visualization, Task analysis, Logic gates,
Recurrent neural networks, Dogs, Semantics, Decoding,
recurrent neural network
BibRef
Yan, C.,
Tu, Y.,
Wang, X.,
Zhang, Y.,
Hao, X.,
Zhang, Y.,
Dai, Q.,
STAT: Spatial-Temporal Attention Mechanism for Video Captioning,
MultMed(22), No. 1, January 2020, pp. 229-241.
IEEE DOI
2001
BibRef
And:
Corrections:
MultMed(22), No. 3, March 2020, pp. 830-830.
IEEE DOI
2003
Video captioning, spatial-temporal attention mechanism,
encoder-decoder neural networks.
Mechatronics, Automation, Streaming media
BibRef
Aafaq, N.[Nayyer],
Mian, A.[Ajmal],
Liu, W.[Wei],
Gilani, S.Z.[Syed Zulqarnain],
Shah, M.[Mubarak],
Video Description:
A Survey of Methods, Datasets, and Evaluation Metrics,
Surveys(52), No. 6, October 2019, pp. xx-yy.
DOI Link
2001
video to text, Video description, video captioning, language in vision
BibRef
Zhang, Z.,
Xu, D.,
Ouyang, W.,
Tan, C.,
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue
Aided Sentence Summarization,
CirSysVideo(30), No. 9, September 2020, pp. 3130-3139.
IEEE DOI
2009
Proposals, Visualization, Image segmentation, Feature extraction,
Semantics, Decoding, Task analysis, Dense video captioning,
hierarchical attention mechanism
BibRef
Zhang, W.[Wei],
Wang, B.R.[Bai-Rui],
Ma, L.[Lin],
Liu, W.[Wei],
Reconstruct and Represent Video Contents for Captioning via
Reinforcement Learning,
PAMI(42), No. 12, December 2020, pp. 3088-3101.
IEEE DOI
2011
Decoding, Image reconstruction, Semantics, Training data,
Visualization, Video sequences, Video captioning,
backward information
BibRef
Lee, S.[Sujin],
Kim, I.[Incheol],
DVC-Net: A deep neural network model for dense video captioning,
IET-CV(15), No. 1, 2021, pp. 12-23.
DOI Link
2106
BibRef
Qi, S.S.[Shan-Shan],
Yang, L.X.[Lu-Xi],
Video captioning via a symmetric bidirectional decoder,
IET-CV(15), No. 4, 2021, pp. 283-296.
DOI Link
2106
BibRef
Li, L.[Linghui],
Zhang, Y.D.[Yong-Dong],
Tang, S.[Sheng],
Xie, L.X.[Ling-Xi],
Li, X.Y.[Xiao-Yong],
Tian, Q.[Qi],
Adaptive Spatial Location With Balanced Loss for Video Captioning,
CirSysVideo(32), No. 1, January 2022, pp. 17-30.
IEEE DOI
2201
Task analysis, Redundancy, Feature extraction, Visualization,
Detectors, Training, Convolutional neural network,
balanced loss
BibRef
Zheng, Y.[Yi],
Zhang, Y.[Yuejie],
Feng, R.[Rui],
Zhang, T.[Tao],
Fan, W.G.[Wei-Guo],
Stacked Multimodal Attention Network for Context-Aware Video
Captioning,
CirSysVideo(32), No. 1, January 2022, pp. 31-42.
IEEE DOI
2201
Feature extraction, Visualization, Decoding, Training,
Biological system modeling, Context modeling, Predictive models,
reinforcement learning
BibRef
Li, L.[Liang],
Gao, X.Y.[Xing-Yu],
Deng, J.[Jincan],
Tu, Y.[Yunbin],
Zha, Z.J.[Zheng-Jun],
Huang, Q.M.[Qing-Ming],
Long Short-Term Relation Transformer With Global Gating for Video
Captioning,
IP(31), 2022, pp. 2726-2738.
IEEE DOI
2204
Transformers, Cognition, Visualization, Feature extraction, Decoding,
Task analysis, Semantics, Video captioning, relational reasoning, transformer
BibRef
Munusamy, H.[Hemalatha],
Sekhar, C.C.[C. Chandra],
Video captioning using Semantically Contextual Generative Adversarial
Network,
CVIU(221), 2022, pp. 103453.
Elsevier DOI
2206
Video captioning, Generative adversarial network,
Reinforcement learning, Generator, Discriminator
BibRef
Wang, H.[Hao],
Lin, G.S.[Guo-Sheng],
Hoi, S.C.H.[Steven C. H.],
Miao, C.Y.[Chun-Yan],
Cross-Modal Graph With Meta Concepts for Video Captioning,
IP(31), 2022, pp. 5150-5162.
IEEE DOI
2208
Semantics, Visualization, Feature extraction, Predictive models,
Task analysis, Computational modeling, Location awareness, vision-and-language
BibRef
Xiao, H.[Huanhou],
Shi, J.L.[Jing-Lun],
Diverse video captioning through latent variable expansion,
PRL(160), 2022, pp. 19-25.
Elsevier DOI
2208
Latent variables, Diverse captions, CGAN
BibRef
Prudviraj, J.[Jeripothula],
Reddy, M.I.[Malipatel Indrakaran],
Vishnu, C.[Chalavadi],
Mohan, C.K.[Chalavadi Krishna],
AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated
Transformer for Multisentence Video Description,
IP(31), 2022, pp. 5559-5569.
IEEE DOI
2209
Transformers, Streaming media, Task analysis, Visualization,
Video description, Correlation, Natural languages,
transformers
BibRef
Xu, W.[Wanru],
Miao, Z.J.[Zhen-Jiang],
Yu, J.[Jian],
Tian, Y.[Yi],
Wan, L.[Lili],
Ji, Q.[Qiang],
Bridging Video and Text:
A Two-Step Polishing Transformer for Video Captioning,
CirSysVideo(32), No. 9, September 2022, pp. 6293-6307.
IEEE DOI
2209
Semantics, Visualization, Decoding, Transformers, Task analysis,
Planning, Training, Video captioning, transformer,
cross-modal modeling
BibRef
Wu, B.F.[Bo-Feng],
Niu, G.C.[Guo-Cheng],
Yu, J.[Jun],
Xiao, X.Y.[Xin-Yan],
Zhang, J.[Jian],
Wu, H.[Hua],
Towards Knowledge-Aware Video Captioning via Transitive Visual
Relationship Detection,
CirSysVideo(32), No. 10, October 2022, pp. 6753-6765.
IEEE DOI
2210
Visualization, Task analysis, Semantics, Feature extraction,
Decoding, Training, Vocabulary, Video captioning,
natural language process
BibRef
Yan, L.Q.[Li-Qi],
Ma, S.Q.[Si-Qi],
Wang, Q.F.[Qi-Fan],
Chen, Y.J.[Ying-Jie],
Zhang, X.Y.[Xiang-Yu],
Savakis, A.[Andreas],
Liu, D.F.[Dong-Fang],
Video Captioning Using Global-Local Representation,
CirSysVideo(32), No. 10, October 2022, pp. 6642-6656.
IEEE DOI
2210
Training, Task analysis, Visualization, Vocabulary, Semantics,
Decoding, Correlation, video captioning, video representation,
visual analysis
BibRef
Subramaniam, A.[Arulkumar],
Vaidya, J.[Jayesh],
Ameen, M.A.M.[Muhammed Abdul Majeed],
Nambiar, A.[Athira],
Mittal, A.[Anurag],
Co-segmentation inspired attention module for video-based computer
vision tasks,
CVIU(223), 2022, pp. 103532.
Elsevier DOI
2210
Attention, Co-segmentation, Person re-ID, Video-captioning, Video classification
BibRef
Liu, F.L.[Feng-Lin],
Wu, X.[Xian],
You, C.Y.[Chen-Yu],
Ge, S.[Shen],
Zou, Y.X.[Yue-Xian],
Sun, X.[Xu],
Aligning Source Visual and Target Language Domains for Unpaired Video
Captioning,
PAMI(44), No. 12, December 2022, pp. 9255-9268.
IEEE DOI
2212
Visualization, Pipelines, Training, Data models, Decoding,
Task analysis, Feature extraction, Video captioning, adversarial training
BibRef
Yuan, Y.T.[Yi-Tian],
Ma, L.[Lin],
Zhu, W.W.[Wen-Wu],
Syntax Customized Video Captioning by Imitating Exemplar Sentences,
PAMI(44), No. 12, December 2022, pp. 10209-10221.
IEEE DOI
2212
Syntactics, Semantics, Task analysis, Training, Decoding, Encoding,
Recurrent neural networks, Video captioning,
recurrent neural network
BibRef
Chen, H.R.[Hao-Ran],
Li, J.M.[Jian-Min],
Frintrop, S.[Simone],
Hu, X.L.[Xiao-Lin],
The MSR-Video to Text dataset with clean annotations,
CVIU(225), 2022, pp. 103581.
Elsevier DOI
2212
MSR-VTT dataset, Data cleaning, Data analysis, Video captioning
BibRef
Moctezuma, D.[Daniela],
Ramírez-delReal, T.[Tania],
Ruiz, G.[Guillermo],
González-Chávez, O.[Othón],
Video captioning: A comparative review of where we are and which
could be the route,
CVIU(231), 2023, pp. 103671.
Elsevier DOI
2305
Natural language processing, Video captioning, Image understanding
BibRef
Aafaq, N.[Nayyer],
Mian, A.[Ajmal],
Akhtar, N.[Naveed],
Liu, W.[Wei],
Shah, M.[Mubarak],
Dense Video Captioning With Early Linguistic Information Fusion,
MultMed(25), 2023, pp. 2309-2322.
IEEE DOI
2306
Proposals, Task analysis, Semantics, Visualization, Linguistics,
Transformers, Event detection, Context modeling,
video captioning
BibRef
Wang, J.W.[Jing-Wen],
Jiang, W.H.[Wen-Hao],
Ma, L.[Lin],
Liu, W.[Wei],
Xu, Y.[Yong],
Bidirectional Attentive Fusion with Context Gating for Dense Video
Captioning,
CVPR18(7190-7198)
IEEE DOI
1812
Proposals, Visualization, Task analysis, Video sequences, Fuses,
Semantics, Feature extraction
BibRef
He, M.G.[Meng-Ge],
Du, W.J.[Wen-Jing],
Wen, Z.Q.[Zhi-Quan],
Du, Q.[Qing],
Xie, Y.T.[Yu-Tong],
Wu, Q.[Qi],
Multi-Granularity Aggregation Transformer for Joint Video-Audio-Text
Representation Learning,
CirSysVideo(33), No. 6, June 2023, pp. 2990-3002.
IEEE DOI
2306
Videos, Representation learning, Transformers, Aggregates, Semantics,
Feature extraction, Task analysis, video captioning
BibRef
Qian, Y.[Yong],
Mao, Y.C.[Ying-Chi],
Chen, Z.H.[Zhi-Hao],
Li, C.[Chang],
Bloh, O.T.[Olano Teah],
Huang, Q.[Qian],
Dense video captioning based on local attention,
IET-IPR(17), No. 9, 2023, pp. 2673-2685.
DOI Link
2307
2D temporal differential CNN, dense video captioning,
event proposal, feature extraction, local attention
BibRef
Tang, M.K.[Ming-Kang],
Wang, Z.Y.[Zhan-Yu],
Zeng, Z.Y.[Zhao-Yang],
Li, X.[Xiu],
Zhou, L.P.[Lu-Ping],
Stay in Grid: Improving Video Captioning via Fully Grid-Level
Representation,
CirSysVideo(33), No. 7, July 2023, pp. 3319-3332.
IEEE DOI
2307
Decoding, Feature extraction, Correlation, Visualization, Aggregates,
Semantics, Detectors, Video captioning, sequential attention,
bilinear pooling
BibRef
Velda, V.[Vania],
Immanuel, S.A.[Steve Andreas],
Hendria, W.F.[Willy Fitra],
Jeong, C.[Cheol],
Improving distinctiveness in video captioning with text-video
similarity,
IVC(136), 2023, pp. 104728.
Elsevier DOI
2308
Distinctiveness, Similarity scores, Video captioning, Video retrieval
BibRef
Zhu, J.K.[Jin-Kuan],
Zeng, P.P.[Peng-Peng],
Gao, L.L.[Lian-Li],
Li, G.F.[Gong-Fu],
Liao, D.L.[Dong-Liang],
Song, J.K.[Jing-Kuan],
Complementarity-Aware Space Learning for Video-Text Retrieval,
CirSysVideo(33), No. 8, August 2023, pp. 4362-4374.
IEEE DOI
2308
Videos, Feature extraction, Task analysis, Visualization, Semantics,
Learning systems, Layout, Video-text retrieval, video captioning,
deep learning
BibRef
Wang, H.[Hao],
Zhang, L.[Libo],
Fan, H.[Heng],
Luo, T.J.[Tie-Jian],
Collaborative three-stream transformers for video captioning,
CVIU(235), 2023, pp. 103799.
Elsevier DOI
2310
Video captioning, Multi-modal, Cross-granularity, Spatial-temporal domain
BibRef
Gu, X.[Xin],
Chen, G.[Guang],
Wang, Y.F.[Yu-Fei],
Zhang, L.[Libo],
Luo, T.J.[Tie-Jian],
Wen, L.Y.[Long-Yin],
Text with Knowledge Graph Augmented Transformer for Video Captioning,
CVPR23(18941-18951)
IEEE DOI
2309
BibRef
Xu, T.[Tao],
Cui, Y.Y.[Yuan-Yuan],
He, X.Y.[Xin-Yu],
Liu, C.H.[Cai-Hua],
A latent topic-aware network for dense video captioning,
IET-CV(17), No. 7, 2023, pp. 795-803.
DOI Link
2310
feature selection, natural language processing, video signal processing
BibRef
Lu, M.[Min],
Li, X.Y.[Xue-Yong],
Liu, C.H.[Cai-Hua],
Context Visual Information-based Deliberation Network for Video
Captioning,
ICPR21(9812-9818)
IEEE DOI
2105
Visualization, Semantics, Coherence, Benchmark testing,
Decoding
BibRef
Wu, B.[Bofeng],
Liu, B.[Buyu],
Huang, P.[Peng],
Bao, J.[Jun],
Xi, P.[Peng],
Yu, J.[Jun],
Concept Parser With Multimodal Graph Learning for Video Captioning,
CirSysVideo(33), No. 9, September 2023, pp. 4484-4495.
IEEE DOI
2310
BibRef
Liu, S.[Sheng],
Li, A.[Annan],
Wang, J.H.[Jia-Hao],
Wang, Y.H.[Yun-Hong],
Bidirectional Maximum Entropy Training With Word Co-Occurrence for
Video Captioning,
MultMed(25), 2023, pp. 4494-4507.
IEEE DOI
2310
BibRef
Yang, B.[Bang],
Cao, M.[Meng],
Zou, Y.X.[Yue-Xian],
Concept-Aware Video Captioning:
Describing Videos With Effective Prior Information,
IP(32), 2023, pp. 5366-5378.
IEEE DOI Code:
WWW Link.
2310
BibRef
Luo, X.M.[Xue-Mei],
Luo, X.T.[Xiao-Tong],
Wang, D.[Di],
Liu, J.H.[Jin-Hui],
Wan, B.[Bo],
Zhao, L.[Lin],
Global semantic enhancement network for video captioning,
PR(145), 2024, pp. 109906.
Elsevier DOI
2311
Video captioning, Feature aggregation, Semantic enhancement
BibRef
Liu, Z.[Zhu],
Wang, T.[Teng],
Zhang, J.[Jinrui],
Zheng, F.[Feng],
Jiang, W.H.[Wen-Hao],
Lu, K.[Ke],
Show, Tell and Rephrase: Diverse Video Captioning via Two-Stage
Progressive Training,
MultMed(25), 2023, pp. 7894-7905.
IEEE DOI
2312
BibRef
Rao, Q.[Qi],
Yu, X.[Xin],
Li, G.[Guang],
Zhu, L.C.[Lin-Chao],
CMGNet: Collaborative multi-modal graph network for video captioning,
CVIU(238), 2024, pp. 103864.
Elsevier DOI
2312
Video Captioning, Multiple Modality Learning, Graph Neural Networks
BibRef
Li, G.R.[Guo-Rong],
Ye, H.H.[Han-Hua],
Qi, Y.[Yuankai],
Wang, S.H.[Shu-Hui],
Qing, L.Y.[Lai-Yun],
Huang, Q.M.[Qing-Ming],
Yang, M.H.[Ming-Hsuan],
Learning Hierarchical Modular Networks for Video Captioning,
PAMI(46), No. 2, February 2024, pp. 1049-1064.
IEEE DOI
2401
BibRef
Earlier: A2, A1, A3, A4, A6, A7, Only:
Hierarchical Modular Network for Video Captioning,
CVPR22(17918-17927)
IEEE DOI
2210
Bridges, Representation learning, Visualization, Semantics,
Supervised learning, Linguistics, Vision + language
BibRef
Xie, Y.L.[Yu-Lai],
Niu, J.J.[Jing-Jing],
Zhang, Y.[Yang],
Ren, F.[Fang],
Global-Shared Text Representation Based Multi-Stage Fusion
Transformer Network for Multi-Modal Dense Video Captioning,
MultMed(26), 2024, pp. 3164-3179.
IEEE DOI
2402
Proposals, Visualization, Task analysis, Semantics, Transformers,
Correlation, Fuses, Anchor-free target detection, multi-stage fusion
BibRef
Jing, S.[Shuaiqi],
Zhang, H.[Haonan],
Zeng, P.P.[Peng-Peng],
Gao, L.L.[Lian-Li],
Song, J.K.[Jing-Kuan],
Shen, H.T.[Heng Tao],
Memory-Based Augmentation Network for Video Captioning,
MultMed(26), 2024, pp. 2367-2379.
IEEE DOI
2402
Visualization, Decoding, Task analysis, Semantics, Transformers,
Context modeling, Linguistics, Attention mechanism, video captioning
BibRef
Liang, Y.Z.[Yuan-Zhi],
Zhu, L.C.[Lin-Chao],
Wang, X.H.[Xiao-Han],
Yang, Y.[Yi],
IcoCap: Improving Video Captioning by Compounding Images,
MultMed(26), 2024, pp. 4389-4400.
IEEE DOI
2403
Semantics, Visualization, Task analysis, Integrated circuits,
Compounds, Training, Multi-modal understanding, video captioning
BibRef
Wang, Z.H.[Zhi-Hao],
Li, L.[Lin],
Xie, Z.[Zhongwei],
Liu, C.B.[Chuan-Bo],
Video Frame-wise Explanation Driven Contrastive Learning for
Procedural Text Generation,
CVIU(241), 2024, pp. 103954.
Elsevier DOI
2403
Video procedural captioning, Contrastive learning, Frame-wise explanation
BibRef
Chen, Y.X.[Yu-Xin],
Zhang, Z.Q.[Zi-Qi],
Qi, Z.A.[Zhong-Ang],
Yuan, C.F.[Chun-Feng],
Wang, J.[Jie],
Shan, Y.[Ying],
Li, B.[Bing],
Hu, W.M.[Wei-Ming],
Qie, X.[Xiaohu],
Wu, J.P.[Jian-Ping],
DARTScore: DuAl-Reconstruction Transformer for Video Captioning
Evaluation,
CirSysVideo(34), No. 4, April 2024, pp. 2041-2055.
IEEE DOI
2404
Measurement, Image reconstruction, Visualization,
Transformers, Task analysis, Semantics,
dual-reconstruction transformer
BibRef
Liu, C.S.[Chun-Sheng],
Zhang, X.[Xiao],
Chang, F.[Faliang],
Li, S.[Shuang],
Hao, P.H.[Peng-Hui],
Lu, Y.[Yansha],
Wang, Y.[Yinhai],
Traffic Scenario Understanding and Video Captioning via Guidance
Attention Captioning Network,
ITS(25), No. 5, May 2024, pp. 3615-3627.
IEEE DOI
2405
Task analysis, Feature extraction, Semantics, Decoding, Cameras,
Behavioral sciences, Visualization, attention mechanism
BibRef
Zhang, Y.J.[Yun-Jie],
Xu, T.Y.[Tian-Yang],
Song, X.N.[Xiao-Ning],
Zhu, X.F.[Xue-Feng],
Feng, Z.H.[Zheng-Hua],
Wu, X.J.[Xiao-Jun],
Towards accurate unsupervised video captioning with implicit visual
feature injection and explicit,
PRL(183), 2024, pp. 133-139.
Elsevier DOI
2406
Unsupervised video captioning, Text generation,
Visual information, Sentence keywords
BibRef
Im, S.K.[Sio-Kei],
Chan, K.H.[Ka-Hou],
Local feature-based video captioning with multiple classifier and
CARU-attention,
IET-IPR(18), No. 9, 2024, pp. 2304-2317.
DOI Link
2407
convolutional neural nets, feature extraction,
pattern classification, recurrent neural nets, video signal processing
BibRef
Putra, B.H.H.[Bahy Helmi Hartoyo],
Jeong, C.[Cheol],
Video captioning based on dual learning via multiple reconstruction
blocks,
IVC(148), 2024, pp. 105119.
Elsevier DOI
2407
Dual learning, Reconstruction network, Video captioning
BibRef
Chou, S.H.[Shih-Han],
Little, J.J.[James J.],
Sigal, L.[Leonid],
Implicit and explicit commonsense for multi-sentence video captioning,
CVIU(247), 2024, pp. 104064.
Elsevier DOI
2408
Instruction generation, Video captioning, Commonsense reasoning
BibRef
Tian, M.[Mingkai],
Li, G.R.[Guo-Rong],
Qi, Y.[Yuankai],
Wang, S.H.[Shu-Hui],
Sheng, Q.Z.[Quan Z.],
Huang, Q.M.[Qing-Ming],
Rethink video retrieval representation for video captioning,
PR(156), 2024, pp. 110744.
Elsevier DOI Code:
WWW Link.
2408
Video captioning, Video-text retrieval, Token shift, Cross-attention
BibRef
Liu, S.[Sheng],
Li, A.[Annan],
Zhao, Y.W.[Yu-Wei],
Wang, J.H.[Jia-Hao],
Wang, Y.H.[Yun-Hong],
EvCap: Element-Aware Video Captioning,
CirSysVideo(34), No. 10, October 2024, pp. 9718-9731.
IEEE DOI
2411
Visualization, Feature extraction, Linguistics, Semantics, Dogs,
Decoding, Video captioning, multimodal application
BibRef
Shen, Y.H.[Yu-Han],
Yang, L.J.[Lin-Jie],
Wen, L.[Longyin],
Yu, H.C.[Hai-Chao],
Elhamifar, E.[Ehsan],
Wang, H.[Heng],
Exploring the Role of Audio in Video Captioning,
MULA24(2090-2100)
IEEE DOI
2410
Measurement, Computational modeling, Predictive models, Acoustics, Data mining
BibRef
Shoman, M.[Maged],
Wang, D.D.[Dong-Dong],
Aboah, A.[Armstrong],
Abdel-Aty, M.[Mohamed],
Enhancing Traffic Safety with Parallel Dense Video Captioning for
End-to-End Event Analysis,
AICity24(7125-7133)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Adaptation models, Urban areas,
Feature extraction, Weaving, Tokenization, dense video captioning,
cross-modality learning
BibRef
Wu, H.[Hao],
Liu, H.[Huabin],
Qiao, Y.[Yu],
Sun, X.[Xiao],
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via
Pseudo Boundary Enrichment and Online Refinement,
CVPR24(18699-18708)
IEEE DOI
2410
Training, Measurement, Large language models, Coherence,
Standards
BibRef
Zhou, X.Y.[Xing-Yi],
Arnab, A.[Anurag],
Buch, S.[Shyamal],
Yan, S.[Shen],
Myers, A.[Austin],
Xiong, X.[Xuehan],
Nagrani, A.[Arsha],
Schmid, C.[Cordelia],
Streaming Dense Video Captioning,
CVPR24(18243-18252)
IEEE DOI Code:
WWW Link.
2410
Codes, Computational modeling, Clustering algorithms,
Memory modules, Streaming media, Predictive models, captioning,
video captioning
BibRef
Kim, M.[Minkuk],
Kim, H.B.[Hyeon Bae],
Moon, J.[Jinyoung],
Choi, J.[Jinwoo],
Kim, S.T.[Seong Tae],
Do You Remember? Dense Video Captioning with Cross-Modal Memory
Retrieval,
CVPR24(13894-13904)
IEEE DOI Code:
WWW Link.
2410
Location awareness, Visualization, Cognitive processes,
Computational modeling, Semantics, Information processing, Vision,
Retrieval Augmented Generation
BibRef
Xu, J.[Jilan],
Huang, Y.F.[Yi-Fei],
Hou, J.L.[Jun-Lin],
Chen, G.[Guo],
Zhang, Y.[Yuejie],
Feng, R.[Rui],
Xie, W.[Weidi],
Retrieval-Augmented Egocentric Video Captioning,
CVPR24(13525-13536)
IEEE DOI
2410
Training, Representation learning, Visualization,
Computational modeling, Semantics, Pipelines,
instructional videos
BibRef
Malakan, Z.M.[Zainy M.],
Hassan, G.M.[Ghulam Mubashar],
Mian, A.[Ajmal],
Sequential Image Storytelling Model Based on Transformer Attention
Pooling,
IVCNZ23(1-6)
IEEE DOI
2403
Measurement, Visualization, Computational modeling,
Computer architecture, Linguistics, Transformers, Task analysis,
Image Captioning and Video Captioning
BibRef
Sakaino, H.[Hidetomo],
Unseen and Adverse Outdoor Scenes Recognition Through Event-based
Captions,
VCL23(3596-3603)
IEEE DOI
2401
BibRef
Ma, Z.Y.[Zong-Yang],
Zhang, Z.Q.[Zi-Qi],
Chen, Y.X.[Yu-Xin],
Qi, Z.A.[Zhong-Ang],
Luo, Y.M.[Ying-Min],
Li, Z.K.[Ze-Kun],
Yuan, C.F.[Chun-Feng],
Li, B.[Bing],
Qie, X.[Xiaohu],
Shan, Y.[Ying],
Hu, W.M.[Wei-Ming],
Order-Prompted Tag Sequence Generation for Video Tagging,
ICCV23(15635-15644)
IEEE DOI
2401
BibRef
Bulat, A.[Adrian],
Sanchez, E.[Enrique],
Martinez, B.[Brais],
Tzimiropoulos, G.[Georgios],
ReGen: A good Generative zero-shot video classifier should be
Rewarded,
ICCV23(13477-13487)
IEEE DOI
2401
Turn a generative video captioning model into an open-world
video/action classification model.
BibRef
Shen, Y.J.[Yao-Jie],
Gu, X.[Xin],
Xu, K.[Kai],
Fan, H.[Heng],
Wen, L.[Longyin],
Zhang, L.[Libo],
Accurate and Fast Compressed Video Captioning,
ICCV23(15512-15521)
IEEE DOI Code:
WWW Link.
2401
BibRef
Lin, W.[Wang],
Jin, T.[Tao],
Wang, Y.[Ye],
Pan, W.W.[Wen-Wen],
Li, L.J.[Lin-Jun],
Cheng, X.[Xize],
Zhao, Z.[Zhou],
Exploring Group Video Captioning with Efficient Relational
Approximation,
ICCV23(15235-15244)
IEEE DOI
2401
BibRef
Damaceno, R.J.P.[Rafael J. Pezzuto],
Cesar Jr., R.M.[Roberto M.],
An End-to-end Deep Learning Approach for Video Captioning Through
Mobile Devices,
CIARP23(I:715-729).
Springer DOI
2312
BibRef
Munusamy, H.[Hemalatha],
Sekhar, C.C.[C. Chandra],
Multi-Modal Hierarchical Attention-Based Dense Video Captioning,
ICIP23(475-479)
IEEE DOI
2312
BibRef
Chen, K.X.[Kai-Xuan],
Di, Q.J.[Qian-Ji],
Lu, Y.[Yang],
Wang, H.Z.[Han-Zi],
Semantic Learning Network for Controllable Video Captioning,
ICIP23(880-884)
IEEE DOI
2312
BibRef
Nadeem, A.[Asmar],
Hilton, A.[Adrian],
Dawes, R.[Robert],
Thomas, G.[Graham],
Mustafa, A.[Armin],
SEM-POS: Grammatically and Semantically Correct Video Captioning,
MULA23(2606-2616)
IEEE DOI
2309
BibRef
Ullah, N.[Nasib],
Mohanta, P.P.[Partha Pratim],
Thinking Hallucination for Video Captioning,
ACCV22(IV:623-640).
Springer DOI
2307
BibRef
Seo, P.H.[Paul Hongsuck],
Nagrani, A.[Arsha],
Arnab, A.[Anurag],
Schmid, C.[Cordelia],
End-to-end Generative Pretraining for Multimodal Video Captioning,
CVPR22(17938-17947)
IEEE DOI
2210
Representation learning, Computational modeling,
Bidirectional control, Benchmark testing, Decoding,
Self- semi- meta- unsupervised learning
BibRef
Lin, K.[Kevin],
Li, L.J.[Lin-Jie],
Lin, C.C.[Chung-Ching],
Ahmed, F.[Faisal],
Gan, Z.[Zhe],
Liu, Z.C.[Zi-Cheng],
Lu, Y.[Yumao],
Wang, L.J.[Li-Juan],
SwinBERT: End-to-End Transformers with Sparse Attention for Video
Captioning,
CVPR22(17928-17937)
IEEE DOI
2210
Adaptation models, Video sequences, Redundancy, Natural languages,
Transformers, Feature extraction, Vision + language
BibRef
Shi, Y.[Yaya],
Yang, X.[Xu],
Xu, H.Y.[Hai-Yang],
Yuan, C.F.[Chun-Feng],
Li, B.[Bing],
Hu, W.M.[Wei-Ming],
Zha, Z.J.[Zheng-Jun],
EMScore: Evaluating Video Captioning via Coarse-Grained and
Fine-Grained Embedding Matching,
CVPR22(17908-17917)
IEEE DOI
2210
Measurement, Visualization, Correlation, Systematics,
Computational modeling, Semantics, Vision + language
BibRef
Chen, S.X.[Shao-Xiang],
Jiang, Y.G.[Yu-Gang],
Motion Guided Region Message Passing for Video Captioning,
ICCV21(1523-1532)
IEEE DOI
2203
Location awareness, Visualization, Message passing,
Computational modeling, Detectors, Feature extraction,
Video analysis and understanding
BibRef
Joshi, P.,
Saharia, C.,
Singh, V.,
Gautam, D.,
Ramakrishnan, G.,
Jyothi, P.,
A Tale of Two Modalities for Video Captioning,
MMVAMTC19(3708-3712)
IEEE DOI
2004
audio signal processing, learning (artificial intelligence),
natural language processing, text analysis, multi modal
BibRef
Wang, T.[Teng],
Zhang, R.M.[Rui-Mao],
Lu, Z.C.[Zhi-Chao],
Zheng, F.[Feng],
Cheng, R.[Ran],
Luo, P.[Ping],
End-to-End Dense Video Captioning with Parallel Decoding,
ICCV21(6827-6837)
IEEE DOI
2203
Location awareness, Handheld computers, Stacking, Redundancy,
Pipelines, Transformers, Decoding,
Vision + language
BibRef
Yang, B.[Bang],
Zou, Y.X.[Yue-Xian],
Visual Oriented Encoder: Integrating Multimodal and Multi-Scale
Contexts for Video Captioning,
ICPR21(188-195)
IEEE DOI
2105
Visualization, Semantics, Natural languages, Benchmark testing,
Feature extraction, Encoding, Data mining
BibRef
Perez-Martin, J.[Jesus],
Bustos, B.[Benjamin],
Pérez, J.[Jorge],
Attentive Visual Semantic Specialized Network for Video Captioning,
ICPR21(5767-5774)
IEEE DOI
2105
Visualization, Adaptation models, Video description, Semantics,
Logic gates, Syntactics,
video captioning
BibRef
Olivastri, S.,
Singh, G.,
Cuzzolin, F.,
End-to-End Video Captioning,
HVU19(1474-1482)
IEEE DOI
2004
convolutional neural nets, decoding, image recognition,
learning (artificial intelligence), recurrent neural nets,
BibRef
Li, L.,
Gong, B.,
End-to-End Video Captioning With Multitask Reinforcement Learning,
WACV19(339-348)
IEEE DOI
1904
convolutional neural nets,
learning (artificial intelligence), recurrent neural nets,
Hardware
BibRef
Wang, B.,
Ma, L.,
Zhang, W.,
Liu, W.,
Reconstruction Network for Video Captioning,
CVPR18(7622-7631)
IEEE DOI
1812
Decoding, Semantics, Image reconstruction, Video sequences,
Visualization, Feature extraction, Natural languages
BibRef
Li, Y.,
Yao, T.,
Pan, Y.,
Chao, H.,
Mei, T.,
Jointly Localizing and Describing Events for Dense Video Captioning,
CVPR18(7492-7500)
IEEE DOI
1812
Proposals, Dogs, Complexity theory, Task analysis, Training, Optimization
BibRef
Wu, X.,
Li, G.,
Cao, Q.,
Ji, Q.,
Lin, L.,
Interpretable Video Captioning via Trajectory Structured Localization,
CVPR18(6829-6837)
IEEE DOI
1812
Trajectory, Feature extraction, Decoding, Visualization, Semantics,
Recurrent neural networks
BibRef
Wang, X.,
Chen, W.,
Wu, J.,
Wang, Y.,
Wang, W.Y.,
Video Captioning via Hierarchical Reinforcement Learning,
CVPR18(4213-4222)
IEEE DOI
1812
Task analysis, Semantics, Dogs, Neural networks,
Portable computers
BibRef
Zhou, L.,
Zhou, Y.,
Corso, J.J.,
Socher, R.,
Xiong, C.,
End-to-End Dense Video Captioning with Masked Transformer,
CVPR18(8739-8748)
IEEE DOI
1812
Proposals, Decoding, Encoding, Hidden Markov models, Feeds, Training,
Visualization
BibRef
Yang, D.,
Yuan, C.,
Hierarchical Context Encoding for Events Captioning in Videos,
ICIP18(1288-1292)
IEEE DOI
1809
Videos, Proposals, Task analysis, Mathematical model,
Computational modeling, Decoding, Measurement, Video captioning,
video summarization
BibRef
Shen, Z.Q.[Zhi-Qiang],
Li, J.G.[Jian-Guo],
Su, Z.[Zhou],
Li, M.J.[Min-Jun],
Chen, Y.R.[Yu-Rong],
Jiang, Y.G.[Yu-Gang],
Xue, X.Y.[Xiang-Yang],
Weakly Supervised Dense Video Captioning,
CVPR17(5159-5167)
IEEE DOI
1711
Motion segmentation, Neural networks, Training,
Visualization, Vocabulary
BibRef
Baraldi, L.,
Grana, C.,
Cucchiara, R.,
Hierarchical Boundary-Aware Neural Encoder for Video Captioning,
CVPR17(3185-3194)
IEEE DOI
1711
Encoding, Logic gates, Microprocessors,
Motion pictures, Streaming media, Visualization
BibRef
Pan, P.B.[Ping-Bo],
Xu, Z.W.[Zhong-Wen],
Yang, Y.[Yi],
Wu, F.[Fei],
Zhuang, Y.T.[Yue-Ting],
Hierarchical Recurrent Neural Encoder for Video Representation with
Application to Captioning,
CVPR16(1029-1038)
IEEE DOI
1612
video captioning where temporal information plays a crucial role.
BibRef
Yu, H.N.[Hao-Nan],
Wang, J.[Jiang],
Huang, Z.H.[Zhi-Heng],
Yang, Y.[Yi],
Xu, W.[Wei],
Video Paragraph Captioning Using Hierarchical Recurrent Neural
Networks,
CVPR16(4584-4593)
IEEE DOI
1612
Generating one or multiple sentences to describe a realistic video.
BibRef
Shin, A.[Andrew],
Ohnishi, K.[Katsunori],
Harada, T.[Tatsuya],
Beyond caption to narrative: Video captioning with multiple sentences,
ICIP16(3364-3368)
IEEE DOI
1610
Feature extraction
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Video Summarization, Abstract, MPEG Based, AVC, H264, MPEG Metadata .