Ding, X.P.[Xin-Peng],
Wang, N.N.[Nan-Nan],
Zhang, S.W.[Shi-Wei],
Huang, Z.Y.[Zi-Yuan],
Li, X.M.[Xiao-Meng],
Tang, M.Q.[Ming-Qian],
Liu, T.L.[Tong-Liang],
Gao, X.B.[Xin-Bo],
Exploring Language Hierarchy for Video Grounding,
IP(31), 2022, pp. 4693-4706.
IEEE DOI
2207
Proposals, Grounding, Training, Location awareness, Task analysis,
Semantics, Feature extraction, Video and language, language hierarchy
BibRef
Xu, Z.[Zhe],
Chen, D.[Da],
Wei, K.[Kun],
Deng, C.[Cheng],
Xue, H.[Hui],
HiSA: Hierarchically Semantic Associating for Video Temporal
Grounding,
IP(31), 2022, pp. 5178-5188.
IEEE DOI
2208
Grounding, Feature extraction, Proposals, Task analysis, Semantics,
Representation learning, Image segmentation,
cross-guided contrast
BibRef
Gao, J.L.[Jia-Lin],
Sun, X.[Xin],
Ghanem, B.[Bernard],
Zhou, X.[Xi],
Ge, S.M.[Shi-Ming],
Efficient Video Grounding With Which-Where Reading Comprehension,
CirSysVideo(32), No. 10, October 2022, pp. 6900-6913.
IEEE DOI
2210
Grounding, Proposals, Visualization, Location awareness,
Task analysis, Reinforcement learning, deep learning
BibRef
Zhou, H.[Hao],
Zhang, C.Y.[Chong-Yang],
Luo, Y.[Yan],
Hu, C.P.[Chuan-Ping],
Zhang, W.J.[Wen-Jun],
Thinking Inside Uncertainty: Interest Moment Perception for Diverse
Temporal Grounding,
CirSysVideo(32), No. 10, October 2022, pp. 7190-7203.
IEEE DOI
2210
Annotations, Grounding, Task analysis, Uncertainty, Measurement,
Predictive models, Optimization, Temporal grounding, label uncertainty
BibRef
Tang, Z.H.[Zong-Heng],
Liao, Y.[Yue],
Liu, S.[Si],
Li, G.B.[Guan-Bin],
Jin, X.J.[Xiao-Jie],
Jiang, H.X.[Hong-Xu],
Yu, Q.[Qian],
Xu, D.[Dong],
Human-Centric Spatio-Temporal Video Grounding With Visual
Transformers,
CirSysVideo(32), No. 12, December 2022, pp. 8238-8249.
IEEE DOI
2212
Grounding, Visualization, Electron tubes, Location awareness,
Power transformers, Spatial temporal resolution, dataset
BibRef
Wang, W.[Wei],
Gao, J.Y.[Jun-Yu],
Xu, C.S.[Chang-Sheng],
Weakly-Supervised Video Object Grounding via Causal Intervention,
PAMI(45), No. 3, March 2023, pp. 3933-3948.
IEEE DOI
2302
Grounding, Visualization, Task analysis, Dairy products, Annotations,
Context modeling, Proposals, Weakly-supervised learning,
adversarial contrastive learning
See also Multimodal Evidential Learning for Open-World Weakly-Supervised Video Anomaly Detection.
BibRef
Wang, W.[Wei],
Gao, J.Y.[Jun-Yu],
Xu, C.S.[Chang-Sheng],
Weakly-Supervised Video Object Grounding via Learning Uni-Modal
Associations,
MultMed(25), 2023, pp. 6329-6340.
IEEE DOI
2311
BibRef
Xu, Z.[Zhe],
Wei, K.[Kun],
Yang, X.[Xu],
Deng, C.[Cheng],
Point-Supervised Video Temporal Grounding,
MultMed(25), 2023, pp. 6121-6131.
IEEE DOI
2311
BibRef
Lu, Y.[Yu],
Quan, R.J.[Rui-Jie],
Zhu, L.C.[Lin-Chao],
Yang, Y.[Yi],
Zero-Shot Video Grounding With Pseudo Query Lookup and Verification,
IP(33), 2024, pp. 1643-1654.
IEEE DOI
2403
Grounding, Detectors, Proposals, Training, Task analysis, Visualization,
Semantics, Video grounding, zero-shot learning, vision and language
BibRef
Shi, F.Y.[Feng-Yuan],
Huang, W.L.[Wei-Lin],
Wang, L.M.[Li-Min],
End-to-end dense video grounding via parallel regression,
CVIU(242), 2024, pp. 103980.
Elsevier DOI
2404
Visual grounding, Dense grounding, Query based detection
BibRef
Xiong, Z.[Zeyu],
Liu, D.Z.[Dai-Zong],
Fang, X.[Xiang],
Qu, X.Y.[Xiao-Ye],
Dong, J.F.[Jian-Feng],
Zhu, J.H.[Jia-Hao],
Tang, K.[Keke],
Zhou, P.[Pan],
Rethinking Video Sentence Grounding from a Tracking Perspective With
Memory Network and Masked Attention,
MultMed(26), 2024, pp. 11204-11218.
IEEE DOI
2412
Target tracking, Semantics, Task analysis, Object tracking,
Grounding, Feature extraction, Visualization, Cross-modal, VSG
BibRef
Fang, X.[Xiang],
Xiong, Z.[Zeyu],
Fang, W.L.[Wan-Long],
Qu, X.Y.[Xiao-Ye],
Chen, C.[Chen],
Dong, J.F.[Jian-Feng],
Tang, K.[Keke],
Zhou, P.[Pan],
Cheng, Y.[Yu],
Liu, D.Z.[Dai-Zong],
Rethinking Weakly-supervised Video Temporal Grounding From a Game
Perspective,
ECCV24(XLV: 290-311).
Springer DOI
2412
BibRef
Wu, Q.Q.[Qing-Qing],
Guo, L.J.[Li-Jun],
Zhang, R.[Rong],
Qian, J.B.[Jiang-Bo],
Gao, S.[Shangce],
QSMT-net: A query-sensitive proposal and multi-temporal-span matching
network for video grounding,
IVC(149), 2024, pp. 105188.
Elsevier DOI
2408
Video grounding, Multi-modal feature fusion, Cross-attention modeling
BibRef
Dong, J.X.[Jian-Xiang],
Yin, Z.Z.[Zhao-Zheng],
Graph-based Dense Event Grounding with relative positional encoding,
CVIU(251), 2025, pp. 104257.
Elsevier DOI
2501
Dense Event Grounding, Temporal sentence grounding,
Video grounding, Relative positional encoding
BibRef
Tang, K.F.[Ke-Fan],
He, L.H.[Li-Huo],
Wang, N.N.[Nan-Nan],
Gao, X.B.[Xin-Bo],
Dual Semantic Reconstruction Network for Weakly Supervised Temporal
Sentence Grounding,
MultMed(27), 2025, pp. 95-107.
IEEE DOI
2501
Proposals, Grounding, Feature extraction, Image reconstruction,
Annotations, Semantics, Training, Information processing, Decoding,
consistency constraint
BibRef
Liu, K.[Kun],
Qu, M.X.[Meng-Xue],
Liu, Y.[Yang],
Wei, Y.C.[Yun-Chao],
Zhe, W.M.[Wen-Ming],
Zhao, Y.[Yao],
Liu, W.[Wu],
Single-Frame Supervision for Spatio-Temporal Video Grounding,
PAMI(47), No. 7, July 2025, pp. 5177-5191.
IEEE DOI
2506
Annotations, Grounding, Task analysis, Electron tubes, Training,
Visualization, Costs, Curriculum learning, large scale dataset,
spatio-temporal video grounding
BibRef
Hu, J.J.[Jing-Jing],
Guo, D.[Dan],
Li, K.[Kun],
Si, Z.[Zhan],
Yang, X.[Xun],
Chang, X.J.[Xiao-Jun],
Wang, M.[Meng],
Unified Static and Dynamic Network: Efficient Temporal Filtering for
Video Grounding,
PAMI(47), No. 8, August 2025, pp. 6445-6462.
IEEE DOI
2507
Grounding, Visual perception, Semantics, Filtering, Biology, Training,
Convolution, Complexity theory, Solids, Pattern analysis,
vision and language
BibRef
Ran, R.[Ran],
Wei, J.[Jiwei],
He, S.Y.[Shi-Yuan],
Zhou, Y.Y.[Yu-Yang],
Wang, P.[Peng],
Yang, Y.[Yang],
Shen, H.T.[Heng Tao],
Fine-Grained Alignment and Interaction for Video Grounding With
Cross-Modal Semantic Hierarchical Graph,
CirSysVideo(35), No. 11, November 2025, pp. 11641-11654.
IEEE DOI
2511
Semantics, Grounding, Tires, Feature extraction,
Contrastive learning, Visualization, semantic understanding
BibRef
Wang, M.Z.[Meng-Zhao],
Li, H.F.[Hua-Feng],
Zhang, Y.F.[Ya-Fei],
Li, J.X.[Jin-Xing],
Tao, D.P.[Da-Peng],
Yu, Z.T.[Zheng-Tao],
Disentangling Inter- and Intra-Video Relations for Multi-Event
Video-Text Retrieval and Grounding,
IP(34), 2025, pp. 7558-7571.
IEEE DOI Code:
WWW Link.
2512
Videos, Grounding, Proposals, Feature extraction, Training,
Visualization, Accuracy, Contrastive learning, Weak supervision,
multi-event queries
BibRef
Yang, J.[Jin],
Wei, P.[Ping],
Learning unified patterns of multimodalities for video temporal
grounding,
PR(172), 2026, pp. 112484.
Elsevier DOI Code:
WWW Link.
2512
Multimodal learning, Moment retrieval, Highlight detection,
Video temporal grounding
BibRef
Liu, Y.[Yang],
Zheng, M.H.[Ming-Hang],
Chen, Q.C.[Qing-Chao],
Gong, S.G.[Shao-Gang],
Peng, Y.X.[Yu-Xin],
Large-Scale Pre-Trained Models Empowering Phrase Generalization in
Temporal Sentence Localization,
IJCV(134), No. 2, February 2026, pp. 53.
Springer DOI
2601
BibRef
Zheng, M.H.[Ming-Hang],
Cai, X.H.[Xin-Hao],
Chen, Q.C.[Qing-Chao],
Peng, Y.X.[Yu-Xin],
Liu, Y.[Yang],
Training-Free Video Temporal Grounding Using Large-Scale Pre-Trained
Models,
ECCV24(LXXXII: 20-37).
Springer DOI
2412
BibRef
Li, A.[Ao],
Liu, H.J.[Hui-Jun],
Zhu, Y.Q.[Yi-Qing],
Ge, Y.X.[Yong-Xin],
Efficient Pre-Trained Semantics Refinement for Video Temporal
Grounding,
CirSysVideo(36), No. 2, February 2026, pp. 1406-1418.
IEEE DOI
2602
Semantics, Visualization, Feature extraction, Grounding, Training,
Proposals, Natural languages, Tuning, contrast learning
BibRef
Moon, W.J.[Won-Jun],
Hyun, S.[Sangeek],
Lee, S.[Subeen],
Heo, J.P.[Jae-Pil],
Correlation-guided calibration of query dependency for video temporal
grounding,
PR(174), 2026, pp. 112984.
Elsevier DOI
2602
Video temporal grounding, Moment retrieval, Video highlight detection
BibRef
Weerakoon, D.[Dulanga],
Subbaraju, V.[Vigneshwaran],
Lim, J.H.[Joo Hwee],
Misra, A.[Archan],
NeuroViG:
Integrating Event Cameras for Resource-Efficient Video Grounding,
WACV25(5781-5790)
IEEE DOI
2505
Visualization, Technological innovation, Accuracy, Grounding,
Neuromorphics, Pipelines, Neural networks, Cameras, Transformers,
event processing
BibRef
Jin, Y.[Yang],
Mu, Y.D.[Ya-Dong],
Weakly-supervised Spatio-temporal Video Grounding with Variational
Cross-modal Alignment,
ECCV24(XLVIII: 412-429).
Springer DOI
2412
BibRef
Fujiwara, K.[Kent],
Tanaka, M.[Mikihiro],
Yu, Q.[Qing],
Chronologically Accurate Retrieval for Temporal Grounding of
Motion-language Models,
ECCV24(LVIII: 323-339).
Springer DOI
2412
BibRef
Bao, P.J.[Pei-Jun],
Shao, Z.[Zihao],
Yang, W.H.[Wen-Han],
Ng, B.P.[Boon Poh],
Kot, A.C.[Alex C.],
E3M: Zero-shot Spatio-temporal Video Grounding with
Expectation-maximization Multimodal Modulation,
ECCV24(LXXXIII: 227-243).
Springer DOI
2412
BibRef
Hannan, T.[Tanveer],
Islam, M.M.[Md Mohaiminul],
Seidl, T.[Thomas],
Bertasius, G.[Gedas],
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos,
ECCV24(XXI: 352-369).
Springer DOI
2412
BibRef
Gu, X.[Xin],
Fan, H.[Heng],
Huang, Y.[Yan],
Luo, T.J.[Tie-Jian],
Zhang, L.B.[Li-Bo],
Context-Guided Spatio-Temporal Video Grounding,
CVPR24(18330-18339)
IEEE DOI Code:
WWW Link.
2410
Location awareness, Degradation, Visualization, Codes, Grounding,
spatio-temporal video grounding, instance context learning
BibRef
Chen, B.[Brian],
Shvetsova, N.[Nina],
Rouditchenko, A.[Andrew],
Kondermann, D.[Daniel],
Thomas, S.[Samuel],
Chang, S.F.[Shih-Fu],
Feris, R.[Rogerio],
Glass, J.[James],
Kuehne, H.[Hilde],
What, When, and Where? Self-Supervised Spatio-Temporal Grounding in
Untrimmed Multi-Action Videos from Narrated Instructions,
CVPR24(18419-18429)
IEEE DOI
2410
Representation learning, Grounding, Annotations, Benchmark testing,
Encoding, Self-supervised learning
BibRef
Wasim, S.T.[Syed Talal],
Naseer, M.[Muzammal],
Khan, S.[Salman],
Yang, M.H.[Ming-Hsuan],
Khan, F.S.[Fahad Shahbaz],
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video
Grounding,
CVPR24(18909-18918)
IEEE DOI
2410
Visualization, Adaptation models, Vocabulary, Grounding, Semantics,
Natural languages, Training data, Video Grounding, Open Vocabulary,
MultiModal
BibRef
de la Jara, I.M.[Ignacio M.],
Rodriguez-Opazo, C.[Cristian],
Marrese-Taylor, E.[Edison],
Bravo-Marquez, F.[Felipe],
An empirical study of the effect of video encoders on Temporal Video
Grounding,
CLVL23(2842-2847)
IEEE DOI
2401
BibRef
Li, H.X.[Hong-Xiang],
Cao, M.[Meng],
Cheng, X.[Xuxin],
Li, Y.W.[Yao-Wei],
Zhu, Z.H.[Zhi-Hong],
Zou, Y.X.[Yue-Xian],
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic
and Game Theory,
ICCV23(11998-12008)
IEEE DOI
2401
BibRef
Li, H.[Hanjun],
Shu, X.J.[Xiu-Jun],
He, S.[Sunan],
Qiao, R.Z.[Rui-Zhi],
Wen, W.[Wei],
Guo, T.[Taian],
Gan, B.[Bei],
Sun, X.[Xing],
D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with
Glance Annotation,
ICCV23(13688-13700)
IEEE DOI Code:
WWW Link.
2401
BibRef
Pan, Y.L.[Yu-Lin],
He, X.T.[Xiang-Teng],
Gong, B.[Biao],
Lv, Y.L.[Yi-Liang],
Shen, Y.J.[Yu-Jun],
Peng, Y.X.[Yu-Xin],
Zhao, D.L.[De-Li],
Scanning Only Once: An End-to-end Framework for Fast Temporal
Grounding in Long Videos,
ICCV23(13721-13731)
IEEE DOI Code:
WWW Link.
2401
BibRef
Jang, J.[Jinhyun],
Park, J.[Jungin],
Kim, J.[Jin],
Kwon, H.[Hyeongjun],
Sohn, K.H.[Kwang-Hoon],
Knowing Where to Focus: Event-aware Transformer for Video Grounding,
ICCV23(13800-13810)
IEEE DOI Code:
WWW Link.
2401
BibRef
Cao, M.[Meng],
Wei, F.Y.[Fang-Yun],
Xu, C.[Can],
Geng, X.[Xiubo],
Chen, L.[Long],
Zhang, C.[Can],
Zou, Y.X.[Yue-Xian],
Shen, T.[Tao],
Jiang, D.X.[Da-Xin],
Iterative Proposal Refinement for Weakly-Supervised Video Grounding,
CVPR23(6524-6534)
IEEE DOI
2309
BibRef
Lu, Z.J.[Zi-Jia],
Iftekhar, A.S.M.,
Mittal, G.[Gaurav],
Meng, T.J.[Tian-Jian],
Wang, X.[Xiawei],
Zhao, C.[Cheng],
Kukkala, R.[Rohith],
Elhamifar, E.[Ehsan],
Chen, M.[Mei],
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in
Long Videos,
CVPR25(24066-24076)
IEEE DOI
2508
Accuracy, Grounding, Benchmark testing, Feature extraction,
Computational efficiency, Videos, long video temporal grounding,
model efficiency
BibRef
Wang, L.[Lan],
Mittal, G.[Gaurav],
Sajeev, S.[Sandra],
Yu, Y.[Ye],
Hall, M.[Matthew],
Boddeti, V.N.[Vishnu Naresh],
Chen, M.[Mei],
ProTéGé: Untrimmed Pretraining for Video Temporal Grounding by Video
Temporal Grounding,
CVPR23(6575-6585)
IEEE DOI
2309
BibRef
Chen, J.[Joya],
Gao, D.F.[Di-Fei],
Lin, K.Q.H.[Kevin Qing-Hong],
Shou, M.Z.[Mike Zheng],
Affordance Grounding from Demonstration Video to Target Image,
CVPR23(6799-6808)
IEEE DOI
2309
BibRef
Zhang, Y.M.[Yi-Meng],
Chen, X.[Xin],
Jia, J.H.[Jing-Han],
Liu, S.[Sijia],
Ding, K.[Ke],
Text-Visual Prompting for Efficient 2D Temporal Video Grounding,
CVPR23(14794-14804)
IEEE DOI
2309
BibRef
Li, M.Z.[Meng-Ze],
Wang, H.[Han],
Zhang, W.Q.[Wen-Qiao],
Miao, J.X.[Jia-Xu],
Zhao, Z.[Zhou],
Zhang, S.Y.[Sheng-Yu],
Ji, W.[Wei],
Wu, F.[Fei],
WINNER: Weakly-supervised hIerarchical decompositioN and aligNment
for spatio-tEmporal video gRounding,
CVPR23(23090-23099)
IEEE DOI
2309
BibRef
Lin, Z.H.[Zi-Hang],
Tan, C.L.[Chao-Lei],
Hu, J.F.[Jian-Fang],
Jin, Z.[Zhi],
Ye, T.[Tiancai],
Zheng, W.S.[Wei-Shi],
Collaborative Static and Dynamic Vision-Language Streams for
Spatio-Temporal Video Grounding,
CVPR23(23100-23109)
IEEE DOI
2309
BibRef
Yang, L.[Lijin],
Kong, Q.[Quan],
Yang, H.K.[Hsuan-Kung],
Kehl, W.[Wadim],
Sato, Y.[Yoichi],
Kobori, N.[Norimasa],
DeCo: Decomposition and Reconstruction for Compositional Temporal
Grounding via Coarse-to-Fine Contrastive Ranking,
CVPR23(23130-23140)
IEEE DOI
2309
BibRef
Kim, D.[Dahye],
Park, J.[Jungin],
Lee, J.Y.[Ji-Young],
Park, S.[Seongheon],
Sohn, K.H.[Kwang-Hoon],
Language-free Training for Zero-shot Video Grounding,
WACV23(2538-2547)
IEEE DOI
2302
Training, Visualization, Grounding, Annotations, Natural languages, Standards
BibRef
Dvornik, N.[Nikita],
Hadji, I.[Isma],
Pham, H.[Hai],
Bhatt, D.[Dhaivat],
Martinez, B.[Brais],
Fazly, A.[Afsaneh],
Jepson, A.D.[Allan D.],
Flow Graph to Video Grounding for Weakly-Supervised Multi-step
Localization,
ECCV22(XXXV: 319-335).
Springer DOI
2211
BibRef
Xiong, Z.[Zeyu],
Liu, D.[Daizong],
Zhou, P.[Pan],
Gaussian Kernel-Based Cross Modal Network for Spatio-Temporal Video
Grounding,
ICIP22(2481-2485)
IEEE DOI
2211
Heating systems, Grounding, Natural languages, Electron tubes,
Task analysis, anchor-free, Gaussian kernel, spatial-temporal video grounding
BibRef
Ding, X.P.[Xin-Peng],
Wang, N.N.[Nan-Nan],
Zhang, S.W.[Shi-Wei],
Cheng, D.[De],
Li, X.M.[Xiao-Meng],
Huang, Z.Y.[Zi-Yuan],
Tang, M.Q.[Ming-Qian],
Gao, X.B.[Xin-Bo],
Support-Set Based Cross-Supervision for Video Grounding,
ICCV21(11553-11562)
IEEE DOI
2203
Training, Visualization, Costs, Correlation, Grounding, Semantics,
Image and video retrieval, Vision + language
BibRef
Su, R.[Rui],
Yu, Q.[Qian],
Xu, D.[Dong],
STVGBert: A Visual-linguistic Transformer based Framework for
Spatio-temporal Video Grounding,
ICCV21(1513-1522)
IEEE DOI
2203
Representation learning, Visualization, Grounding, Detectors,
Benchmark testing, Transformers, Electron tubes,
Vision + language, Video analysis and understanding
BibRef
Soldan, M.[Mattia],
Xu, M.M.[Meng-Meng],
Qu, S.[Sisi],
Tegner, J.[Jesper],
Ghanem, B.[Bernard],
VLG-Net: Video-Language Graph Matching Network for Video Grounding,
CVEU21(3217-3227)
IEEE DOI
2112
Location awareness, Grounding,
Semantics, Syntactics, Graph neural networks
BibRef
Nan, G.S.[Guo-Shun],
Qiao, R.[Rui],
Xiao, Y.[Yao],
Liu, J.[Jun],
Leng, S.C.[Si-Cong],
Zhang, H.[Hao],
Lu, W.[Wei],
Interventional Video Grounding with Dual Contrastive Learning,
CVPR21(2764-2774)
IEEE DOI
2111
Visualization, Correlation, Grounding, Benchmark testing,
Knowledge discovery, Data models
BibRef
Zhao, Y.[Yang],
Zhao, Z.[Zhou],
Zhang, Z.[Zhu],
Lin, Z.J.[Zhi-Jie],
Cascaded Prediction Network via Segment Tree for Temporal Video
Grounding,
CVPR21(4195-4204)
IEEE DOI
2111
Costs, Grounding, Navigation, Fuses, Benchmark testing
BibRef
Zhang, Z.,
Zhao, Z.,
Zhao, Y.,
Wang, Q.,
Liu, H.,
Gao, L.,
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form
Sentences,
CVPR20(10665-10674)
IEEE DOI
2008
Grounding, Task analysis, Visualization, Cognition,
Feature extraction, Natural languages
BibRef
Zeng, R.H.[Run-Hao],
Xu, H.M.[Hao-Ming],
Huang, W.B.[Wen-Bing],
Chen, P.H.[Pei-Hao],
Tan, M.K.[Ming-Kui],
Gan, C.[Chuang],
Dense Regression Network for Video Grounding,
CVPR20(10284-10293)
IEEE DOI
2008
Grounding, Training, Task analysis, Proposals, Semantics,
Magnetic heads, Feature extraction
BibRef
Shi, J.[Jing],
Xu, J.[Jia],
Gong, B.Q.[Bo-Qing],
Xu, C.L.[Chen-Liang],
Not All Frames Are Equal: Weakly-Supervised Video Grounding With
Contextual Similarity and Visual Clustering Losses,
CVPR19(10436-10444).
IEEE DOI
2002
BibRef