20.4.3.3.18 Video Grounding, Grounding Expressions

Grounding. Video Grounding. An arbitrary subset of visual grounding.
See also Vision-Language Models, Language-Vision Models, VQA.
See also General Spatial Reasoning and Geometric Reasoning Issues, Visual Relations.

Ding, X.P.[Xin-Peng], Wang, N.N.[Nan-Nan], Zhang, S.W.[Shi-Wei], Huang, Z.Y.[Zi-Yuan], Li, X.M.[Xiao-Meng], Tang, M.Q.[Ming-Qian], Liu, T.L.[Tong-Liang], Gao, X.B.[Xin-Bo],
Exploring Language Hierarchy for Video Grounding,
IP(31), 2022, pp. 4693-4706.
IEEE DOI 2207
Proposals, Grounding, Training, Location awareness, Task analysis, Semantics, Feature extraction, Video and language, language hierarchy BibRef

Xu, Z.[Zhe], Chen, D.[Da], Wei, K.[Kun], Deng, C.[Cheng], Xue, H.[Hui],
HiSA: Hierarchically Semantic Associating for Video Temporal Grounding,
IP(31), 2022, pp. 5178-5188.
IEEE DOI 2208
Grounding, Feature extraction, Proposals, Task analysis, Semantics, Representation learning, Image segmentation, cross-guided contrast BibRef

Gao, J.L.[Jia-Lin], Sun, X.[Xin], Ghanem, B.[Bernard], Zhou, X.[Xi], Ge, S.M.[Shi-Ming],
Efficient Video Grounding With Which-Where Reading Comprehension,
CirSysVideo(32), No. 10, October 2022, pp. 6900-6913.
IEEE DOI 2210
Grounding, Proposals, Visualization, Location awareness, Task analysis, Reinforcement learning, deep learning BibRef

Zhou, H.[Hao], Zhang, C.Y.[Chong-Yang], Luo, Y.[Yan], Hu, C.P.[Chuan-Ping], Zhang, W.J.[Wen-Jun],
Thinking Inside Uncertainty: Interest Moment Perception for Diverse Temporal Grounding,
CirSysVideo(32), No. 10, October 2022, pp. 7190-7203.
IEEE DOI 2210
Annotations, Grounding, Task analysis, Uncertainty, Measurement, Predictive models, Optimization, Temporal grounding, label uncertainty BibRef

Tang, Z.H.[Zong-Heng], Liao, Y.[Yue], Liu, S.[Si], Li, G.B.[Guan-Bin], Jin, X.J.[Xiao-Jie], Jiang, H.X.[Hong-Xu], Yu, Q.[Qian], Xu, D.[Dong],
Human-Centric Spatio-Temporal Video Grounding With Visual Transformers,
CirSysVideo(32), No. 12, December 2022, pp. 8238-8249.
IEEE DOI 2212
Grounding, Visualization, Electron tubes, Location awareness, Power transformers, Spatial temporal resolution, dataset BibRef

Wang, W.[Wei], Gao, J.Y.[Jun-Yu], Xu, C.S.[Chang-Sheng],
Weakly-Supervised Video Object Grounding via Causal Intervention,
PAMI(45), No. 3, March 2023, pp. 3933-3948.
IEEE DOI 2302
Grounding, Visualization, Task analysis, Dairy products, Annotations, Context modeling, Proposals, Weakly-supervised learning, adversarial contrastive learning
See also Multimodal Evidential Learning for Open-World Weakly-Supervised Video Anomaly Detection. BibRef

Wang, W.[Wei], Gao, J.Y.[Jun-Yu], Xu, C.S.[Chang-Sheng],
Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations,
MultMed(25), 2023, pp. 6329-6340.
IEEE DOI 2311
BibRef

Xu, Z.[Zhe], Wei, K.[Kun], Yang, X.[Xu], Deng, C.[Cheng],
Point-Supervised Video Temporal Grounding,
MultMed(25), 2023, pp. 6121-6131.
IEEE DOI 2311
BibRef

Lu, Y.[Yu], Quan, R.J.[Rui-Jie], Zhu, L.C.[Lin-Chao], Yang, Y.[Yi],
Zero-Shot Video Grounding With Pseudo Query Lookup and Verification,
IP(33), 2024, pp. 1643-1654.
IEEE DOI 2403
Grounding, Detectors, Proposals, Training, Task analysis, Visualization, Semantics, Video grounding, zero-shot learning, vision and language BibRef

Shi, F.Y.[Feng-Yuan], Huang, W.L.[Wei-Lin], Wang, L.M.[Li-Min],
End-to-end dense video grounding via parallel regression,
CVIU(242), 2024, pp. 103980.
Elsevier DOI 2404
Visual grounding, Dense grounding, Query based detection BibRef

Xiong, Z.[Zeyu], Liu, D.Z.[Dai-Zong], Fang, X.[Xiang], Qu, X.Y.[Xiao-Ye], Dong, J.F.[Jian-Feng], Zhu, J.H.[Jia-Hao], Tang, K.[Keke], Zhou, P.[Pan],
Rethinking Video Sentence Grounding from a Tracking Perspective With Memory Network and Masked Attention,
MultMed(26), 2024, pp. 11204-11218.
IEEE DOI 2412
Target tracking, Semantics, Task analysis, Object tracking, Grounding, Feature extraction, Visualization, Cross-modal, VSG BibRef

Fang, X.[Xiang], Xiong, Z.[Zeyu], Fang, W.L.[Wan-Long], Qu, X.Y.[Xiao-Ye], Chen, C.[Chen], Dong, J.F.[Jian-Feng], Tang, K.[Keke], Zhou, P.[Pan], Cheng, Y.[Yu], Liu, D.Z.[Dai-Zong],
Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective,
ECCV24(XLV: 290-311).
Springer DOI 2412
BibRef

Wu, Q.Q.[Qing-Qing], Guo, L.J.[Li-Jun], Zhang, R.[Rong], Qian, J.B.[Jiang-Bo], Gao, S.[Shangce],
QSMT-net: A query-sensitive proposal and multi-temporal-span matching network for video grounding,
IVC(149), 2024, pp. 105188.
Elsevier DOI 2408
Video grounding, Multi-modal feature fusion, Cross-attention modeling BibRef

Dong, J.X.[Jian-Xiang], Yin, Z.Z.[Zhao-Zheng],
Graph-based Dense Event Grounding with relative positional encoding,
CVIU(251), 2025, pp. 104257.
Elsevier DOI 2501
Dense Event Grounding, Temporal sentence grounding, Video grounding, Relative positional encoding BibRef

Tang, K.F.[Ke-Fan], He, L.H.[Li-Huo], Wang, N.N.[Nan-Nan], Gao, X.B.[Xin-Bo],
Dual Semantic Reconstruction Network for Weakly Supervised Temporal Sentence Grounding,
MultMed(27), 2025, pp. 95-107.
IEEE DOI 2501
Proposals, Grounding, Feature extraction, Image reconstruction, Annotations, Semantics, Training, Information processing, Decoding, consistency constraint BibRef

Liu, K.[Kun], Qu, M.X.[Meng-Xue], Liu, Y.[Yang], Wei, Y.C.[Yun-Chao], Zhe, W.M.[Wen-Ming], Zhao, Y.[Yao], Liu, W.[Wu],
Single-Frame Supervision for Spatio-Temporal Video Grounding,
PAMI(47), No. 7, July 2025, pp. 5177-5191.
IEEE DOI 2506
Annotations, Grounding, Task analysis, Electron tubes, Training, Visualization, Costs, Curriculum learning, large scale dataset, spatio-temporal video grounding BibRef

Hu, J.J.[Jing-Jing], Guo, D.[Dan], Li, K.[Kun], Si, Z.[Zhan], Yang, X.[Xun], Chang, X.J.[Xiao-Jun], Wang, M.[Meng],
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding,
PAMI(47), No. 8, August 2025, pp. 6445-6462.
IEEE DOI 2507
Grounding, Visual perception, Semantics, Filtering, Biology, Training, Convolution, Complexity theory, Solids, Pattern analysis, vision and language BibRef

Ran, R.[Ran], Wei, J.[Jiwei], He, S.Y.[Shi-Yuan], Zhou, Y.Y.[Yu-Yang], Wang, P.[Peng], Yang, Y.[Yang], Shen, H.T.[Heng Tao],
Fine-Grained Alignment and Interaction for Video Grounding With Cross-Modal Semantic Hierarchical Graph,
CirSysVideo(35), No. 11, November 2025, pp. 11641-11654.
IEEE DOI 2511
Semantics, Grounding, Tires, Feature extraction, Contrastive learning, Visualization, semantic understanding BibRef

Wang, M.Z.[Meng-Zhao], Li, H.F.[Hua-Feng], Zhang, Y.F.[Ya-Fei], Li, J.X.[Jin-Xing], Tao, D.P.[Da-Peng], Yu, Z.T.[Zheng-Tao],
Disentangling Inter- and Intra-Video Relations for Multi-Event Video-Text Retrieval and Grounding,
IP(34), 2025, pp. 7558-7571.
IEEE DOI Code:
WWW Link. 2512
Videos, Grounding, Proposals, Feature extraction, Training, Visualization, Accuracy, Contrastive learning, Weak supervision, multi-event queries BibRef

Yang, J.[Jin], Wei, P.[Ping],
Learning unified patterns of multimodalities for video temporal grounding,
PR(172), 2026, pp. 112484.
Elsevier DOI Code:
WWW Link. 2512
Multimodal learning, Moment retrieval, Highlight detection, Video temporal grounding BibRef

Liu, Y.[Yang], Zheng, M.H.[Ming-Hang], Chen, Q.C.[Qing-Chao], Gong, S.G.[Shao-Gang], Peng, Y.X.[Yu-Xin],
Large-Scale Pre-Trained Models Empowering Phrase Generalization in Temporal Sentence Localization,
IJCV(134), No. 2, February 2026, pp. 53.
Springer DOI 2601
BibRef

Zheng, M.H.[Ming-Hang], Cai, X.H.[Xin-Hao], Chen, Q.C.[Qing-Chao], Peng, Y.X.[Yu-Xin], Liu, Y.[Yang],
Training-Free Video Temporal Grounding Using Large-Scale Pre-Trained Models,
ECCV24(LXXXII: 20-37).
Springer DOI 2412
BibRef

Li, A.[Ao], Liu, H.J.[Hui-Jun], Zhu, Y.Q.[Yi-Qing], Ge, Y.X.[Yong-Xin],
Efficient Pre-Trained Semantics Refinement for Video Temporal Grounding,
CirSysVideo(36), No. 2, February 2026, pp. 1406-1418.
IEEE DOI 2602
Semantics, Visualization, Feature extraction, Grounding, Training, Proposals, Natural languages, Tuning, contrast learning BibRef

Moon, W.J.[Won-Jun], Hyun, S.[Sangeek], Lee, S.[Subeen], Heo, J.P.[Jae-Pil],
Correlation-guided calibration of query dependency for video temporal grounding,
PR(174), 2026, pp. 112984.
Elsevier DOI 2602
Video temporal grounding, Moment retrieval, Video highlight detection BibRef

Cao, Z.[Zhuo], Zhang, B.Q.[Bing-Qing], Du, H.M.[He-Ming], Yu, X.[Xin], Li, X.[Xue], Wang, S.[Sen],
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding,
WACV25(9226-9236)
IEEE DOI Code:
WWW Link. 2505
Training, Accuracy, Codes, Adaptive systems, Grounding, Benchmark testing, Robustness, Decoding, moment retrieval BibRef

Weerakoon, D.[Dulanga], Subbaraju, V.[Vigneshwaran], Lim, J.H.[Joo Hwee], Misra, A.[Archan],
NeuroViG: Integrating Event Cameras for Resource-Efficient Video Grounding,
WACV25(5781-5790)
IEEE DOI 2505
Visualization, Technological innovation, Accuracy, Grounding, Neuromorphics, Pipelines, Neural networks, Cameras, Transformers, event processing BibRef

Jin, Y.[Yang], Mu, Y.D.[Ya-Dong],
Weakly-supervised Spatio-temporal Video Grounding with Variational Cross-modal Alignment,
ECCV24(XLVIII: 412-429).
Springer DOI 2412
BibRef

Fujiwara, K.[Kent], Tanaka, M.[Mikihiro], Yu, Q.[Qing],
Chronologically Accurate Retrieval for Temporal Grounding of Motion-language Models,
ECCV24(LVIII: 323-339).
Springer DOI 2412
BibRef

Bao, P.J.[Pei-Jun], Shao, Z.[Zihao], Yang, W.H.[Wen-Han], Ng, B.P.[Boon Poh], Kot, A.C.[Alex C.],
E3m: Zero-shot Spatio-temporal Video Grounding with Expectation-maximization Multimodal Modulation,
ECCV24(LXXXIII: 227-243).
Springer DOI 2412
BibRef

Hannan, T.[Tanveer], Islam, M.M.[Md Mohaiminul], Seidl, T.[Thomas], Bertasius, G.[Gedas],
RGNET: A Unified Clip Retrieval and Grounding Network for Long Videos,
ECCV24(XXI: 352-369).
Springer DOI 2412
BibRef

Gu, X.[Xin], Fan, H.[Heng], Huang, Y.[Yan], Luo, T.J.[Tie-Jian], Zhang, L.B.[Li-Bo],
Context-Guided Spatio-Temporal Video Grounding,
CVPR24(18330-18339)
IEEE DOI Code:
WWW Link. 2410
Location awareness, Degradation, Visualization, Codes, Grounding, spatio-temporal video grounding, instance context learning BibRef

Chen, B.[Brian], Shvetsova, N.[Nina], Rouditchenko, A.[Andrew], Kondermann, D.[Daniel], Thomas, S.[Samuel], Chang, S.F.[Shih-Fu], Feris, R.[Rogerio], Glass, J.[James], Kuehne, H.[Hilde],
What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions,
CVPR24(18419-18429)
IEEE DOI 2410
Representation learning, Grounding, Annotations, Benchmark testing, Encoding, Self-supervised learning BibRef

Wasim, S.T.[Syed Talal], Naseer, M.[Muzammal], Khan, S.[Salman], Yang, M.H.[Ming-Hsuan], Khan, F.S.[Fahad Shahbaz],
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding,
CVPR24(18909-18918)
IEEE DOI 2410
Visualization, Adaptation models, Vocabulary, Grounding, Semantics, Natural languages, Training data, Video Grounding, Open Vocabulary, MultiModal BibRef

de la Jara, I.M.[Ignacio M.], Rodriguez-Opazo, C.[Cristian], Marrese-Taylor, E.[Edison], Bravo-Marquez, F.[Felipe],
An empirical study of the effect of video encoders on Temporal Video Grounding,
CLVL23(2842-2847)
IEEE DOI 2401
BibRef

Li, H.X.[Hong-Xiang], Cao, M.[Meng], Cheng, X.[Xuxin], Li, Y.W.[Yao-Wei], Zhu, Z.H.[Zhi-Hong], Zou, Y.X.[Yue-Xian],
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory,
ICCV23(11998-12008)
IEEE DOI 2401
BibRef

Li, H.[Hanjun], Shu, X.J.[Xiu-Jun], He, S.[Sunan], Qiao, R.Z.[Rui-Zhi], Wen, W.[Wei], Guo, T.[Taian], Gan, B.[Bei], Sun, X.[Xing],
D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation,
ICCV23(13688-13700)
IEEE DOI Code:
WWW Link. 2401
BibRef

Pan, Y.L.[Yu-Lin], He, X.T.[Xiang-Teng], Gong, B.[Biao], Lv, Y.L.[Yi-Liang], Shen, Y.J.[Yu-Jun], Peng, Y.X.[Yu-Xin], Zhao, D.L.[De-Li],
Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos,
ICCV23(13721-13731)
IEEE DOI Code:
WWW Link. 2401
BibRef

Jang, J.[Jinhyun], Park, J.[Jungin], Kim, J.[Jin], Kwon, H.[Hyeongjun], Sohn, K.H.[Kwang-Hoon],
Knowing Where to Focus: Event-aware Transformer for Video Grounding,
ICCV23(13800-13810)
IEEE DOI Code:
WWW Link. 2401
BibRef

Cao, M.[Meng], Wei, F.Y.[Fang-Yun], Xu, C.[Can], Geng, X.[Xiubo], Chen, L.[Long], Zhang, C.[Can], Zou, Y.X.[Yue-Xian], Shen, T.[Tao], Jiang, D.X.[Da-Xin],
Iterative Proposal Refinement for Weakly-Supervised Video Grounding,
CVPR23(6524-6534)
IEEE DOI 2309
BibRef

Lu, Z.J.[Zi-Jia], Iftekhar, A.S.M., Mittal, G.[Gaurav], Meng, T.J.[Tian-Jian], Wang, X.[Xiawei], Zhao, C.[Cheng], Kukkala, R.[Rohith], Elhamifar, E.[Ehsan], Chen, M.[Mei],
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos,
CVPR25(24066-24076)
IEEE DOI 2508
Accuracy, Grounding, Benchmark testing, Feature extraction, Computational efficiency, Videos, long video temporal grounding, model efficiency BibRef

Wang, L.[Lan], Mittal, G.[Gaurav], Sajeev, S.[Sandra], Yu, Y.[Ye], Hall, M.[Matthew], Boddeti, V.N.[Vishnu Naresh], Chen, M.[Mei],
ProTéGé: Untrimmed Pretraining for Video Temporal Grounding by Video Temporal Grounding,
CVPR23(6575-6585)
IEEE DOI 2309
BibRef

Chen, J.[Joya], Gao, D.F.[Di-Fei], Lin, K.Q.H.[Kevin Qing-Hong], Shou, M.Z.[Mike Zheng],
Affordance Grounding from Demonstration Video to Target Image,
CVPR23(6799-6808)
IEEE DOI 2309
BibRef

Zhang, Y.M.[Yi-Meng], Chen, X.[Xin], Jia, J.H.[Jing-Han], Liu, S.[Sijia], Ding, K.[Ke],
Text-Visual Prompting for Efficient 2D Temporal Video Grounding,
CVPR23(14794-14804)
IEEE DOI 2309
BibRef

Li, M.Z.[Meng-Ze], Wang, H.[Han], Zhang, W.Q.[Wen-Qiao], Miao, J.X.[Jia-Xu], Zhao, Z.[Zhou], Zhang, S.Y.[Sheng-Yu], Ji, W.[Wei], Wu, F.[Fei],
WINNER: Weakly-supervised hIerarchical decompositioN and aligNment for spatio-tEmporal video gRounding,
CVPR23(23090-23099)
IEEE DOI 2309
BibRef

Lin, Z.H.[Zi-Hang], Tan, C.L.[Chao-Lei], Hu, J.F.[Jian-Fang], Jin, Z.[Zhi], Ye, T.[Tiancai], Zheng, W.S.[Wei-Shi],
Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding,
CVPR23(23100-23109)
IEEE DOI 2309
BibRef

Yang, L.[Lijin], Kong, Q.[Quan], Yang, H.K.[Hsuan-Kung], Kehl, W.[Wadim], Sato, Y.[Yoichi], Kobori, N.[Norimasa],
DeCo: Decomposition and Reconstruction for Compositional Temporal Grounding via Coarse-to-Fine Contrastive Ranking,
CVPR23(23130-23140)
IEEE DOI 2309
BibRef

Kim, D.[Dahye], Park, J.[Jungin], Lee, J.Y.[Ji-Young], Park, S.[Seongheon], Sohn, K.H.[Kwang-Hoon],
Language-free Training for Zero-shot Video Grounding,
WACV23(2538-2547)
IEEE DOI 2302
Training, Visualization, Grounding, Annotations, Natural languages, Standards BibRef

Dvornik, N.[Nikita], Hadji, I.[Isma], Pham, H.[Hai], Bhatt, D.[Dhaivat], Martinez, B.[Brais], Fazly, A.[Afsaneh], Jepson, A.D.[Allan D.],
Flow Graph to Video Grounding for Weakly-Supervised Multi-step Localization,
ECCV22(XXXV: 319-335).
Springer DOI 2211
BibRef

Xiong, Z.[Zeyu], Liu, D.[Daizong], Zhou, P.[Pan],
Gaussian Kernel-Based Cross Modal Network for Spatio-Temporal Video Grounding,
ICIP22(2481-2485)
IEEE DOI 2211
Heating systems, Grounding, Natural languages, Electron tubes, Task analysis, anchor-free, Gaussian kernel, spatial-temporal video grounding BibRef

Ding, X.P.[Xin-Peng], Wang, N.N.[Nan-Nan], Zhang, S.W.[Shi-Wei], Cheng, D.[De], Li, X.M.[Xiao-Meng], Huang, Z.Y.[Zi-Yuan], Tang, M.Q.[Ming-Qian], Gao, X.B.[Xin-Bo],
Support-Set Based Cross-Supervision for Video Grounding,
ICCV21(11553-11562)
IEEE DOI 2203
Training, Visualization, Costs, Correlation, Grounding, Semantics, Image and video retrieval, Vision + language BibRef

Su, R.[Rui], Yu, Q.[Qian], Xu, D.[Dong],
STVGBert: A Visual-linguistic Transformer based Framework for Spatio-temporal Video Grounding,
ICCV21(1513-1522)
IEEE DOI 2203
Representation learning, Visualization, Grounding, Detectors, Benchmark testing, Transformers, Electron tubes, Vision + language, Video analysis and understanding BibRef

Soldan, M.[Mattia], Xu, M.M.[Meng-Meng], Qu, S.[Sisi], Tegner, J.[Jesper], Ghanem, B.[Bernard],
VLG-Net: Video-Language Graph Matching Network for Video Grounding,
CVEU21(3217-3227)
IEEE DOI 2112
Location awareness, Grounding, Semantics, Syntactics, Graph neural networks BibRef

Nan, G.S.[Guo-Shun], Qiao, R.[Rui], Xiao, Y.[Yao], Liu, J.[Jun], Leng, S.C.[Si-Cong], Zhang, H.[Hao], Lu, W.[Wei],
Interventional Video Grounding with Dual Contrastive Learning,
CVPR21(2764-2774)
IEEE DOI 2111
Visualization, Correlation, Grounding, Benchmark testing, Knowledge discovery, Data models BibRef

Zhao, Y.[Yang], Zhao, Z.[Zhou], Zhang, Z.[Zhu], Lin, Z.J.[Zhi-Jie],
Cascaded Prediction Network via Segment Tree for Temporal Video Grounding,
CVPR21(4195-4204)
IEEE DOI 2111
Costs, Grounding, Navigation, Fuses, Benchmark testing BibRef

Zhang, Z., Zhao, Z., Zhao, Y., Wang, Q., Liu, H., Gao, L.,
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences,
CVPR20(10665-10674)
IEEE DOI 2008
Grounding, Task analysis, Visualization, Cognition, Feature extraction, Natural languages BibRef

Zeng, R.H.[Run-Hao], Xu, H.M.[Hao-Ming], Huang, W.B.[Wen-Bing], Chen, P.H.[Pei-Hao], Tan, M.K.[Ming-Kui], Gan, C.[Chuang],
Dense Regression Network for Video Grounding,
CVPR20(10284-10293)
IEEE DOI 2008
Grounding, Training, Task analysis, Proposals, Semantics, Magnetic heads, Feature extraction BibRef

Shi, J.[Jing], Xu, J.[Jia], Gong, B.Q.[Bo-Qing], Xu, C.L.[Chen-Liang],
Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses,
CVPR19(10436-10444).
IEEE DOI 2002
BibRef

Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Language Grounding.


Last update: Apr 18, 2026 at 20:43:46