Pang, B.[Bo],
Peng, G.[Gao],
Li, Y.Z.[Yi-Zhuo],
Lu, C.[Cewu],
Markov Progressive Framework, a Universal Paradigm for Modeling Long
Videos,
PAMI(46), No. 12, December 2024, pp. 9749-9765.
IEEE DOI
2411
Videos, Computational modeling, Semantics, Training, Transformers,
Task analysis, Solid modeling, Video understanding, progressive modeling
BibRef
You, Z.[Zeng],
Wen, Z.Q.[Zhi-Quan],
Chen, Y.F.[Yao-Fo],
Li, X.[Xin],
Zeng, R.H.[Run-Hao],
Wang, Y.W.[Yao-Wei],
Tan, M.K.[Ming-Kui],
Toward Long Video Understanding via Fine-Detailed Video Story
Generation,
CirSysVideo(35), No. 5, May 2025, pp. 4592-4607.
IEEE DOI
2505
Visualization, Termination of employment, Semantics,
Large language models, Feature extraction
BibRef
Jang, H.[Huiwon],
Yu, S.[Sihyun],
Shin, J.[Jinwoo],
Abbeel, P.[Pieter],
Seo, Y.[Younggyo],
Efficient Long Video Tokenization via Coordinate-based Patch
Reconstruction,
CVPR25(22853-22863)
IEEE DOI
2508
Chunks of long videos.
Training, Solid modeling, Dynamics, Coherence, Transformers, Tokenization,
Encoding, Video codecs, Videos, video tokenization, video generation
BibRef
Man, Y.B.[Yuan-Bin],
Huang, Y.[Ying],
Zhang, C.M.[Cheng-Ming],
Li, B.Z.[Bing-Zhe],
Niu, W.[Wei],
Yin, M.[Miao],
AdaCM2: On Understanding Extremely Long-Term Video with Adaptive
Cross-Modality Memory Reduction,
CVPR25(8534-8544)
IEEE DOI
2508
Visualization, Adaptation models, Large language models, Memory management,
Graphics processing units, Propulsion, multimodal language model
BibRef
Ren, W.M.[Wei-Ming],
Yang, H.[Huan],
Min, J.[Jie],
Wei, C.[Cong],
Chen, W.[Wenhu],
VISTA: Enhancing Long-Duration and High-Resolution Video
Understanding by VIdeo SpatioTemporal Augmentation,
CVPR25(3804-3814)
IEEE DOI
2508
Accuracy, Benchmark testing, Performance gain, Robustness,
Spatiotemporal phenomena, Spatial resolution, Faces, Videos, synthetic dataset
BibRef
Wang, Z.Y.[Zi-Yang],
Yu, S.[Shoubin],
Stengel-Eskin, E.[Elias],
Yoon, J.[Jaehong],
Cheng, F.[Feng],
Bertasius, G.[Gedas],
Bansal, M.[Mohit],
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning
on Long Videos,
CVPR25(3272-3282)
IEEE DOI
2508
Training, Accuracy, Refining, Redundancy, Cognition, Data mining,
Iterative methods, Feeds, Videos, long video understanding,
LLM-based video understanding
BibRef
Ye, J.H.[Jin-Hui],
Wang, Z.[Zihan],
Sun, H.[Haosen],
Chandrasegaran, K.[Keshigeyan],
Durante, Z.[Zane],
Eyzaguirre, C.[Cristobal],
Bisk, Y.[Yonatan],
Niebles, J.C.[Juan Carlos],
Adeli, E.[Ehsan],
Fei-Fei, L.[Li],
Wu, J.J.[Jia-Jun],
Li, M.[Manling],
Re-thinking Temporal Search for Long-Form Video Understanding,
CVPR25(8579-8591)
IEEE DOI
2508
Training, Measurement, Location awareness, Visualization,
Computational modeling, Benchmark testing, Search problems, Videos,
temporal searching
BibRef
Wang, L.[Lan],
Chen, Y.J.[Yu-Jia],
Tran, D.[Du],
Boddeti, V.N.[Vishnu Naresh],
Chu, W.S.[Wen-Sheng],
SEAL: SEmantic Attention Learning for Long Video Representation,
CVPR25(26192-26201)
IEEE DOI
2508
Grounding, Computational modeling, Semantics, Redundancy, Seals,
Question answering (information retrieval),
long video understanding
BibRef
Pan, Y.[Yulu],
Zhang, C.[Ce],
Bertasius, G.[Gedas],
Basket: A Large-Scale Video Dataset for Fine-Grained Skill Estimation,
CVPR25(28952-28962)
IEEE DOI Code:
WWW Link.
2508
Analytical models, Codes, Accuracy, Computational modeling,
Estimation, Predictive models, Videos, large-scale video dataset,
long video understanding
BibRef
Zhou, J.J.[Jun-Jie],
Shu, Y.[Yan],
Zhao, B.[Bo],
Wu, B.[Boya],
Liang, Z.Y.[Zheng-Yang],
Xiao, S.T.[Shi-Tao],
Qin, M.H.[Ming-Hao],
Yang, X.[Xi],
Xiong, Y.P.[Yong-Ping],
Zhang, B.[Bo],
Huang, T.J.[Tie-Jun],
Liu, Z.[Zheng],
MLVU: Benchmarking Multi-task Long Video Understanding,
CVPR25(13691-13701)
IEEE DOI Code:
WWW Link.
2508
Degradation, Technological innovation, Surveillance,
Benchmark testing, Multitasking, Motion pictures, Optimization, Videos
BibRef
Shu, Y.[Yan],
Liu, Z.[Zheng],
Zhang, P.[Peitian],
Qin, M.H.[Ming-Hao],
Zhou, J.J.[Jun-Jie],
Liang, Z.Y.[Zheng-Yang],
Huang, T.J.[Tie-Jun],
Zhao, B.[Bo],
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video
Understanding,
CVPR25(26160-26169)
IEEE DOI
2508
Training, Learning systems, Visualization, Soft sensors,
Large language models, Graphics processing units,
Synthetic data
BibRef
Tang, X.[Xi],
Qiu, J.[Jihao],
Xie, L.X.[Ling-Xi],
Tian, Y.J.[Yun-Jie],
Jiao, J.B.[Jian-Bin],
Ye, Q.X.[Qi-Xiang],
Adaptive Keyframe Sampling for Long Video Understanding,
CVPR25(29118-29128)
IEEE DOI Code:
WWW Link.
2508
Visualization, Adaptation models, Codes, Large language models,
Benchmark testing, Feeds, Optimization, Videos, video understanding,
keyframe sampling
BibRef
Ventura, L.[Lucas],
Yang, A.[Antoine],
Schmid, C.[Cordelia],
Varol, G.[Gül],
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs,
CVPR25(18947-18958)
IEEE DOI
2508
Visualization, Codes, Navigation, Large language models,
Computational modeling, Semantics, Speech recognition, Feeds, Videos,
VidChapters-7M benchmark
BibRef
Geng, T.T.[Tian-Tian],
Zhang, J.[Jinrui],
Wang, Q.[Qingni],
Wang, T.[Teng],
Duan, J.M.[Jin-Ming],
Zheng, F.[Feng],
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware
Omni-Modal Perception of Long Videos,
CVPR25(18959-18969)
IEEE DOI Code:
WWW Link.
2508
Filtering, Large language models, Pipelines, Semantics, Manuals,
Benchmark testing, Data models, Labeling, Videos,
long video understanding
BibRef
Kim, J.[Junho],
Kim, H.[Hyunjun],
Lee, H.[Hosu],
Ro, Y.M.[Yong Man],
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval
and Routing in Long-Form Video Analysis,
CVPR25(3352-3362)
IEEE DOI
2508
Routing, Spatiotemporal phenomena, Videos, Context modeling
BibRef
Song, E.[Enxin],
Chai, W.H.[Wen-Hao],
Wang, G.[Guanhong],
Zhang, Y.C.[Yu-Cheng],
Zhou, H.Y.[Hao-Yang],
Wu, F.[Feiyang],
Chi, H.Z.[Hao-Zhe],
Guo, X.[Xun],
Ye, T.[Tian],
Zhang, Y.T.[Yan-Ting],
Lu, Y.[Yan],
Hwang, J.N.[Jenq-Neng],
Wang, G.A.[Gao-Ang],
MovieChat: From Dense Token to Sparse Memory for Long Video
Understanding,
CVPR24(18221-18232)
IEEE DOI Code:
WWW Link.
2410
Visualization, Costs, Large language models,
Computational modeling, Manuals, Transformers
BibRef
Korbar, B.[Bruno],
Xian, Y.Q.[Yong-Qin],
Tonioni, A.[Alessio],
Zisserman, A.[Andrew],
Tombari, F.[Federico],
Text-conditioned Resampler For Long Form Video Understanding,
ECCV24(LXXXVI: 271-288).
Springer DOI
2412
BibRef
Wang, X.H.[Xiao-Han],
Zhang, Y.H.[Yu-Hui],
Zohar, O.[Orr],
Yeung-Levy, S.[Serena],
VideoAgent: Long-form Video Understanding with Large Language Model as
Agent,
ECCV24(LXXX: 58-76).
Springer DOI
2412
BibRef
Weng, Y.[Yuetian],
Han, M.F.[Ming-Fei],
He, H.Y.[Hao-Yu],
Chang, X.J.[Xiao-Jun],
Zhuang, B.[Bohan],
LongVLM: Efficient Long Video Understanding via Large Language Models,
ECCV24(XXXIII: 453-470).
Springer DOI
2412
BibRef
He, B.[Bo],
Li, H.[Hengduo],
Jang, Y.K.[Young Kyun],
Jia, M.L.[Meng-Lin],
Cao, X.F.[Xue-Fei],
Shah, A.[Ashish],
Shrivastava, A.[Abhinav],
Lim, S.N.[Ser-Nam],
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video
Understanding,
CVPR24(13504-13514)
IEEE DOI
2410
Analytical models, Large language models, Memory management,
Video sequences, Graphics processing units,
Long-Term Video Understanding
BibRef
Zhang, C.Y.[Chao-Yi],
Lin, K.[Kevin],
Yang, Z.Y.[Zheng-Yuan],
Wang, J.F.[Jian-Feng],
Li, L.J.[Lin-Jie],
Lin, C.C.[Chung-Ching],
Liu, Z.C.[Zi-Cheng],
Wang, L.J.[Li-Juan],
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context
Learning,
CVPR24(13647-13657)
IEEE DOI
2410
Measurement, Visualization, Accuracy, Annotations, Memory management,
Cognition, video understanding, LLM, in-context learning, multimodal,
vision-and-language
BibRef
Ren, S.[Shuhuai],
Yao, L.[Linli],
Li, S.C.[Shi-Cheng],
Sun, X.[Xu],
Hou, L.[Lu],
TimeChat: A Time-sensitive Multimodal Large Language Model for Long
Video Understanding,
CVPR24(14313-14323)
IEEE DOI Code:
WWW Link.
2410
Location awareness, Visualization, Codes, Grounding,
Large language models, Cognition, long video understanding
BibRef
Xu, M.[Ming],
Gould, S.[Stephen],
Temporally Consistent Unbalanced Optimal Transport for Unsupervised
Action Segmentation,
CVPR24(14618-14627)
IEEE DOI
2410
Video on demand, Costs, Pipelines, Encoding,
Web sites, Noise measurement, long-form video understanding, procedural videos
BibRef
Rodin, I.[Ivan],
Furnari, A.[Antonino],
Min, K.[Kyle],
Tripathi, S.[Subarna],
Farinella, G.M.[Giovanni Maria],
Action Scene Graphs for Long-Form Understanding of Egocentric Videos,
CVPR24(18622-18632)
IEEE DOI Code:
WWW Link.
2410
Codes, Annotations, Manuals, Benchmark testing, Cameras,
egocentric vision, scene graphs, long-form video understanding
BibRef
Ataallah, K.[Kirolos],
Shen, X.Q.[Xiao-Qian],
Abdelrahman, E.[Eslam],
Sleiman, E.[Essam],
Zhuge, M.C.[Ming-Chen],
Ding, J.[Jian],
Zhu, D.[Deyao],
Schmidhuber, J.[Jürgen],
Elhoseiny, M.[Mohamed],
Goldfish: Vision-language Understanding of Arbitrarily Long Videos,
ECCV24(XXIX: 251-267).
Springer DOI
2412
BibRef
Afham, M.[Mohamed],
Shukla, S.N.[Satya Narayan],
Poursaeed, O.[Omid],
Zhang, P.[Pengchuan],
Shah, A.[Ashish],
Lim, S.N.[Ser-Nam],
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for
Long-form Video Understanding,
REDLCV23(1181-1186)
IEEE DOI
2401
BibRef
Strafforello, O.[Ombretta],
Schutte, K.[Klamer],
van Gemert, J.C.[Jan C.],
Are current long-term video understanding datasets long-term?,
CVEU23(2959-2968)
IEEE DOI
2401
BibRef
Yang, X.T.[Xi-Tong],
Chu, F.J.[Fu-Jen],
Feiszli, M.[Matt],
Goyal, R.[Raghav],
Torresani, L.[Lorenzo],
Tran, D.[Du],
Relational Space-Time Query in Long-Form Videos,
CVPR23(6398-6408)
IEEE DOI
2309
BibRef
Wang, J.[Jue],
Zhu, W.T.[Wen-Tao],
Wang, P.[Pichao],
Yu, X.[Xiang],
Liu, L.[Linda],
Omar, M.[Mohamed],
Hamid, R.[Raffay],
Selective Structured State-Spaces for Long-Form Video Understanding,
CVPR23(6387-6397)
IEEE DOI
2309
BibRef
Islam, M.M.[Md Mohaiminul],
Bertasius, G.[Gedas],
Long Movie Clip Classification with State-Space Video Models,
ECCV22(XXXV: 87-104).
Springer DOI
2211
BibRef
Wu, C.Y.[Chao-Yuan],
Krähenbühl, P.[Philipp],
Towards Long-Form Video Understanding,
CVPR21(1884-1894)
IEEE DOI
2111
Visualization, Protocols, Computational modeling,
Machine vision, Benchmark testing
BibRef