Zhang, J.[Ji],
Mei, K.Z.[Kui-Zhi],
Zheng, Y.[Yu],
Fan, J.P.[Jian-Ping],
Exploiting Mid-Level Semantics for Large-Scale Complex Video
Classification,
MultMed(21), No. 10, October 2019, pp. 2518-2530.
IEEE DOI
1910
feature extraction, image classification,
image motion analysis, image representation,
large-scale video classification
BibRef
Zhang, J.[Ji],
Mei, K.Z.[Kui-Zhi],
Wang, X.,
Zheng, Y.[Yu],
Fan, J.P.[Jian-Ping],
From Text to Video: Exploiting Mid-Level Semantics for Large-Scale
Video Classification,
ICPR18(1695-1700)
IEEE DOI
1812
Semantics, Task analysis, Visualization, Streaming media, Detectors,
Encoding, Bridges
BibRef
Yang, M.[Min],
Liu, J.H.[Jun-Hao],
Shen, Y.[Ying],
Zhao, Z.[Zhou],
Chen, X.J.[Xiao-Jun],
Wu, Q.Y.[Qing-Yao],
Li, C.M.[Cheng-Ming],
An Ensemble of Generation- and Retrieval-Based Image Captioning With
Dual Generator Generative Adversarial Network,
IP(29), 2020, pp. 9627-9640.
IEEE DOI
2011
Generators, Decoding, Generative adversarial networks, Training,
Computational modeling, Task analysis, Image captioning,
adversarial learning
BibRef
Chen, Q.[Qi],
Wu, Q.[Qi],
Chen, J.[Jian],
Wu, Q.Y.[Qing-Yao],
van den Hengel, A.J.[Anton J.],
Tan, M.K.[Ming-Kui],
Scripted Video Generation With a Bottom-Up Generative Adversarial
Network,
IP(29), 2020, pp. 7454-7467.
IEEE DOI
2007
Generative adversarial networks, video generation,
semantic alignment, temporal coherence
BibRef
Sheng, L.[Lu],
Pan, J.T.[Jun-Ting],
Guo, J.M.[Jia-Ming],
Shao, J.[Jing],
Loy, C.C.[Chen Change],
High-Quality Video Generation from Static Structural Annotations,
IJCV(128), No. 10-11, November 2020, pp. 2552-2569.
Springer DOI
2009
BibRef
Sener, F.[Fadime],
Saraf, R.[Rishabh],
Yao, A.[Angela],
Transferring Knowledge From Text to Video:
Zero-Shot Anticipation for Procedural Actions,
PAMI(45), No. 6, June 2023, pp. 7836-7852.
IEEE DOI
2305
Visualization, Robots, Data models, Task analysis, Predictive models,
Natural languages, Text recognition, Deep learning, video analysis
BibRef
Liu, S.Y.[Si-Ying],
Dragotti, P.L.[Pier Luigi],
Sensing Diversity and Sparsity Models for Event Generation and Video
Reconstruction from Events,
PAMI(45), No. 10, October 2023, pp. 12444-12458.
IEEE DOI
2310
Event to video.
BibRef
Köksal, A.[Ali],
Ak, K.E.[Kenan E.],
Sun, Y.[Ying],
Rajan, D.[Deepu],
Lim, J.H.[Joo Hwee],
Controllable Video Generation With Text-Based Instructions,
MultMed(26), 2024, pp. 190-201.
IEEE DOI
2401
BibRef
Liu, J.W.[Jia-Wei],
Wang, W.N.[Wei-Ning],
Chen, S.[Sihan],
Zhu, X.X.[Xin-Xin],
Liu, J.[Jing],
Sounding Video Generator: A Unified Framework for Text-Guided
Sounding Video Generation,
MultMed(26), 2024, pp. 141-153.
IEEE DOI
2401
BibRef
Hu, Y.[Yaosi],
Luo, C.[Chong],
Chen, Z.Z.[Zhen-Zhong],
A Benchmark for Controllable Text-Image-to-Video Generation,
MultMed(26), 2024, pp. 1706-1719.
IEEE DOI
2402
Task analysis, Measurement, Generators, Uncertainty, Visualization, Dynamics,
Benchmark testing, Video generation, text-image-to-video,
multimodal-conditioned generation
BibRef
Fang, S.[Sheng],
Dang, T.T.[Tian-Tian],
Wang, S.H.[Shu-Hui],
Huang, Q.M.[Qing-Ming],
Linguistic Hallucination for Text-Based Video Retrieval,
CirSysVideo(34), No. 10, October 2024, pp. 9692-9705.
IEEE DOI Code:
WWW Link.
2411
Linguistics, Training, Testing, Encoding, Context modeling,
Feature extraction, Task analysis, Text-video retrieval, curriculum learning
BibRef
Nadeem, M.[Mohammad],
Sohail, S.S.[Shahab Saquib],
Cambria, E.[Erik],
Schuller, B.W.[Björn W.],
Hussain, A.[Amir],
Gender Bias in Text-to-Video Generation Models: A Case Study of Sora,
IEEE_Int_Sys(40), No. 3, May 2025, pp. 10-15.
IEEE DOI
2506
Analytical models, Leadership, Ethics, Generative AI,
Prevention and mitigation, Training data, Focusing, Intelligent systems
BibRef
Kuang, Q.[Qi],
Chen, Y.[Ying],
Visual-Aware Text as Query for Referring Video Object Segmentation,
IVC(161), 2025, pp. 105608.
Elsevier DOI
2509
Referring video object segmentation, Text-to-video, CLIP
BibRef
Deo, A.[Anurag],
Bhat, S.[Savita],
Karande, S.[Shirish],
VisualFusion: Enhancing Blog Content with Advanced Infographic
Pipeline,
WACV25(5591-5600)
IEEE DOI
2505
Measurement, Image quality, Visualization, Pipelines, Blogs,
Text to image, Streaming media, Data models, Standards, Videos, NLP
BibRef
Shin, C.[Chaehun],
Choi, J.Y.[Joo-Young],
Kim, H.[Heeseung],
Yoon, S.[Sungroh],
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot
Subject-Driven Image Generator,
CVPR25(7986-7996)
IEEE DOI
2508
Visualization, Image synthesis, Semantics, Text to image, Generators,
Videos, Context modeling, zero-shot subject-driven text-to-image generation
BibRef
Liang, F.[Feng],
Ma, H.Y.[Hao-Yu],
He, Z.C.[Ze-Cheng],
Hou, T.B.[Ting-Bo],
Hou, J.[Ji],
Li, K.[Kunpeng],
Dai, X.L.[Xiao-Liang],
Juefei-Xu, F.[Felix],
Azadi, S.[Samaneh],
Sinha, A.[Animesh],
Zhang, P.Z.[Pei-Zhao],
Vajda, P.[Peter],
Marculescu, D.[Diana],
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with
Anchored Prompts,
CVPR25(13146-13156)
IEEE DOI
2508
Limiting, Accuracy, Animals, Face recognition, Motion pictures, Videos
BibRef
Nguyen, T.[Thao],
Singh, K.K.[Krishna Kumar],
Shi, J.[Jing],
Bui, T.[Trung],
Lee, Y.J.[Yong Jae],
Li, Y.H.[Yu-Heng],
Yo'Chameleon: Personalized Vision and Language Generation,
CVPR25(14438-14448)
IEEE DOI
2508
Training, Image quality, Adaptation models, Visualization,
Image synthesis, Statistical analysis, Tuning, Optimization,
personalization
BibRef
Wang, H.L.[Han-Lin],
Ouyang, H.[Hao],
Wang, Q.Y.[Qiu-Yu],
Wang, W.[Wen],
Cheng, K.L.[Ka Leong],
Chen, Q.F.[Qi-Feng],
Shen, Y.J.[Yu-Jun],
Wang, L.M.[Li-Min],
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis,
CVPR25(12490-12500)
IEEE DOI Code:
WWW Link.
2508
Solid modeling, Image segmentation, Tracking,
Computational modeling, Pipelines, Aerospace electronics,
3d trajectory control
BibRef
Lai, Z.H.[Zi-Hang],
Vedaldi, A.[Andrea],
Tracktention: Leveraging Point Tracking to Attend Videos Faster and
Better,
CVPR25(22809-22819)
IEEE DOI
2508
Temporal consistency in synthesis.
Tracking, Computational modeling, Dynamics, Predictive models,
Transformers, Computational efficiency, Videos
BibRef
Ji, B.[Bin],
Pan, Y.[Ye],
Liu, Z.M.[Zhi-Meng],
Tan, S.[Shuai],
Jin, X.G.[Xiao-Gang],
Yang, X.K.[Xiao-Kang],
POMP: Physics-constrainable Motion Generative Model through Phase
Manifolds,
CVPR25(22690-22701)
IEEE DOI
2508
Manifolds, Training, Target tracking, Simulation, Dynamics, Kinematics,
Encoding, Real-time systems, Topology, Physics
BibRef
Wang, Y.F.[Yi-Fan],
Yang, P.[Peishan],
Xu, Z.[Zhen],
Sun, J.M.[Jia-Ming],
Zhang, Z.[Zhanhua],
Chen, Y.[Yong],
Bao, H.J.[Hu-Jun],
Peng, S.[Sida],
Zhou, X.W.[Xiao-Wei],
FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic
Scene Reconstruction,
CVPR25(21750-21760)
IEEE DOI
2508
Surface reconstruction, Solid modeling, Deformation, Dynamics,
Redundancy, Rendering (computer graphics)
BibRef
Wang, S.J.[Shi-Jie],
Azadi, S.[Samaneh],
Girdhar, R.[Rohit],
Rambhatla, S.[Saketh],
Sun, C.[Chen],
Yin, X.[Xi],
MotiF: Making Text Count in Image Animation with Motion Focal Loss,
CVPR25(7773-7783)
IEEE DOI
2508
Training, Optical losses, Heating systems, Protocols, Benchmark testing,
Animation, Optical flow, Videos, video generation, image animation
BibRef
Bian, W.[Weikang],
Huang, Z.Y.[Zhao-Yang],
Shi, X.Y.[Xiao-Yu],
Li, Y.J.[Yi-Jin],
Wang, F.Y.[Fu-Yun],
Li, H.S.[Hong-Sheng],
GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields
through Efficient Dense 3D Point Tracking,
CVPR25(21717-21727)
IEEE DOI Code:
WWW Link.
2508
Training, Tracking, Dynamics, Production, Cameras, Controllability,
Transformers, Videos, video generation, point tracking,
camera pose control
BibRef
Ren, X.[Xuanchi],
Shen, T.[Tianchang],
Huang, J.H.[Jia-Hui],
Ling, H.[Huan],
Lu, Y.F.[Yi-Fan],
Nimier-David, M.[Merlin],
Müller, T.[Thomas],
Keller, A.[Alexander],
Fidler, S.[Sanja],
Gao, J.[Jun],
Gen3C: 3D-Informed World-Consistent Video Generation with Precise
Camera Control,
CVPR25(6121-6132)
IEEE DOI
2508
Training, Point cloud compression, Solid modeling, Dynamics, Cameras,
Rendering (computer graphics), Trajectory, Videos,
camera control
BibRef
Wang, X.[Xi],
Courant, R.[Robin],
Christie, M.[Marc],
Kalogeiton, V.[Vicky],
AKiRa: Augmentation Kit on Rays for optical video generation,
CVPR25(2609-2619)
IEEE DOI
2508
Visualization, Mood, Optical distortion, Cameras, Quality assessment,
Optical films, Videos, Optical control, Lenses
BibRef
Han, H.[Haonan],
Wu, X.Z.[Xiang-Zuo],
Liao, H.[Huan],
Xu, Z.[Zunnan],
Hu, Z.Y.[Zhong-Yuan],
Li, R.H.[Rong-Hui],
Zhang, Y.C.[Ya-Chao],
Li, X.[Xiu],
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision
Reward,
CVPR25(22746-22755)
IEEE DOI Code:
WWW Link.
2508
Visualization, Annotations, Computational modeling,
Reinforcement learning, Generators
BibRef
Wu, Y.S.[Yu-Shu],
Zhang, Z.X.[Zhi-Xing],
Li, Y.[Yanyu],
Xu, Y.[Yanwu],
Kag, A.[Anil],
Sui, Y.[Yang],
Coskun, H.[Huseyin],
Ma, K.[Ke],
Lebedev, A.[Aleksei],
Hu, J.[Ju],
Metaxas, D.N.[Dimitris N.],
Wang, Y.Z.[Yan-Zhi],
Tulyakov, S.[Sergey],
Ren, J.[Jian],
SnapGen-V: Generating a Five-Second Video within Five Seconds on a
Mobile Device,
CVPR25(2479-2490)
IEEE DOI Code:
WWW Link.
2508
Limiting, Image synthesis, Computational modeling,
Image edge detection, Noise reduction, Network architecture, efficiency
BibRef
Han, H.[Hui],
Li, S.Y.[Si-Yuan],
Chen, J.Q.[Jia-Qi],
Yuan, Y.W.[Yi-Wen],
Wu, Y.L.[Yu-Ling],
Deng, Y.F.[Yu-Fan],
Leong, C.T.[Chak Tou],
Du, H.[Hanwen],
Fu, J.C.[Jun-Chen],
Li, Y.[Youhua],
Zhang, J.[Jie],
Zhang, C.[Chi],
Li, L.J.[Li-Jia],
Ni, Y.X.[Yong-Xin],
Video-Bench: Human-Aligned Video Generation Benchmark,
CVPR25(18858-18868)
IEEE DOI
2508
Measurement, Visualization, Large language models,
Computational modeling, Benchmark testing, Cognition, video generation
BibRef
Shen, X.Q.[Xiao-Qian],
Elhoseiny, M.[Mohamed],
StoryGPT-V: Large Language Models as Consistent Story Visualizers,
CVPR25(13273-13283)
IEEE DOI Code:
WWW Link.
2508
Visualization, Image segmentation, Image resolution, Accuracy,
Navigation, Large language models, Semantics, Character generation
BibRef
Chen, T.S.[Tsai-Shien],
Siarohin, A.[Aliaksandr],
Menapace, W.[Willi],
Fang, Y.W.[Yu-Wei],
Lee, K.S.[Kwot Sin],
Skorokhodov, I.[Ivan],
Aberman, K.[Kfir],
Zhu, J.Y.[Jun-Yan],
Yang, M.H.[Ming-Hsuan],
Tulyakov, S.[Sergey],
Multi-subject Open-set Personalization in Video Generation,
CVPR25(6099-6110)
IEEE DOI
2508
Training, Computational modeling, Pipelines, Benchmark testing,
Transformers, Image augmentation, Optimization, Videos, Overfitting,
customization
BibRef
Huang, Z.P.[Zhi-Peng],
Zhuang, S.[Shaobin],
Fu, C.[Canmiao],
Yang, B.X.[Bin-Xin],
Zhang, Y.[Ying],
Sun, C.[Chong],
Zhang, Z.Z.[Zhi-Zheng],
Wang, Y.[Yali],
Li, C.[Chen],
Zha, Z.J.[Zheng-Jun],
WeGen: A Unified Model for Interactive Multimodal Generation as We
Chat,
CVPR25(23679-23689)
IEEE DOI Code:
WWW Link.
2508
Visualization, Foundation models, Refining, Pipelines, Internet,
Iterative methods, Creativity, Image reconstruction, Videos
BibRef
Su, T.T.[Tong-Tong],
Wang, C.Y.[Cheng-Yu],
Liu, B.Y.[Bing-Yan],
Huang, J.[Jun],
Lu, D.M.[Dong-Ming],
Encapsulated Composition of Text-to-Image and Text-to-Video Models
for High-Quality Video Synthesis,
CVPR25(18209-18218)
IEEE DOI
2508
Adaptation models, Visualization, Synthesizers, Noise reduction,
Refining, Pipelines, Text to image, Imaging, Quality assessment, Text to video
BibRef
Wang, L.Z.[Luo-Zhou],
Li, Y.J.[Yi-Jun],
Chen, Z.F.[Zhi-Fei],
Wang, J.H.[Jui-Hsien],
Zhang, Z.F.[Zhi-Fei],
Zhang, H.[He],
Lin, Z.[Zhe],
Chen, Y.C.[Ying-Cong],
TransPixeler: Advancing Text-to-Video Generation with Transparency,
CVPR25(18229-18239)
IEEE DOI Code:
WWW Link.
2508
Adaptation models, Training data, Entertainment industry,
Visual effects, Transformers, Robustness, Reflection, Generators, Text to video
BibRef
Wang, J.R.[Jia-Rui],
Duan, H.Y.[Hui-Yu],
Zhai, G.T.[Guang-Tao],
Wang, J.T.[Jun-Tong],
Min, X.K.[Xiong-Kuo],
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of
Text-to-Video Generation with LMM,
CVPR25(18869-18880)
IEEE DOI Code:
WWW Link.
2508
Visualization, Systematics, Databases, Pressing, Benchmark testing,
Predictive models, Robustness, Quality assessment,
Text to video
BibRef
Reddy, A.[Arun],
Martin, A.[Alexander],
Yang, E.[Eugene],
Yates, A.[Andrew],
Sanders, K.[Kate],
Murray, K.[Kenton],
Kriz, R.[Reno],
de Melo, C.M.[Celso M.],
van Durme, B.[Benjamin],
Chellappa, R.[Rama],
Video-ColBERT: Contextualized Late Interaction for Text-to-Video
Retrieval,
CVPR25(19691-19701)
IEEE DOI
2508
Training, Visualization, Benchmark testing, Encoding, Text to video,
video retrieval, video-language, temporal modeling
BibRef
Zanella, L.[Luca],
Mancini, M.[Massimiliano],
Menapace, W.[Willi],
Tulyakov, S.[Sergey],
Wang, Y.M.[Yi-Ming],
Ricci, E.[Elisa],
Can Text-to-Video Generation help Video-Language Alignment?,
CVPR25(24097-24107)
IEEE DOI
2508
Visualization, Computational modeling, Large language models,
Semantics, Noise, Linguistics, Noise measurement, Text to video,
text-to-video generation
BibRef
Fan, T.[Tiehan],
Nan, K.[Kepan],
Xie, R.[Rui],
Zhou, P.H.[Peng-Hao],
Yang, Z.[Zhenheng],
Fu, C.[Chaoyou],
Li, X.[Xiang],
Yang, J.[Jian],
Tai, Y.[Ying],
InstanceCap: Improving Text-to-Video Generation via Instance-aware
Structured Caption,
CVPR25(28974-28983)
IEEE DOI
2508
Training, Computational modeling, Pipelines, Text to video,
instance-aware, structured caption, high-fidelity video generation
BibRef
Huang, Y.Y.[Yu-Yang],
Chen, Y.[Yabo],
Ding, L.[Li],
Zhang, X.P.[Xiao-Peng],
Dai, W.R.[Wen-Rui],
Zou, J.[Junni],
Xiong, H.K.[Hong-Kai],
Tian, Q.[Qi],
IM-Zero: Instance-level Motion Controllable Video Generation in a
Zero-shot Manner,
CVPR25(7265-7275)
IEEE DOI
2508
Shape, Layout, Text to image, Controllability, Trajectory,
Quality assessment, Text to video, Motion control, Videos, zero-shot
BibRef
Wang, Y.P.[Yi-Ping],
He, X.[Xuehai],
Wang, K.[Kuan],
Ma, L.[Luyao],
Yang, J.W.[Jian-Wei],
Wang, S.[Shuohang],
Du, S.S.L.[Simon Shao-Lei],
Shen, Y.[Yelong],
Is Your World Simulator a Good Story Presenter? A Consecutive
Events-Based Benchmark for Future Long Video Generation,
CVPR25(13629-13638)
IEEE DOI Code:
WWW Link.
2508
Measurement, Current measurement, Computational modeling,
Benchmark testing, Atoms, Reliability, Text to video,
multi-event
BibRef
Chen, S.[Shoufa],
Ge, C.J.[Chong-Jian],
Zhang, Y.Q.[Yu-Qi],
Zhang, Y.[Yida],
Zhu, F.[Fengda],
Yang, H.[Hao],
Hao, H.X.[Hong-Xiang],
Wu, H.[Hui],
Lai, Z.C.[Zhi-Chao],
Hu, Y.F.[Yi-Fei],
Lin, T.C.[Ting-Che],
Zhang, S.L.[Shi-Long],
Li, F.[Fu],
Li, C.[Chuan],
Wang, X.[Xing],
Peng, Y.H.[Yang-Hua],
Sun, P.[Peize],
Luo, P.[Ping],
Jiang, Y.[Yi],
Yuan, Z.H.[Ze-Huan],
Peng, B.[Bingyue],
Liu, X.B.[Xia-Bing],
Goku: Flow Based Video Generative Foundation Models,
CVPR25(23516-23527)
IEEE DOI
2508
Training, Visualization, Computational modeling, Pipelines,
Text to image, Transformers, Data models, Text to video,
flow model
BibRef
Wang, H.J.[Hong-Jie],
Ma, C.Y.[Chih-Yao],
Liu, Y.C.[Yen-Cheng],
Hou, J.[Ji],
Xu, T.[Tao],
Wang, J.L.[Jia-Liang],
Juefei-Xu, F.[Felix],
Luo, Y.Q.[Ya-Qiao],
Zhang, P.Z.[Pei-Zhao],
Hou, T.B.[Ting-Bo],
Vajda, P.[Peter],
Jha, N.K.[Niraj K.],
Dai, X.L.[Xiao-Liang],
LinGen: Towards High-Resolution Minute-Length Text-to-Video
Generation with Linear Computational Complexity,
CVPR25(2578-2588)
IEEE DOI Code:
WWW Link.
2508
Correlation, Reviews, Computational modeling,
Graphics processing units, Transformers, Motion pictures, efficiency
BibRef
Gao, B.J.[Bing-Jie],
Gao, X.Y.[Xin-Yu],
Wu, X.X.[Xiao-Xue],
Zhou, Y.J.[Yu-Jie],
Qiao, Y.[Yu],
Niu, L.[Li],
Chen, X.Y.[Xin-Yuan],
Wang, Y.H.[Yao-Hui],
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization
for Text-To-Video Generation,
CVPR25(3173-3183)
IEEE DOI
2508
Training, Vocabulary, Sensitivity, Large language models,
Instruction sets, Refining, Text to video, Optimization,
prompt optimization
BibRef
Sharan, S.P.[S P],
Choi, M.[Minkyu],
Shah, S.[Sahil],
Goel, H.[Harsh],
Omama, M.[Mohammad],
Chinchali, S.[Sandeep],
Neuro-Symbolic Evaluation of Text-to-Video Models using Formal
Verification,
CVPR25(8395-8405)
IEEE DOI
2508
Measurement, Visualization, Computational modeling, Automata,
Benchmark testing, Logic, Text to video, Autonomous vehicles, dataset
BibRef
Sun, K.Y.[Kai-Yue],
Huang, K.Y.[Kai-Yi],
Liu, X.[Xian],
Wu, Y.[Yue],
Xu, Z.[Zihan],
Li, Z.G.[Zhen-Guo],
Liu, X.H.[Xi-Hui],
T2V-CompBench: A Comprehensive Benchmark for Compositional
Text-to-video Generation,
CVPR25(8406-8416)
IEEE DOI
2508
Measurement, Analytical models, Systematics, Correlation, Tracking,
Computational modeling, Large language models, Benchmark testing,
compositional text-to-video generation
BibRef
Yuan, S.H.[Sheng-Hai],
Huang, J.[Jinfa],
He, X.[Xianyi],
Ge, Y.[Yunyang],
Shi, Y.J.[Yu-Jun],
Chen, L.[Liuhan],
Luo, J.B.[Jie-Bo],
Yuan, L.[Li],
Identity-Preserving Text-To-Video Generation by Frequency
Decomposition,
CVPR25(12978-12988)
IEEE DOI
2508
Frequency-domain analysis, Computational modeling, Pipelines,
Optimal control, Transformers, Text to video, Frequency control,
video generation
BibRef
Bu, J.[Jiazi],
Ling, P.Y.[Peng-Yang],
Zhang, P.[Pan],
Wu, T.[Tong],
Dong, X.Y.[Xiao-Yi],
Zang, Y.H.[Yu-Hang],
Cao, Y.H.[Yu-Hang],
Lin, D.[Dahua],
Wang, J.Q.[Jia-Qi],
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality
in a Training-free Way,
CVPR25(12999-13008)
IEEE DOI Code:
WWW Link.
2508
Visualization, Costs, Correlation, Codes, Decoding, High frequency,
Text to video, generative models, text-to-video generation,
video quality enhancement
BibRef
Weng, S.[Shuchen],
Zheng, H.J.[Hao-Jie],
Zhang, P.X.[Pei-Xuan],
Hong, Y.C.[Yu-Chen],
Jiang, H.[Han],
Li, S.[Si],
Shi, B.X.[Bo-Xin],
VIRES: Video Instance Repainting via Sketch and Text Guided
Generation,
CVPR25(28416-28425)
IEEE DOI Code:
WWW Link.
2508
Training, Adaptation models, Visualization, Semantics, Layout,
Transformers, Sampling methods, Text to video, Videos
BibRef
Huang, H.P.[Hsin-Ping],
Su, Y.C.[Yu-Chuan],
Sun, D.Q.[De-Qing],
Jiang, L.[Lu],
Jia, X.H.[Xu-Hui],
Zhu, Y.K.[Yu-Kun],
Yang, M.H.[Ming-Hsuan],
Fine-grained Controllable Video Generation via Object Appearance and
Context,
WACV25(3698-3708)
IEEE DOI
2505
Visualization, Natural languages, Benchmark testing, Transformers,
Controllability, Trajectory, Text to video, Standards,
subject customization
BibRef
Wu, W.J.[Wei-Jia],
Li, Z.[Zhuang],
Gu, Y.C.[Yu-Chao],
Zhao, R.[Rui],
He, Y.F.[Ye-Fei],
Zhang, D.J.H.[David Jun-Hao],
Shou, M.Z.[Mike Zheng],
Li, Y.[Yan],
Gao, T.T.[Ting-Ting],
Zhang, D.[Di],
DragAnything: Motion Control for Anything Using Entity Representation,
ECCV24(XXII: 331-348).
Springer DOI
2412
Project:
WWW Link. Controllable video generation.
BibRef
Chen, X.[Xi],
Liu, Z.H.[Zhi-Heng],
Chen, M.T.[Meng-Ting],
Feng, Y.T.[Yu-Tong],
Liu, Y.[Yu],
Shen, Y.J.[Yu-Jun],
Zhao, H.S.[Heng-Shuang],
LivePhoto: Real Image Animation with Text-guided Motion Control,
ECCV24(XVIII: 475-491).
Springer DOI
2412
Project:
WWW Link.
BibRef
Dai, W.X.[Wen-Xun],
Chen, L.H.[Ling-Hao],
Wang, J.B.[Jing-Bo],
Liu, J.P.[Jin-Peng],
Dai, B.[Bo],
Tang, Y.S.[Yan-Song],
MotionLCM: Real-time Controllable Motion Generation via Latent
Consistency Model,
ECCV24(XVI: 390-408).
Springer DOI
2412
BibRef
Huang, Y.M.[Yi-Ming],
Wan, W.L.[Wei-Lin],
Yang, Y.[Yue],
Callison-Burch, C.[Chris],
Yatskar, M.[Mark],
Liu, L.J.[Ling-Jie],
CoMo: Controllable Motion Generation Through Language Guided Pose Code
Editing,
ECCV24(XXIX: 180-196).
Springer DOI
2412
BibRef
Zou, Q.[Qiran],
Yuan, S.[Shangyuan],
Du, S.[Shian],
Wang, Y.[Yu],
Liu, C.[Chang],
Xu, Y.[Yi],
Chen, J.[Jie],
Ji, X.Y.[Xiang-Yang],
ParCo: Part-coordinating Text-to-motion Synthesis,
ECCV24(LVI: 126-143).
Springer DOI
2412
BibRef
Christie, R.[Robert],
Kitchen, B.[Brian],
Tumilowicz, W.[Wiktor],
Hooper, S.[Steffan],
Wünsche, B.C.[Burkhard C.],
Procedurally Generating Large Synthetic Worlds: Chunked Hierarchical
Wave Function Collapse,
IVCNZ24(1-6)
IEEE DOI
2503
Backtracking, Video games, Runtime, Procedural generation,
Heuristic algorithms, Layout, Wave functions, multithreading
BibRef
Kwon, M.[Mingi],
Oh, S.W.[Seoung Wug],
Zhou, Y.[Yang],
Liu, D.[Difan],
Lee, J.Y.[Joon-Young],
Cai, H.R.[Hao-Ran],
Liu, B.[Baqiao],
Liu, F.[Feng],
Uh, Y.J.[Young-Jung],
HARIVO: Harnessing Text-to-image Models for Video Generation,
ECCV24(LIII: 19-36).
Springer DOI
2412
BibRef
Wu, R.Q.[Rui-Qi],
Chen, L.[Liangyu],
Yang, T.[Tong],
Guo, C.[Chunle],
Li, C.Y.[Chong-Yi],
Zhang, X.Y.[Xiang-Yu],
LAMP: Learn A Motion Pattern for Few-Shot Video Generation,
CVPR24(7089-7098)
IEEE DOI Code:
WWW Link.
2410
Training, Computational modeling, Pipelines, Text to image,
Diffusion models, Stability analysis, Quality assessment
BibRef
Yang, S.[Shuai],
Zhou, Y.F.[Yi-Fan],
Liu, Z.W.[Zi-Wei],
Loy, C.C.[Chen Change],
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video
Translation,
CVPR24(8703-8712)
IEEE DOI
2410
Training, Visualization, Attention mechanisms, Superresolution,
Text to image, Coherence, diffusion, video-to-video translation,
intra-frame consistency
BibRef
Bahmani, S.[Sherwin],
Liu, X.[Xian],
Wang, Y.F.[Yi-Fan],
Skorokhodov, I.[Ivan],
Rong, V.[Victor],
Liu, Z.W.[Zi-Wei],
Liu, X.H.[Xi-Hui],
Park, J.J.[Jeong Joon],
Tulyakov, S.[Sergey],
Wetzstein, G.[Gordon],
Tagliasacchi, A.[Andrea],
Lindell, D.B.[David B.],
TC4D: Trajectory-Conditioned Text-to-4D Generation,
ECCV24(XLVI: 53-72).
Springer DOI
2412
BibRef
Fan, K.[Ke],
Tang, J.[Junshu],
Cao, W.J.[Wei-Jian],
Yi, R.[Ran],
Li, M.[Moran],
Gong, J.Y.[Jing-Yu],
Zhang, J.N.[Jiang-Ning],
Wang, Y.B.[Ya-Biao],
Wang, C.J.[Cheng-Jie],
Ma, L.Z.[Li-Zhuang],
FreeMotion:
A Unified Framework for Number-Free Text-to-Motion Synthesis,
ECCV24(VIII: 93-109).
Springer DOI
2412
BibRef
Oh, G.[Gyeongrok],
Jeong, J.[Jaehwan],
Kim, S.[Sieun],
Byeon, W.[Wonmin],
Kim, J.[Jinkyu],
Kim, S.[Sungwoong],
Kim, S.[Sangpil],
MEVG: Multi-event Video Generation with Text-to-video Models,
ECCV24(XLIII: 401-418).
Springer DOI
2412
BibRef
Girdhar, R.[Rohit],
Singh, M.[Mannat],
Brown, A.[Andrew],
Duval, Q.[Quentin],
Azadi, S.[Samaneh],
Rambhatla, S.S.[Sai Saketh],
Shah, A.[Akbar],
Yin, X.[Xi],
Parikh, D.[Devi],
Misra, I.[Ishan],
Factorizing Text-to-video Generation by Explicit Image Conditioning,
ECCV24(LXII: 205-224).
Springer DOI
2412
BibRef
Materzynska, J.[Joanna],
Sivic, J.[Josef],
Shechtman, E.[Eli],
Torralba, A.[Antonio],
Zhang, R.[Richard],
Russell, B.[Bryan],
NewMove: Customizing Text-to-video Models with Novel Motions,
ACCV24(V: 113-130).
Springer DOI
2412
BibRef
Huang, Z.Q.[Zi-Qi],
He, Y.[Yinan],
Yu, J.[Jiashuo],
Zhang, F.[Fan],
Si, C.Y.[Chen-Yang],
Jiang, Y.M.[Yu-Ming],
Zhang, Y.H.[Yuan-Han],
Wu, T.X.[Tian-Xing],
Jin, Q.Y.[Qing-Yang],
Chanpaisit, N.[Nattapol],
Wang, Y.H.[Yao-Hui],
Chen, X.Y.[Xin-Yuan],
Wang, L.M.[Li-Min],
Lin, D.[Dahua],
Qiao, Y.[Yu],
Liu, Z.W.[Zi-Wei],
VBench: Comprehensive Benchmark Suite for Video Generative Models,
CVPR24(21807-21818)
IEEE DOI
2410
Measurement, Image synthesis, Annotations, Computational modeling,
Benchmark testing, evaluation, human preference
BibRef
Menapace, W.[Willi],
Siarohin, A.[Aliaksandr],
Skorokhodov, I.[Ivan],
Deyneka, E.[Ekaterina],
Chen, T.S.[Tsai-Shien],
Kag, A.[Anil],
Fang, Y.W.[Yu-Wei],
Stoliar, A.[Aleksei],
Ricci, E.[Elisa],
Ren, J.[Jian],
Tulyakov, S.[Sergey],
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video
Synthesis,
CVPR24(7038-7048)
IEEE DOI
2410
Training, Visualization, Image coding, Computational modeling,
Scalability, Transformers, video generation,
efficiency
BibRef
Tian, K.B.[Kai-Bin],
Zhao, R.X.[Rui-Xiang],
Xin, Z.J.[Zi-Jie],
Lan, B.X.[Bang-Xiang],
Li, X.R.[Xi-Rong],
Holistic Features are Almost Sufficient for Text-to-Video Retrieval,
CVPR24(17138-17147)
IEEE DOI
2410
Computational modeling, Scalability, Ad hoc networks,
Text to video
BibRef
Qing, Z.W.[Zhi-Wu],
Zhang, S.W.[Shi-Wei],
Wang, J.[Jiayu],
Wang, X.[Xiang],
Wei, Y.J.[Yu-Jie],
Zhang, Y.Y.[Ying-Ya],
Gao, C.X.[Chang-Xin],
Sang, N.[Nong],
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation,
CVPR24(6635-6645)
IEEE DOI
2410
Training, Source coding, Semantics, Spatial coherence, Cognition,
Stability analysis, Complexity theory
BibRef
Wang, X.[Xiang],
Zhang, S.W.[Shi-Wei],
Yuan, H.J.[Hang-Jie],
Qing, Z.W.[Zhi-Wu],
Gong, B.[Biao],
Zhang, Y.Y.[Ying-Ya],
Shen, Y.J.[Yu-Jun],
Gao, C.X.[Chang-Xin],
Sang, N.[Nong],
A Recipe for Scaling up Text-to-Video Generation with Text-free
Videos,
CVPR24(6572-6582)
IEEE DOI
2410
Training, Video on demand, Scalability, Pipelines, Text to image,
Performance gain
BibRef
Kim, T.[Taehoon],
Kang, C.[ChanHee],
Park, J.[JaeHyuk],
Jeong, D.[Daun],
Yang, C.[ChangHee],
Kang, S.J.[Suk-Ju],
Kong, K.[Kyeongbo],
Human Motion Aware Text-to-Video Generation with Explicit Camera
Control,
WACV24(5069-5078)
IEEE DOI Code:
WWW Link.
2404
Knowledge engineering, Codes, Punching, Cameras, Algorithms,
Generative models for image, video, 3D, etc., Algorithms, Biometrics,
Vision + language and/or other modalities
BibRef
Ji, P.L.[Peng-Liang],
Xiao, C.[Chuyang],
Tai, H.L.[Hui-Lin],
Huo, M.X.[Ming-Xiao],
T2VBench: Benchmarking Temporal Dynamics for Text-to-Video Generation,
GenerativeFM24(5325-5335)
IEEE DOI
2410
Measurement, Analytical models, Computational modeling,
Encyclopedias, Benchmark testing, multimodal
BibRef
Godfrey, W.W.[W. Wilfred],
Ratna, A.[Abhinav],
Enhancing the Video Editing Capabilities of Text-to-Video Generators
Using DDPM Inversion,
ICCVMI23(1-5)
IEEE DOI
2403
Visualization, Computational modeling, Refining, Pipelines,
Noise reduction, Transformers, Probabilistic logic,
DDPM Inversion
BibRef
Lee, H.[Hanbit],
Kim, Y.[Youna],
Lee, S.G.[Sang-Goo],
Multi-scale Contrastive Learning for Complex Scene Generation,
WACV23(764-774)
IEEE DOI
2302
Semantics, Generative adversarial networks, Generators,
Data models, Task analysis, image and video synthesis
BibRef
Loeschcke, S.[Sebastian],
Belongie, S.[Serge],
Benaim, S.[Sagie],
Text-driven Stylization of Video Objects,
CVEU22(594-609).
Springer DOI
2304
BibRef
Mazaheri, A.[Amir],
Shah, M.[Mubarak],
Video Generation from Text Employing Latent Path Construction for
Temporal Modeling,
ICPR22(5010-5016)
IEEE DOI
2212
Interpolation, Visualization, Natural languages,
Stacking, Machine learning
BibRef
Lee, S.H.[Seung Hyun],
Oh, G.[Gyeongrok],
Byeon, W.[Wonmin],
Kim, C.[Chanyoung],
Ryoo, W.J.[Won Jeong],
Yoon, S.H.[Sang Ho],
Cho, H.[Hyunjun],
Bae, J.Y.[Jih-Yun],
Kim, J.[Jinkyu],
Kim, S.[Sangpil],
Sound-Guided Semantic Video Generation,
ECCV22(XVII:34-50).
Springer DOI
2211
BibRef
Zhan, F.N.[Fang-Neng],
Zhang, J.H.[Jia-Hui],
Yu, Y.C.[Ying-Chen],
Wu, R.L.[Rong-Liang],
Lu, S.J.[Shi-Jian],
Modulated Contrast for Versatile Image Synthesis,
CVPR22(18259-18269)
IEEE DOI
2210
Photography, Visualization, Codes, Image synthesis, Force,
Performance gain, Image and video synthesis and generation,
Computational photography
BibRef
Ntavelis, E.[Evangelos],
Shahbazi, M.[Mohamad],
Kastanis, I.[Iason],
Timofte, R.[Radu],
Danelljan, M.[Martin],
Van Gool, L.J.[Luc J.],
Arbitrary-Scale Image Synthesis,
CVPR22(11523-11532)
IEEE DOI
2210
Training, Image coding, Image synthesis, Pipelines,
Generative adversarial networks, Encoding,
Image and video synthesis and generation
BibRef
Yang, Z.P.[Zuo-Peng],
Liu, D.Q.[Da-Qing],
Wang, C.Y.[Chao-Yue],
Yang, J.[Jie],
Tao, D.C.[Da-Cheng],
Modeling Image Composition for Complex Scene Generation,
CVPR22(7754-7763)
IEEE DOI
2210
Training, Measurement, Visualization, Image coding, Layout, Genomics,
Predictive models, Image and video synthesis and generation
BibRef
Aldausari, N.[Nuha],
Sowmya, A.[Arcot],
Marcus, N.[Nadine],
Mohammadi, G.[Gelareh],
Cascaded Siamese Self-supervised Audio to Video GAN,
MULA22(4690-4699)
IEEE DOI
2210
Solid modeling, Correlation, Computational modeling
BibRef
Tao, M.[Ming],
Tang, H.[Hao],
Wu, F.[Fei],
Jing, X.Y.[Xiao-Yuan],
Bao, B.K.[Bing-Kun],
Xu, C.S.[Chang-Sheng],
DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis,
CVPR22(16494-16504)
IEEE DOI
2210
Visualization, Codes, Semantics,
Generative adversarial networks, Generators, Vision+language,
Image and video synthesis and generation
BibRef
Zhou, Y.F.[Yu-Fan],
Zhang, R.[Ruiyi],
Chen, C.Y.[Chang-You],
Li, C.Y.[Chun-Yuan],
Tensmeyer, C.[Chris],
Yu, T.[Tong],
Gu, J.X.[Jiu-Xiang],
Xu, J.H.[Jin-Hui],
Sun, T.[Tong],
Towards Language-Free Training for Text-to-Image Generation,
CVPR22(17886-17896)
IEEE DOI
2210
Training, Image synthesis, Semantics, Training data, Tail,
Data collection, Data models, Vision+language,
Image and video synthesis and generation
BibRef
Li, Z.H.[Zhi-Heng],
Min, M.R.[Martin Renqiang],
Li, K.[Kai],
Xu, C.L.[Chen-Liang],
StyleT2I: Toward Compositional and High-Fidelity Text-to-Image
Synthesis,
CVPR22(18176-18186)
IEEE DOI
2210
Measurement, Ethics, Image synthesis, Computational modeling,
Semantics, Robustness, Image and video synthesis and generation,
Vision+language
BibRef
Wang, Y.[Yi],
Qi, L.[Lu],
Chen, Y.C.[Ying-Cong],
Zhang, X.Y.[Xiang-Yu],
Jia, J.Y.[Jia-Ya],
Image Synthesis via Semantic Composition,
ICCV21(13729-13738)
IEEE DOI
2203
Correlation, Image synthesis, Convolution, Semantics, Layout,
Benchmark testing, Image and video synthesis,
Neural generative models
BibRef
Xiang, X.Y.[Xiao-Yu],
Liu, D.[Ding],
Yang, X.[Xiao],
Zhu, Y.H.[Yi-Heng],
Shen, X.H.[Xiao-Hui],
Allebach, J.P.[Jan P.],
Adversarial Open Domain Adaptation for Sketch-to-Photo Synthesis,
WACV22(944-954)
IEEE DOI
2202
Training, Image color analysis, Training data,
Distortion, Generators, Optimization, Image and Video Synthesis
BibRef
Dorkenwald, M.[Michael],
Milbich, T.[Timo],
Blattmann, A.[Andreas],
Rombach, R.[Robin],
Derpanis, K.G.[Konstantinos G.],
Ommer, B.[Björn],
Stochastic Image-to-Video Synthesis using cINNs,
CVPR21(3741-3752)
IEEE DOI
2111
Neural networks, Stochastic processes,
Process control, Predictive models, Probabilistic logic
BibRef
Mallya, A.[Arun],
Wang, T.C.[Ting-Chun],
Sapra, K.[Karan],
Liu, M.Y.[Ming-Yu],
World-Consistent Video-to-Video Synthesis,
ECCV20(VIII:359-378).
Springer DOI
2011
BibRef
Nawhal, M.[Megha],
Zhai, M.Y.[Meng-Yao],
Lehrmann, A.[Andreas],
Sigal, L.[Leonid],
Mori, G.[Greg],
Generating Videos of Zero-shot Compositions of Actions and Objects,
ECCV20(XII: 382-401).
Springer DOI
2010
BibRef